
Performance Metrics in Machine Learning [Complete Guide]

Performance metrics are a part of every machine learning pipeline. They tell you if you're making progress, and put a number on it. All machine learning models, whether it's linear regression or a SOTA technique like BERT, need a metric to judge performance.

Most machine learning tasks can be framed as either regression or classification, and performance metrics divide along the same lines. There are dozens of metrics for both problem types, but we'll discuss the popular ones along with the information they provide about model performance. It's important to know how your model sees your data!

If you've ever participated in a Kaggle competition, you've probably noticed the evaluation section. More often than not, there's a metric there on which your performance is judged.

Metrics are different from loss functions. A loss function also measures how well the model is doing, but it's used to train the model (through some kind of optimization such as Gradient Descent), so it usually needs to be differentiable with respect to the model's parameters.

Metrics are used to monitor and measure the performance of a model (during training and testing), and don’t need to be differentiable. 

However, if, for some tasks, the performance metric is differentiable, it can also be used as a loss function (perhaps with some regularizations added to it), such as MSE.

May be useful

If you're looking for an automated way to monitor your model's performance metrics, check out neptune.ai. Here's the documentation that explains how tracking metrics works (with examples).

Regression metrics

Regression models have continuous output, so we need a metric based on some sort of distance between the predicted values and the ground truth.

In order to evaluate Regression models, we’ll discuss these metrics in detail:

  • Mean Absolute Error (MAE),
  • Mean Squared Error (MSE),
  • Root Mean Squared Error (RMSE),
  • R² (R-Squared).

Note: We'll use the Boston Housing dataset to implement regression metrics. You can find the notebook containing all the code used in this blog here.

Mean Squared Error (MSE)

Mean squared error is perhaps the most popular metric used for regression problems. It essentially finds the average of the squared difference between the target value and the value predicted by the regression model.

MSE = (1/N) Σ_j (y_j - ŷ_j)²

  • y_j: ground-truth value
  • ŷ_j: predicted value from the regression model
  • N: number of data points


A few key points related to MSE:

  • It's differentiable, so it can be optimized easily.
  • Squaring the errors penalizes large errors heavily, which can lead to an overestimation of how bad the model is when a few predictions are far off.
  • Error interpretation has to be done with the squaring factor (scale) in mind. For example, in our Boston Housing regression problem, we got MSE = 21.89, which is in units of (price)².
  • Due to the squaring factor, it's fundamentally more sensitive to outliers than other metrics.

This can be implemented simply using NumPy arrays in Python.
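The notebook's code isn't reproduced here, but a minimal NumPy sketch might look like this (the toy numbers are illustrative, not the Boston Housing predictions):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared difference between targets and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Toy values for illustration
print(mse([22.4, 20.1, 31.5], [21.0, 22.3, 29.8]))
```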

Mean Absolute Error (MAE)

Mean Absolute Error is the average of the absolute difference between the ground truth and the predicted values. Mathematically, it's represented as:

MAE = (1/N) Σ_j |y_j - ŷ_j|

A few key points related to MAE:

  • It's more robust to outliers than MSE, since it doesn't exaggerate large errors.
  • It gives us a measure of how far the predictions were from the actual output. However, since MAE uses the absolute value of the residual, it doesn't tell us the direction of the error, i.e. whether we're under-predicting or over-predicting.
  • Error interpretation needs no second thoughts, as the error is in the same units as the target variable.
  • Unlike MSE, MAE is not differentiable at zero, which can make it slightly harder to optimize directly.

Similar to MSE, this metric is also simple to implement.
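Again as a rough sketch with NumPy (y_true and y_pred stand for arrays of targets and predictions):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average absolute difference between targets and predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))
```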

Root Mean Squared Error (RMSE)

Root Mean Squared Error corresponds to the square root of the average of the squared difference between the target value and the value predicted by the regression model. Basically, sqrt(MSE). Mathematically it can be represented as:

RMSE = √( (1/N) Σ_j (y_j - ŷ_j)² )

It addresses a few downsides in MSE.

A few key points related to RMSE:

  • It retains the differentiable property of MSE.
  • Taking the square root brings the penalty back to the scale of the target, tempering the exaggeration introduced by squaring.
  • Error interpretation can be done smoothly, since the scale is now the same as that of the target variable.
  • Because the error is back on the original scale, it's less distorted by outliers than MSE, though still more sensitive to them than MAE.

Implementation is similar to MSE:
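A possible sketch, simply wrapping the MSE in a square root:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```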

R² Coefficient of determination

R² Coefficient of determination actually works as a post metric, meaning it’s a metric that’s calculated using other metrics. 

The point of calculating this coefficient is to answer the question: "How much (what %) of the total variation in Y (the target) is explained by the regression line?"

This is calculated using the sum of squared errors. Let’s go through the formulation to understand it better.

Total variation in Y (proportional to the variance of Y):

SS_tot = Σ_j (y_j - ȳ)²

Unexplained variation, i.e. the sum of squared errors of the regression line:

SS_res = Σ_j (y_j - ŷ_j)²

Subsequently, the fraction of the variation NOT described by the regression line is SS_res / SS_tot.

Finally, we have our formula for the coefficient of determination, which can tell us how good or bad the fit of the regression line is:

R² = 1 - SS_res / SS_tot

This coefficient can be implemented simply using NumPy arrays in Python.
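One way this might look in NumPy (scikit-learn's r2_score computes the same quantity):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the regression line
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1 - ss_res / ss_tot
```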

A few intuitions related to R²:

  • If the sum of squared errors of the regression line is small, R² will be close to 1 (ideal), meaning the regression captures most of the variance in the target variable.
  • Conversely, if the sum of squared errors of the regression line is large, R² will be close to 0, meaning the regression captures very little of the variance in the target variable.
  • You might think the range of R² is (0, 1), but it's actually (-∞, 1], because the ratio SS_res / SS_tot can exceed 1 when the squared error of the regression line is larger than the squared error around the mean.

Adjusted R²

Vanilla R² has a weakness: it never decreases when you add more independent variables, so it can mislead you into believing that the model is improving even when no real learning is happening. A model that overfits the data, for example, can report a very high R² without generalizing at all. To rectify this, R² is adjusted for the number of independent variables.

Adjusted R² is always lower than R², as it adjusts for the increasing predictors and only shows improvement if there is a real improvement.

Ra² = 1 - [ (1 - R²)(n - 1) / (n - k - 1) ]

  • n = number of observations
  • k = number of independent variables
  • Ra² = adjusted R²
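A quick sketch of the formula in code; the R², n, and k values below are hypothetical, not taken from the Boston Housing notebook:

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R² for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical values: R² = 0.74 on 506 observations with 13 predictors
print(adjusted_r_squared(0.74, 506, 13))
```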

Classification metrics

Classification problems are one of the world’s most widely researched areas. Use cases are present in almost all production and industrial environments. Speech recognition, face recognition, text classification – the list is endless. 

Classification models have discrete output, so we need a metric that compares discrete classes in some form . Classification Metrics evaluate a model’s performance and tell you how good or bad the classification is, but each of them evaluates it in a different way.

May interest you

24 Evaluation Metrics for Binary Classification (And When to Use Them)

So in order to evaluate Classification models, we’ll discuss these metrics in detail:

  • Accuracy
  • Confusion Matrix (not a metric but fundamental to the others)
  • Precision and Recall
  • F1-score
  • AU-ROC

Note: We'll use the UCI Breast cancer dataset to implement classification metrics. You can find the notebook containing all the code used in this blog here.

Accuracy

Classification accuracy is perhaps the simplest metric to use and implement. It's defined as the number of correct predictions divided by the total number of predictions, multiplied by 100.

We can implement this by comparing ground truth and predicted values in a loop or simply utilizing the scikit-learn module to do the heavy lifting for us (not so heavy in this case).


Start by importing the accuracy_score function from scikit-learn's metrics module.

Then, just by passing the ground truth and predicted values, you can determine the accuracy of your model:
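A sketch of both routes, using toy labels rather than the notebook's breast cancer predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels for illustration; in the blog's setup these would be the test labels and predictions
y_test = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])

manual_accuracy = np.mean(y_test == y_pred) * 100   # comparing in "a loop" the NumPy way
sklearn_accuracy = accuracy_score(y_test, y_pred) * 100
print(manual_accuracy, sklearn_accuracy)  # both report the same percentage
```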

Confusion Matrix

Confusion Matrix is a tabular visualization of the ground-truth labels versus model predictions. Each row of the confusion matrix represents the instances of an actual class and each column represents the instances of a predicted class (this is scikit-learn's convention; some texts swap the axes). The confusion matrix is not exactly a performance metric, but rather a basis on which other metrics evaluate the results.

In order to understand the confusion matrix, we need to choose a null hypothesis. For our Breast Cancer data, let's take the null hypothesis H₀ to be "The individual does not have cancer", so that predicting cancer corresponds to rejecting H₀.

[Figure: 2×2 confusion matrix with cells for True Positives, False Positives, False Negatives, and True Negatives]

Each cell in the confusion matrix represents an evaluation factor. Let’s understand these factors one by one:

  • True Positive(TP) signifies how many positive class samples your model predicted correctly.
  • True Negative(TN) signifies how many negative class samples your model predicted correctly.
  • False Positive(FP) signifies how many negative class samples your model predicted incorrectly. This factor represents Type-I error in statistical nomenclature. This error positioning in the confusion matrix depends on the choice of the null hypothesis.
  • False Negative(FN) signifies how many positive class samples your model predicted incorrectly. This factor represents Type-II error in statistical nomenclature. This error positioning in the confusion matrix also depends on the choice of the null hypothesis.

We can calculate the cell values using the code below:
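The original snippet isn't shown here, but a scikit-learn sketch could look like the following; the train/test split and hyperparameters are illustrative assumptions, not the notebook's exact settings:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Load the breast cancer data and fit a simple logistic regression classifier
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=10_000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# ravel() flattens the 2x2 matrix into the four cell values
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```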

We’ll look at the Confusion Matrix in two different states using two sets of hyper-parameters in the Logistic Regression Classifier.

Precision

Precision is the ratio of true positives to the total number of predicted positives:

Precision = TP / (TP + FP)

0 ≤ Precision ≤ 1

The precision metric focuses on Type-I errors (FP). A Type-I error occurs when we reject a null hypothesis (H₀) that is true. So, in this case, a Type-I error is incorrectly labeling a non-cancerous patient as cancerous.

A precision score close to 1 signifies that almost every patient your model labels as cancerous actually has cancer, i.e. it produces very few false positives. What it cannot measure is the existence of Type-II errors (false negatives): cases where a cancerous patient is identified as non-cancerous.

A low precision score (<0.5) means your classifier has a high number of false positives, which can be the result of imbalanced classes or untuned model hyperparameters. In an imbalanced-class problem, you have to prepare your data beforehand with over-/under-sampling, or use something like focal loss, in order to curb FP/FN.
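As a sketch, precision can be computed with scikit-learn's precision_score (the labels below are toy values, not the breast cancer results):

```python
from sklearn.metrics import precision_score

# Toy labels for illustration; in the blog these come from the logistic regression on the test set
y_test = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print(precision_score(y_test, y_pred))  # TP / (TP + FP)
```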

For Set-I hyperparameters:

The confusion matrix shows FP = 0, and the resulting precision is 1.0.

As you might have guessed from the confusion matrix values, FP = 0, so the model is 100% precise under this hyperparameter setting. No type-I errors are reported, so the model has done a great job of not labeling non-cancerous patients as cancerous.

For set-II hyperparameters:

Precision drops to 0.35.

Since only type-I error remains in this setting, the precision rate goes down despite the fact that type-II error is 0.

We can deduce from our example that precision alone cannot tell you everything about your model's performance.

Recall/Sensitivity/Hit-Rate

Recall is essentially the ratio of true positives to all the actual positives in the ground truth:

Recall = TP / (TP + FN)

0 ≤ Recall ≤ 1

The recall metric focuses on Type-II errors (FN). A Type-II error occurs when we accept a null hypothesis (H₀) that is false. So, in this case, a Type-II error is incorrectly labeling a cancerous patient as non-cancerous.

Recall close to 1 signifies that your model didn't miss many true positives: nearly every cancerous patient was flagged as cancerous.

What it cannot measure is the existence of Type-I errors, i.e. false positives: cases where a non-cancerous patient is identified as cancerous.

A low recall score (<0.5) means your classifier has a high number of false negatives, which can be the result of imbalanced classes or untuned model hyperparameters. In an imbalanced-class problem, you have to prepare your data beforehand with over-/under-sampling, or use something like focal loss, in order to curb FP/FN.
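Likewise, recall can be computed with recall_score; again, the labels below are toy values:

```python
from sklearn.metrics import recall_score

# Toy labels for illustration; in the blog these come from the logistic regression on the test set
y_test = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print(recall_score(y_test, y_pred))  # TP / (TP + FN)
```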

For set-I hyperparameters:

Recall comes out to 0.49.

From the above confusion matrix values, there are no type-I errors but an abundance of type-II errors. That's the reason for the low recall score: recall only penalizes type-II errors.

For set-II hyperparameters:

Recall comes out to 1.0.

The only errors that persist in this setting are type-I errors; no type-II errors are reported. This means the model has done a great job of not letting cancerous patients slip through as non-cancerous.

The major highlight of the above two metrics is that both can only be used in specific scenarios since both of them identify only one set of errors.

Precision-Recall tradeoff

For a fixed model, improving precision typically comes at the cost of recall, and vice versa. If you tune the decision threshold to reduce cases of non-cancerous patients being labeled as cancerous (FP/type-I errors), you will generally let more cancerous patients be labeled as non-cancerous (FN/type-II errors).

Here’s a plot depicting the same tradeoff:

[Figure: precision-recall tradeoff plot]

This tradeoff highly impacts real-world scenarios, so we can deduce that precision and recall alone aren’t very good metrics to rely on and work with. That’s the reason you see many corporate reports and online competitions urge the submission metric to be a combination of precision and recall.

F1-score

The F1-score combines precision and recall. In fact, the F1 score is the harmonic mean of the two. The formula is essentially:

F1 = 2 · (Precision · Recall) / (Precision + Recall)

Now, a high F1 score symbolizes a high precision as well as high recall. It presents a good balance between precision and recall and gives good results on imbalanced classification problems .

A low F1 score on its own tells you (almost) nothing about where the problem lies; it only reflects performance at a single decision threshold. Low recall means we recovered only a small fraction of the actual positive cases. Low precision means that, among the cases we identified as positive, we didn't get many of them right.

But a low F1 doesn't say which of the two is at fault. A high F1 means we likely have high precision and high recall on a large portion of the decisions (which is informative). With a low F1, it's unclear whether the problem is low precision or low recall, and whether the model suffers more from type-I or type-II errors.

So, is F1 just a gimmick? Not really, it’s widely used, and considered a fine metric to converge onto a decision, but not without some tweaks. Using FPR (false positive rates) along with F1 will help curb type-I errors, and you’ll get an idea about the villain behind your low F1 score.

F1 score for the Set-I hyperparameters: 0.66.

If you recall, our scores for the set-I parameters were P=1 and R=0.49. Combining the two gives an F1 of 0.66, which doesn't tell you which type of error dominates, but is still useful for judging the overall performance of the model.

F1 score for the Set-II hyperparameters: ≈ 0.52.

For the set-II parameters, P=0.35 and R=1. So again, the F1 score sums up the balance between P and R. Still, a low F1 doesn't tell you which error is happening.

F1 is no doubt one of the most popular metrics to judge model performance. It’s actually a subset of wider metrics known as the F-scores.

Fβ = (1 + β²) · (Precision · Recall) / (β² · Precision + Recall)

Putting in beta=1 will fetch you the F1 score. 
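A sketch with scikit-learn, showing that fbeta_score with beta=1 matches f1_score (toy labels again, not the breast cancer results):

```python
from sklearn.metrics import f1_score, fbeta_score

# Toy labels for illustration; in the blog these come from the logistic regression on the test set
y_test = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print(f1_score(y_test, y_pred))               # harmonic mean of precision and recall
print(fbeta_score(y_test, y_pred, beta=1.0))  # identical to f1_score when beta = 1
print(fbeta_score(y_test, y_pred, beta=2.0))  # beta > 1 weighs recall more heavily than precision
```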

AUROC (Area Under the Receiver Operating Characteristic curve)

Better known as AUC-ROC score/curves. It makes use of true positive rates(TPR) and false positive rates(FPR).

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

  • Intuitively TPR/recall corresponds to the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. In other words, the higher the TPR, the fewer positive data points we will miss.
  • Intuitively FPR/fallout corresponds to the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. In other words, the higher the FPR, the more negative data points we will misclassify.

To combine the FPR and the TPR into a single metric, we first compute the two former metrics with many different thresholds for the logistic regression, then plot them on a single graph. The resulting curve is called the ROC curve, and the metric we consider is the area under this curve, which we call AUROC.
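A sketch of that procedure with scikit-learn (the labels and scores below are toy values; in the blog's setup the scores would come from the classifier's predicted probabilities, e.g. clf.predict_proba(X_test)[:, 1]):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground-truth labels and predicted probabilities for the positive class
y_test = [1, 0, 1, 1, 0, 1, 0, 0]
y_scores = [0.9, 0.4, 0.65, 0.8, 0.3, 0.7, 0.55, 0.2]

fpr, tpr, thresholds = roc_curve(y_test, y_scores)  # TPR and FPR at each threshold
print(roc_auc_score(y_test, y_scores))              # area under the (FPR, TPR) curve
```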

[Figure: ROC curve with the area under the curve (AUROC)]

A no-skill classifier is one that can't discriminate between the classes, and would predict a random class or a constant class in all cases. On a ROC plot, the no-skill baseline is the diagonal line from (0,0) to (1,1), corresponding to an AUROC of 0.5 regardless of the class distribution. (A horizontal line at the ratio of positive cases in the dataset is the no-skill baseline for a precision-recall curve, not a ROC curve.)

The area equals the probability that a randomly chosen positive example ranks above (is deemed to have a higher probability of being positive than negative) a randomly chosen negative example.

So, a high AUROC simply means that a randomly chosen positive example is very likely to be ranked above a randomly chosen negative one. A high AUROC also means your algorithm does a good job of ranking test data, with most negative cases at one end of the scale and positive cases at the other.

ROC curves aren't a good choice when your problem has a huge class imbalance. The reason for this is not straightforward, but it can be seen intuitively from the formulas; you can read more about it here. You can still use them in that scenario after rebalancing the dataset, or by using techniques such as focal loss.

The AUROC metric is mainly useful for research and for comparing different classifiers, since it summarizes ranking quality rather than performance at the single threshold you will actually deploy.

I hope that you now understand the importance of performance metrics in model evaluation , and know a few quirky little hacks for understanding the soul of your model.

One really important thing to note is that you can adjust these metrics to cater to your specific use case.

For example, take a weighted F1-score. It calculates the metric for each label, and then averages the per-label scores weighted by support (the number of true instances for each label).

Another example could be a weighted accuracy, or in technical terms: Balanced Accuracy . Balanced accuracy in binary and multiclass classification problems is used to deal with imbalanced datasets. It’s defined as the average recall obtained in each class. Like we mentioned, “cater to specific use cases” , like imbalanced classes.
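A sketch of both adjustments with scikit-learn, using toy imbalanced labels for illustration:

```python
from sklearn.metrics import balanced_accuracy_score, f1_score

# Toy labels with a class imbalance, purely for illustration
y_test = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1]

print(f1_score(y_test, y_pred, average="weighted"))  # per-label F1, averaged and weighted by support
print(balanced_accuracy_score(y_test, y_pred))       # average of the recall obtained on each class
```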

You can find the notebook containing all the code used in this blog here .

Keep experimenting!

That’s it for now, thank you for reading, and stay tuned for more! Adios!

  • What does your classification metric tell you about your data?
  • Statistics for Business and Economics by Anderson et al.


Artificial intelligence: How to measure the “I” in AI


This article is part of  Demystifying AI , a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

Last week, Lee Se-dol, the South Korean Go champion who lost in a historic matchup against DeepMind's artificial intelligence algorithm AlphaGo in 2016, declared his retirement from professional play.

“With the debut of AI in Go games, I’ve realized that I’m not at the top even if I become the number one through frantic efforts,” Lee told the  Yonhap news agency . “ Even if I become the number one, there is an entity that cannot be defeated.”

Predictably, Se-dol’s comments quickly made the rounds across prominent tech publications, some of them using sensational headlines with AI dominance themes.

Since the dawn of AI, games have been one of the main benchmarks to evaluate the efficiency of algorithms. And thanks to advances in deep learning and reinforcement learning , AI researchers are creating programs that can master very complicated games and beat the most seasoned players across the world. Uninformed analysts have been picking up on these successes to suggest that AI is becoming smarter than humans.

But at the same time, contemporary AI fails miserably at some of the most basic tasks that every human can perform.

This begs the question, does mastering a game prove anything? And if not, how can you measure the level of intelligence of an AI system?

Take the following example. In the picture below, you’re presented with three problems and their solution. There’s also a fourth task that hasn’t been solved. Can you guess the solution?

[Figure: an example Abstraction and Reasoning Corpus (ARC) problem with three solved examples and one unsolved test task]

You’re probably going to think that it’s very easy. You’ll also be able to solve different variations of the same problem with multiple walls, and multiple lines, and lines of different colors, just by seeing these three examples. But currently, there’s no AI system, including the ones being developed at the most prestigious research labs, that can learn to solve such a problem with so few examples.

The above example is from "On the Measure of Intelligence," a paper by François Chollet, the creator of the Keras deep learning library. Chollet published this paper a few weeks before Lee Se-dol declared his retirement. In it, he provides many important guidelines on understanding and measuring intelligence.

Ironically, Chollet's paper did not receive a fraction of the attention it deserves. Unfortunately, the media is more interested in covering exciting AI news that gets more clicks. The 62-page paper contains a lot of invaluable information and is a must-read for anyone who wants to understand the state of AI beyond the hype and sensation.

But I will do my best to summarize the key recommendations Chollet makes on measuring AI systems and comparing their performance to that of human intelligence.

What’s wrong with current AI?

“The contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks, such as board games and video games,” Chollet writes, adding that solely measuring skill at any given task falls short of measuring intelligence.

In fact, the obsession with optimizing AI algorithms for specific tasks has entrenched the community in narrow AI . As a result, work in AI has drifted away from the original vision of developing “thinking machines” that possess intelligence comparable to that of humans.

“Although we are able to engineer systems that perform extremely well on specific tasks, they have still stark limitations, being brittle, data-hungry, unable to make sense of situations that deviate slightly from their training data or the assumptions of their creators, and unable to repurpose themselves to deal with novel tasks without significant involvement from human researchers,” Chollet notes in the paper.

Chollet’s observations are in line with those made by other scientists on the limitations and challenges of deep learning systems . These limitations manifest themselves in many ways:

  • AI models that need millions of examples to perform the simplest tasks
  • AI systems that fail as soon as they face corner cases, situations that fall outside of their training examples
  • Neural networks that are prone to adversarial examples , small perturbations in input data that cause the AI to behave erratically

Here’s an example: OpenAI’s Dota-playing neural networks needed 45,000 years’ worth of gameplay to reach a professional level. The AI is also limited in the number of characters it can play, and the slightest change to the game rules will result in a sudden drop in its performance.

The same can be seen in other fields, such as self-driving cars . Despite millions of hours of road experience, the AI algorithms that power autonomous vehicles can make stupid mistakes, such as crashing into lane dividers or parked firetrucks .

What is intelligence?


One of the key challenges that the AI community has struggled with is defining intelligence. Scientists have debated for decades on providing a clear definition that allows us to evaluate AI systems and determine what is intelligent or not.

Chollet borrows the definition by DeepMind cofounder Shane Legg and AI scientist Marcus Hutter: “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.”

Key here is “achieve goals” and “wide range of environments.” Most current AI systems are pretty good at the first part, which is to achieve very specific goals, but bad at doing so in a wide range of environments. For instance, an AI system that can detect and classify objects in images will not be able to perform some other related task, such as drawing images of objects.

Chollet then examines the two dominant approaches in creating intelligence systems: symbolic AI and machine learning.

Symbolic AI vs machine learning

Early generations of AI research focused on symbolic AI , which involves creating an explicit representation of knowledge and behavior in computer programs. This approach requires human engineers to meticulously write the rules that define the behavior of an AI agent.

“It was then widely accepted within the AI community that the ‘problem of intelligence’ would be solved if only we could encode human skills into formal rules and encode human knowledge into explicit databases,” Chollet observes.

But rather than being intelligent by themselves, these symbolic AI systems manifest the intelligence of their creators in creating complicated programs that can solve specific tasks.

The second approach, machine learning systems , is based on providing the AI model with data from the problem space and letting it develop its own behavior. The most successful machine learning structure so far is artificial neural networks , which are complex mathematical functions that can create complex mappings between inputs and outputs.

For instance, instead of manually coding the rules for detecting cancer in x-ray slides, you feed a neural network with many slides annotated with their outcomes, a process called "training." The AI examines the data and develops a mathematical model that represents the common traits of cancer patterns. It can then process new slides and output how likely it is that a patient has cancer.

Advances in neural networks and deep learning have enabled AI scientists to tackle many tasks that were previously very difficult or impossible with classic AI, such as natural language processing , computer vision and speech recognition.

Neural network–based models, also known as connectionist AI, are named after their biological counterparts. They are based on the idea that the mind is a “blank slate” (tabula rasa) that turns experience (data) into behavior. Therefore, the general trend in deep learning has become to solve problems by creating bigger neural networks and providing them with more training data to improve their accuracy.

Chollet rejects both approaches because none of them has been able to create generalized AI that is flexible and fluid like the human mind.

“We see the world through the lens of the tools we are most familiar with. Today, it is increasingly apparent that both of these views of the nature of human intelligence—either a collection of special-purpose programs or a general-purpose Tabula Rasa—are likely incorrect,” he writes.

Truly intelligent systems should be able to develop higher-level skills that can span across many tasks. For instance, an AI program that masters Quake 3 should be able to play other first-person shooter games at a decent level. Unfortunately, the best that current AI systems achieve is “local generalization,” a limited maneuver room within their own narrow domain.

Levels of intelligence

The requirements of broad and general AI

In his paper, Chollet argues that the “generalization” or “generalization power” for any AI system is its “ability to handle situations (or tasks) that differ from previously encountered situations.”

Interestingly, this is a missing component of both symbolic and connectionist AI. The former requires engineers to explicitly define its behavioral boundary and the latter requires examples that outline its problem-solving domain.

Chollet also goes further and speaks of “developer-aware generalization,” which is the ability of an AI system to handle situations that “neither the system nor the developer of the system have encountered before.”

This is the kind of flexibility you would expect from a robo-butler that could perform various chores inside a home without having explicit instructions or training data on them. An example is Steve Wozniak’s famous coffee test, in which a robot would enter a random house and make coffee without knowing in advance the layout of the home or the appliances it contains.

Elsewhere in the paper, Chollet makes it clear that AI systems that cheat their way toward their goal by leveraging priors (rules) and experience (data) are not intelligent. For instance, consider Stockfish, the best rule-based chess-playing program. Stockfish, an open-source project, is the result of contributions from thousands of developers who have created and fine-tuned tens of thousands of rules. A neural network–based example is AlphaZero, the multi-purpose AI that has conquered several board games by playing them millions of times against itself.

Both systems have been optimized to perform a specific task by making use of resources that are beyond the capacity of the human mind. The brightest human can’t memorize tens of thousands of chess rules. Likewise, no human can play millions of chess games in a lifetime.

“Solving any given task with beyond-human level performance by leveraging either unlimited priors or unlimited data does not bring us any closer to broad AI or general AI, whether the task is chess, football, or any e-sport,” Chollet notes.

This is why it's totally wrong to compare Deep Blue, AlphaZero, AlphaStar or any other game-playing AI with human intelligence.

Likewise, other AI models, such as Aristo, the program that can pass an eighth-grade science test, do not possess the same knowledge as a middle school student. Aristo owes its supposed scientific abilities to the huge corpora of knowledge it was trained on, not to an understanding of the world of science.

(Note: Some AI researchers, such as computer scientist Rich Sutton, believe that the true direction for artificial intelligence research should be methods that can scale with the availability of data and compute resources.)


The Abstraction and Reasoning Corpus

In the paper, Chollet presents the Abstraction and Reasoning Corpus (ARC), a dataset intended to evaluate the efficiency of AI systems and compare their performance with that of human intelligence. ARC is a set of problem-solving tasks that are tailored to both AI and humans.

One of the key ideas behind ARC is to level the playing field between humans and AI. It is designed so that humans can't take advantage of their vast background knowledge of the world to outmaneuver the AI. For instance, it doesn't involve language-related problems, which AI systems have historically struggled with.

On the other hand, it’s also designed in a way that prevents the AI (and its developers) from cheating their way to success. The system does not provide access to vast amounts of training data. As in the example shown at the beginning of this article, each concept is presented with a handful of examples.

The AI developers must build a system that can handle various concepts such as object cohesion, object persistence, and object influence. The AI system must also learn to perform tasks such as scaling, drawing, connecting points, rotating and translating.

[Figure: ARC test example]

Also, the test dataset (the problems that are meant to evaluate the intelligence of the developed system) is designed in a way that prevents developers from solving the tasks in advance and hard-coding their solutions into the program. Optimizing for evaluation sets is a popular cheating method in data science and machine learning competitions.

According to Chollet, “ARC only assesses a general form of fluid intelligence, with a focus on reasoning and abstraction.” This means that the test favors “program synthesis,” the subfield of AI that involves generating programs that satisfy high-level specifications. This approach is in contrast with current trends in AI, which are inclined toward creating programs that are optimized for a limited set of tasks (e.g., playing a single game).

In his experiments with ARC, Chollet has found that humans can fully solve ARC tests. But current AI systems struggle with the same tasks. “To the best of our knowledge, ARC does not appear to be approachable by any existing machine learning technique (including Deep Learning), due to its focus on broad generalization and few-shot learning,” Chollet notes.

While ARC is a work in progress, it can become a promising benchmark to test the level of progress toward human-level AI . “We posit that the existence of a human-level ARC solver would represent the ability to program an AI from demonstrations alone (only requiring a handful of demonstrations to specify a complex task) to do a wide range of human-relatable tasks of a kind that would normally require human-level, human-like fluid intelligence,” Chollet observes.


Measuring problem-solving performance in AI

In artificial intelligence, we can evaluate an algorithm's performance in four ways:

  • Completeness
  • Optimality
  • Time complexity
  • Space complexity

Time and space complexity are always considered with respect to some measure of the problem difficulty. In theoretical computer science, the typical measure is the size of the state space graph, |V| + |E|, where V is the set of vertices (nodes) of the graph and E is the set of edges (links). This is appropriate when the graph is an explicit data structure that is input to the search program. (The map of Romania is an example of this.) In AI, the graph is often represented implicitly by the initial state, actions, and transition model, and is frequently infinite.
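As an illustration of these criteria (a toy sketch, not from the original article): a breadth-first search over an explicit graph can be instrumented to report simple proxies for time complexity (nodes expanded) and space complexity (peak frontier size), while BFS itself is complete and, with unit step costs, optimal.

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search over an explicit graph (dict of adjacency lists).

    Returns the path found plus simple performance counters:
    nodes expanded (a proxy for time) and peak frontier size (a proxy for space).
    """
    frontier = deque([[start]])
    explored = {start}
    expanded, peak_frontier = 0, 1
    while frontier:
        peak_frontier = max(peak_frontier, len(frontier))
        path = frontier.popleft()
        node = path[-1]
        expanded += 1
        if node == goal:
            return path, expanded, peak_frontier
        for neighbor in graph.get(node, []):
            if neighbor not in explored:
                explored.add(neighbor)
                frontier.append(path + [neighbor])
    return None, expanded, peak_frontier

# Toy graph standing in for a small state space (not the Romania map itself)
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs(graph, "A", "E"))  # complete, and optimal in the number of steps
```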


How Can We Measure (Artificial) Intelligence?


In the October meeting of our AI Reading Group we discussed the paper " On the Measure of Intelligence " by Francois Chollet (2019) in which the author describes today's deep-learning systems as still very limited, as they are essentially skill programs being trained on large amounts of data. Even when these systems are capable of solving entire batteries of tasks, they cannot truly be called intelligent because their designers know what tasks they will face, and current benchmarks are designed to measure only the related skills. To measure intelligence, he argues, we need other benchmarks, ones that focus on generalization and skill-acquisition efficiency. In his Paper, Chollet proposes a new formal definition of intelligence based on algorithmic information theory, which describes intelligence as efficiency in acquiring skills. It considers scope, generalization difficulty, prior knowledge, and experience as critical factors in characterizing intelligent systems. The Abstraction and Reasoning Corpus (ARC) is a set of tasks developed based on this formal definition to provide a more comprehensive testing environment for evaluating AI systems.

The paper consists of just under 60 pages, mostly prose. To make the content accessible to a broader audience, I have tried in this blog article to highlight what I consider to be the most important points for a general understanding.

Context and history


Need for an actionable definition and measure of intelligence

The promise of the field of AI is and has always been to develop machines that possess intelligence comparable to humans. However, current systems are still very limited in their abilities. The question of what we even mean when we talk about intelligence still doesn't have a satisfying answer. To make progress towards the promise, we need a reliable way to measure that progress, using precise, quantitative definitions and measures of intelligence. Common sense dictionary definitions of intelligence are not helpful for this purpose as they are not actionable, explanatory or measurable. The same goes for methods like the Turing Test and its variants, as they outsource the task to human judges who themselves lack clear definitions or evaluation protocols.

Two divergent views of intelligence

Regarding the term intelligence, there is still no scientific consensus among researchers on a single definition. However, a common characterization of intelligence includes two aspects: task-specific abilities ("achieving goals") and generality and adaptation ("in a wide range of environments"). These two characterizations also relate closely to two opposing views of intelligence. One view sees the mind as a relatively static assembly of special-purpose mechanisms developed by evolution, only capable of learning what it is programmed to acquire. In another view, the mind is a general purpose "blank slate" capable of turning arbitrary experience into knowledge and skills that could be directed to any problem.

The evolutionary psychology view of human nature posits that much of the cognitive function is due to special-purpose adaptations. In other words, the human brain has evolved to be good at certain tasks because those skills were necessary for survival. This view gave rise to definitions of intelligence and evaluation protocols focused on task-specific performance. The problem with this approach is that it lacks generality. AI systems that are narrowly focused on task performance can often outperform humans on those specific tasks. However, they lack the flexibility and adaptability of humans when it comes to general problem solving. As a result, the focus on task-specific performance has led to a striking lack of generality in AI. In contrast, some researchers have taken the position that intelligence consists of the general ability to acquire new skills through learning, an ability that can be directed to a wide range of previously unknown problems, perhaps, even to any problem. This view of intelligence reflects another long-standing view of human nature that has strongly influenced the history of cognitive science and contrasts with the view of evolutionary psychology: a vision of the mind as a flexible, adaptive, highly general process that transforms experience into behavior, knowledge and skill.

AI evaluation: from measuring skills to measuring broad abilities

The success of artificial intelligence has relied on systems that perform well-described tasks at a high level, as measured by skill-based metrics. This focus on task-specific performance often leads to tradeoffs in other areas, such as robustness and flexibility. Therefore, there is a need to go beyond skill-based evaluation and assess these other important attributes. The goal is to build systems with a higher degree of generalization, the ability to deal with situations that differ from those previously encountered. The spectrum of generalization reflects the organization of human cognitive abilities as described in theories of cognitive psychology.

Psychometrics is a branch of psychology that deals with intelligence testing and the assessment of skills. Modern intelligence tests are designed to be reliable, valid, standardized, and free of bias. Remarkably, in parallel to psychometrics, there has been recent and increasing interest across the field of AI in using batteries of tasks to measure general abilities rather than specific skills. However, these benchmarks are still gameable because the test systems can practice specifically for the target tasks or use task-specific prior knowledge inherited from the system developers.

An alternative approach is to use the insights of psychometrics on skill assessment and test design to develop new types of benchmarks specifically designed to assess broad skills in AI systems. Interest in developing flexible systems and generality is growing, but the AI community has not paid much attention to psychometric assessment. There are several positive developments, including a growing awareness of the need for generalization in the evaluation of RL algorithms and interest in benchmarks for data efficiency and multitasking. However, there are also several negatives, including problems with the robustness of deep learning models, the lack of reproducibility of research results, and the little attention given to the study of capabilities beyond local generalization.

A new perspective


Critical Assessment

In 1997, IBM's Deep Blue beat Garry Kasparov at chess, leading researchers to realize that they had not learned much about human cognition by developing an artificial chess master. From the perspective of modern AI research, it is obvious that a static chess program based on minimax and tree search does not provide information about human intelligence. But what about a program that is not human-programmed but trained to perform a task based on data? A learning machine may well be intelligent: learning is necessary for adapting to new information and acquiring new skills. But programming through exposure to a large amount of data is no guarantee of generalization or intelligence. Hard-coding prior knowledge in artificial intelligence is not the only way to artificially "buy" performance on a particular task without creating generalization capability. There is another way: add more training data.

It is well known that different individuals have different degrees of cognitive ability. These differences suggest that cognition is a multidimensional object, hierarchically structured with a single general factor - the g-factor. So the question arises: how general is human intelligence? Is the g-factor universal? Would it apply to every task in the universe? This question is of great importance when it comes to artificial intelligence - if there is such a thing as universal intelligence and human intelligence is a realization of that intelligence, then reverse engineering the brain might be the shortest path to it. However, a closer look reveals that human abilities are not universal in an absolute sense but rather specialized for tasks that were prevalent during evolution. For example, humans are not designed for long term planning or large working memory beyond what was necessary for survival in the ancestral environment. In addition, there is a dimensional bias in which humans excel at 2D navigation tasks but struggle with 3D or higher tasks because they were not evolutionarily prepared for them. Thus, "general intelligence" should not be viewed as a binary trait that is either present or absent. Instead, it is on a spectrum determined by 1) the scope of application and 2) the efficiency with which new skills are learned.

Advances in developmental psychology have taught us that the human mind is not merely a collection of special purpose programs hard-coded by evolution. The large majority of the skills and knowledge we possess are acquired during our lifetimes rather than innate. Simultaneously, the mind is not a single, general purpose "blank slate" system capable of learning anything from experience. It is therefore proposed that an actionable test of human-like general intelligence should be founded on innate human knowledge priors. These priors should be made as close as possible to innate human knowledge priors as we understand them, and they should be explicitly and exhaustively described.

Defining intelligence: a formal synthesis

The intelligence of a system is defined as its skill acquisition efficiency over a scope of tasks with respect to priors, experience, and generalization difficulty. This definition encompasses meta-learning priors, memory, and fluid intelligence. The formalism presented in this paper is regarded as useful for research on broad cognitive abilities and can serve as a foundation for new general intelligence benchmarks.

A high-intelligence system is defined as one that is able to generate high-skill solution programs for tasks of high generalization difficulty using little experience and prior knowledge. The measure of intelligence here is tied to the choice of the domain of application (the space of tasks and a value function over tasks). Optionally, it may also be tied to a choice of sufficient skill levels across the tasks in the scope (the sufficient case). Skill is not a property of an intelligent system but a property of the output artefact of the intelligence process (a skill program). High skill is not synonymous with high intelligence; they are entirely different concepts. Intelligence must involve learning and adaptation, i.e. operationalizing information gained from experience to deal with future uncertainties. Intelligence is not curve-fitting: a system that merely produces the simplest possible skill program consistent with known data points could, by this definition, perform well only on tasks that present no generalization difficulty. An intelligent system must produce behavioral programs that account for future uncertainties.

Besides the information efficiency described above (prior efficiency and experience efficiency with respect to generalization difficulty) of intelligent systems, there are several other alternatives that could be incorporated into the definition. These are metrics like computational efficiency (skill programs that have minimal computational resource consumption and intelligent systems that use minimal computational resources to generate skill programs), time efficiency (minimize latency), energy efficiency (minimize the amount of energy expended) and risk efficiency (encourage safe curricula).

The described framework provides a formal way to reason about the intuitive concepts of "generalization difficulty", "intelligence as skill-acquisition efficiency", and what it means to control for priors and experience when evaluating intelligence. Its main value is that it offers a perspective shift in how we understand and evaluate flexible or general artificial intelligence, with several practical consequences for research directions and evaluation methods.

Consequences for research directions include a focus on developing broad or general purpose abilities rather than pursuing skill alone, an interest in program synthesis, and an interest in curriculum development. Consequences for evaluation methods include: taking into account generalization difficulty when developing a test set, using metrics that can discard solutions that rely on shortcuts, and rigorously characterizing any intelligent system by asking questions about its scope, potential, priors, skill-acquisition efficiency, etc.

Evaluating intelligence in this light

When comparing the intelligence of different systems, it is important to ensure that the comparison is fair. This means that the systems being compared must share the scope of tasks and have comparable levels of potential skill. The comparison should focus on the efficiency with which the system achieves the same level of skill as a human expert. In addition, it is recommended that only systems with similar prior knowledge be compared.

According to the conclusions of the paper with regard to the properties that a candidate benchmark of human-like general intelligence should possess, such an ideal intelligence benchmark should

  • describe its scope of application and its own predictiveness with regard to this scope.
  • be reliable.
  • seek to measure broad abilities and developer-aware generalization. 
  • control for the amount of experience leveraged by test-taking systems during training. 
  • explicitly and exhaustively describe the set of priors it assumes. 
  • work for both humans and machines fairly by only assuming the same priors as possessed by humans.

A benchmark proposal: the ARC dataset


In the last part, Chollet introduces the Abstraction and Reasoning Corpus (ARC), a dataset intended to serve as a benchmark for the kind of general intelligence defined in the previous sections. He describes the new benchmark as follows:

"ARC can be seen as a general artificial intelligence benchmark, as a program synthesis benchmark, or as a psychometric intelligence test. It is targeted at both humans and artificially intelligent systems that aim at emulating a human-like form of general fluid intelligence."

ARC has the following top-level goals:

  • Stay close to psychometric intelligence tests, i.e. be solvable by humans without specific knowledge or training.
  • Focus on developer-aware generalization rather than task-specific skill
  • Focus on measuring a qualitatively "broad" form of generalization by featuring highly abstract tasks that must be understood by a test-taker using very few examples
  • Quantitatively control for experience by providing only a fixed amount of training data for each task and by only featuring tasks that do not lend themselves well to artificially generating new data.
  • Explicitly describe the complete set of priors it assumes. These should be close to innate human prior knowledge.

A test-taker is said to solve a task when, upon seeing the task for the first time, they are able to produce the correct output grid for all test inputs in the task (this includes picking the dimensions of the output grid). For each test input, the test-taker is allowed three trials (this holds for all test-takers, either humans or AI). Only exact solutions (all cells match the expected answer) can be called correct.
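As a rough sketch of that protocol (assuming the JSON task format used in the public ARC repository, with "train" and "test" lists of input/output grid pairs), a solver could be scored like this; solve_task is a hypothetical function supplied by the test-taker:

```python
import json

def score_task(task_path, solve_task, max_trials=3):
    """Return True if the solver produces the exact output grid for every test input.

    solve_task(train_pairs, test_input, trial) is a hypothetical solver that returns a
    candidate output grid (a list of lists of ints); it gets up to max_trials attempts.
    """
    with open(task_path) as f:
        task = json.load(f)
    for pair in task["test"]:
        solved = False
        for trial in range(max_trials):
            candidate = solve_task(task["train"], pair["input"], trial)
            if candidate == pair["output"]:  # exact match: grid dimensions and every cell
                solved = True
                break
        if not solved:
            return False
    return True
```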


This paper was presented and discussed at the October meeting of our AI Reading Group. After a short presentation of the paper, there was an interesting discussion about what intelligence really means and that, in retrospect, certain abilities that we associate with intelligent thinking and acting (e.g., being able to play chess very well) were not sufficient to call a system intelligent. Concerns have been raised that even with a benchmark that focuses more on adaptability and generalization, it will not be any different. There was an exchange that even then, it might be possible to use shortcuts or pure computational power to accomplish the goal. At the end of the day, we can only measure skills in solving a given task. Nevertheless, the group agreed that the work on the topic and the conception of the ARC Challenge are important steps in the right direction.

Sources:

On the Measure of Intelligence, François Chollet Paper: https://arxiv.org/abs/1911.01547

GitHub repository, The Abstraction and Reasoning Corpus (ARC): https://github.com/fchollet/ARC

Figures were taken from the paper, visuals were generated with Midjourney v4

Full disclosure: I used AI writing tools to create extractive summaries of some parts of the paper. This was part of an experiment to evaluate the usefulness of these tools for these types of tasks. However, there was a lot of human intelligence involved to verify the results and prevent the publication of incorrect information. If you still notice something in this regard, please send me a short message.



Principles of Creative Problem Solving in AI Systems

Zhihui Chen

School of Education, South China Normal University, 55 E Zhongshan Ave, Guangzhou, 510631 China

The use of Artificial Intelligence (AI) is spreading through all spheres of human activity, accelerated by the global pandemic (COVID-19), which has limited human interaction in our societies and the corporate world. Undoubtedly, AI has transformed our ways of living and our understanding of how computational systems can solve problems as well as, or even better than, human beings. The core issues of this book are: (1) understanding the working mechanisms of the human mind in problem solving, and (2) exploring what it means to be computationally creative and how this can be evaluated. After an overview of the development of AI and Cognitive Science and a revisiting of the strands of creativity and problem solving research, Dr. Ana-Maria Oltețeanu attempts to build cognitive systems that propose a type of knowledge organization and a small set of processes aimed at solving a diverse set of creative problems. Furthermore, with the help of the defined framework, the corresponding computational system is implemented and evaluated by investigating classical and insight problem solving performance.

Part I of this book comprises the first four chapters, which introduce a series of theories such as creativity (p. 11), insight (p. 16), and visuospatial intelligence (p. 20) to illustrate the processes and structures required for creative problem solving. The author concludes from the relevant literature that the interplay between knowledge representations and organization processes plays an important role in searching for solutions. For better illustration and understanding, a selection of computational creativity systems is presented, such as AM, HR, Aaron, the Painting Fool, poetry systems, and BACON (p. 34–37). Subsequently, from a methodological perspective, Dr. Oltețeanu introduces two different creativity evaluations, for human beings and for computational machines respectively. On the one hand, when measuring human creativity, the thinking characteristics of participants, such as divergent thinking (the ability to diverge from subjectively familiar uses and think of other uses) and creative thinking, are the primary objects of measurement in some of the most important empirical models. On the other hand, when assessing creativity in computational systems, various models for evaluating the behaviors or programs of creative systems are proposed, mainly in terms of typicality, quality, and novelty.

In the second part, which comprises chapters 5 to 8, the author develops a cognitive framework to explore how a diverse set of creative problem solving tasks can be solved computationally using a unified set of principles. To facilitate the understanding of insight and creative problem solving, Dr. Oltețeanu puts forward a metaphor in which representations are seen as cogs in a creative machine and problem solving processes are regarded as its clockwork, as a way of viewing the relationship between creative processes and knowledge (p. 69). Building on this idea, a theoretical framework (named CreaCogs) is proposed, based on an encoding of knowledge that permits fast and informed search and construction processes for creative problem solving. These processes take place conceptually at three levels: Feature Spaces, Concepts, and Problem Templates (p. 91–94). First, whenever a symbolically encoded object is observed, its sensory features are registered at the sub-symbolic level of feature maps and spaces. At the next level, various known concepts are grounded in a distributed manner over organized feature spaces, and their names are encoded as name tags that functionally constitute another feature. Finally, at the highest level, problem templates are structured representations encoded over multiple concepts, their relations, and the affordances they provide. On the basis of these levels, a wide set of principles can be integrated into the framework.

Part III, which comprises chapters 9 to 12, focuses on applying CreaCogs to a set of practical cognitive system cases and on developing a set of tools through which the performance of such systems can be evaluated. Notably, several creativity tests are introduced to illustrate how the framework built above can be implemented. In the preamble of this part, CreaCogs mechanisms for the Remote Associates Test (RAT) and the Alternative Uses Test (AUT) are explored in order to develop computational systems that solve these test tasks. Based on this implementation and investigation, Dr. Oltețeanu analyzes how to evaluate the performance of the artificial cognitive prototype systems on different creativity tasks via the inference mechanisms and matching algorithms of CreaCogs. The book ends with an overview of the journey through creative problem solving and an outlook on related experimental work.

Overall, the author provides a revolutionary academic framework for understanding the theoretical and empirical cognitive processes involved in creative problem solving by computational systems. Various creativity tests and tasks are drawn upon to illustrate how the cognitive framework finds solutions to classical and even insight problems, which, as Batchelder and Alexander stress in their 2012 paper (Insight problem solving: A critical examination of the possibility of formal theory, The Journal of Problem Solving), require alternative productive representations to overcome failures in discovering solutions. The author's use of a variety of schematic diagrams and pictures to describe the cognitive models of creativity is particularly insightful. It helps to illustrate how insight and creative problem solving can be viewed as processes of memory management, with both associationist and gestalt (template pattern-filling) underpinnings, and with processes of recasting and restructuring that draw on memory and the environment. From theoretical matters to varied practical domains, Dr. Oltețeanu constructs cognitive systems on the basis of CreaCogs and develops a set of tools through which the performance of such systems can be evaluated similarly to that of human participants. In short, the theoretical framework and the empirical computational exploration help us imagine what AI can achieve in the area of creative problem solving.

However, the critical issue of whether creative systems can develop self-adaptive learning is not discussed further. Borrowing the term from behavioral and cognitive psychology, self-adaptive learning in AI refers to systems that mirror humans' self-adapted learning methods and habitual information processing, forming a method by which AI can address theories and problems independently by discovering and summarizing patterns during operation. Because the emphasis is on developing a framework for analyzing creative problem solving, the author focuses on the value, mechanism, application, and evaluation of the CreaCogs-based computational system, which is why self-adaptive learning has so far received little attention. In summary, this book enhances our understanding of the principles of problem solving in the epoch of AI and deserves to be widely read in this age of intelligent machines. The CreaCogs cognitive framework proposed here can serve as an applicable guide for graduate students and researchers in Cognitive Science, AI, and Education.

Declarations

The authors declare that they have no conflict of interest.

Citation: Chen, Z., Ye, R. Principles of Creative Problem Solving in AI Systems. Science & Education 31, 555–557 (2022). https://doi.org/10.1007/s11191-021-00270-7

Artificial Intelligence Tutorial

Problem-Solving in Artificial Intelligence

Reflex agents are the simplest agents because they directly map states into actions. Unfortunately, these agents fail to operate in environments where the mapping is too large to store and learn. A goal-based agent, on the other hand, considers future actions and their desired outcomes.

Here, we will discuss one type of goal-based agent known as a problem-solving agent, which uses an atomic representation with no internal state visible to the problem-solving algorithms.

Problem-solving agent

A problem-solving agent works by precisely defining problems and their candidate solutions.

In psychology, problem solving refers to the process of moving from a present state or condition toward a definite goal.

In computer science, problem solving is a part of artificial intelligence that encompasses a number of techniques, such as search algorithms and heuristics, for solving a problem.

A problem-solving agent is therefore a goal-driven agent that focuses on satisfying its goal.

Steps performed by Problem-solving agent

  • Goal Formulation: The first and simplest step in problem-solving. Based on the current situation and the agent's performance measure (discussed below), it selects one goal out of multiple possible goals, along with the actions needed to achieve it.
  • Problem Formulation: The most important step of problem-solving, which decides what actions should be taken to achieve the formulated goal. The following five components are involved in problem formulation (see the sketch after these steps):
  • Initial State: The starting state of the agent on the way to its goal.
  • Actions: A description of the possible actions available to the agent.
  • Transition Model: A description of what each action does.
  • Goal Test: Determines whether a given state is a goal state.
  • Path cost: Assigns a numeric cost to each path toward the goal. The problem-solving agent selects a cost function that reflects its performance measure. Remember, an optimal solution has the lowest path cost among all solutions.

Note: The initial state, actions, and transition model together define the state space of the problem implicitly. The state space of a problem is the set of all states reachable from the initial state by any sequence of actions. It forms a directed graph in which the nodes are states, the links between nodes are actions, and a path is a sequence of states connected by a sequence of actions.

  • Search: The process of looking for the best possible sequence of actions to reach the goal state from the current state. It takes a problem as input and returns a solution as output.
  • Solution: The action sequence found by the search algorithm, ideally the optimal one with the lowest path cost.
  • Execution: Carrying out the actions of the chosen solution to move from the current state to the goal state.
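
To make these components concrete, here is a minimal Kotlin sketch of how a search problem could be modeled. The interface and its names (SearchProblem, actions, result, and so on) are illustrative and not taken from the tutorial or any particular library.

```kotlin
// Minimal sketch of the five problem-formulation components.
// All names here are illustrative; they do not come from a specific library.
interface SearchProblem<S, A> {
    val initialState: S                       // Initial State
    fun actions(state: S): List<A>            // Actions available in a state
    fun result(state: S, action: A): S        // Transition Model
    fun isGoal(state: S): Boolean             // Goal Test
    fun stepCost(state: S, action: A): Double // Path cost of a single step
}
```

A search algorithm only needs this interface: initialState, actions, and result implicitly define the state space described in the note above.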

Example Problems

Basically, there are two categories of problems:

  • Toy Problem: A concise and exact description of a problem, used by researchers to compare the performance of algorithms.
  • Real-world Problem: A problem whose solution people actually care about. Unlike a toy problem, it does not come with a single agreed-upon description, but we can still give a general formulation of the problem.

Some Toy Problems

  • 8 Puzzle Problem: Here, we have a 3x3 board with movable tiles numbered 1 to 8 and one blank space. A tile adjacent to the blank space can slide into it. The objective is to convert a given start state into a specified goal state, as shown in the figure below, by sliding tiles into the blank space.

[Figure: start and goal states of the 8-puzzle]

The problem formulation is as follows:

  • States: The location of each numbered tile and the blank tile.
  • Initial State: Any state can serve as the initial state.
  • Actions: The actions are defined as movements of the blank space: left, right, up, or down.
  • Transition Model: Returns the resulting state for a given state and action.
  • Goal test: Determines whether we have reached the goal state.
  • Path cost: The number of steps in the path, where each step costs 1.

Note: The 8-puzzle problem is a type of sliding-block problem which is used for testing new search algorithms in artificial intelligence .
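
As an illustration, the 8-puzzle formulation above could be sketched in Kotlin as follows; the Board and Move names are assumptions, not code from the tutorial.

```kotlin
// Hedged sketch of the 8-puzzle formulation: the board is a list of 9 numbers
// where 0 marks the blank, and actions move the blank left/right/up/down.
// Each move has a path cost of 1 (implied).
enum class Move { LEFT, RIGHT, UP, DOWN }

data class Board(val tiles: List<Int>) {          // e.g. listOf(1, 2, 3, 4, 0, 5, 6, 7, 8)
    private val blank get() = tiles.indexOf(0)

    fun actions(): List<Move> = buildList {        // Actions available in this state
        if (blank % 3 > 0) add(Move.LEFT)
        if (blank % 3 < 2) add(Move.RIGHT)
        if (blank / 3 > 0) add(Move.UP)
        if (blank / 3 < 2) add(Move.DOWN)
    }

    fun result(move: Move): Board {                // Transition model
        val target = when (move) {
            Move.LEFT -> blank - 1
            Move.RIGHT -> blank + 1
            Move.UP -> blank - 3
            Move.DOWN -> blank + 3
        }
        val next = tiles.toMutableList()
        next[blank] = next[target]
        next[target] = 0
        return Board(next)
    }

    fun isGoal(goal: Board): Boolean = this == goal // Goal test
}
```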

  • 8-queens problem: The aim is to place eight queens on a chessboard so that no queen attacks another. A queen attacks any other queen in the same row, column, or diagonal.

The figure below shows the problem and one correct solution: each queen is placed so that no other queen shares its row, column, or diagonal.

[Figure: one solution to the 8-queens problem]

For this problem, there are two main kinds of formulation:

  • Incremental formulation: It starts from an empty board, and the operator adds a queen at each step.

This formulation involves the following components:

  • States: Any arrangement of 0 to 8 queens on the chessboard.
  • Initial State: An empty chessboard.
  • Actions: Add a queen to any empty square.
  • Transition model: Returns the chessboard with a queen added to the chosen square.
  • Goal test: Checks whether 8 queens are placed on the chessboard with none attacking another.
  • Path cost: Not needed, because only final states are evaluated.

In this formulation, there are approximately 1.8 × 10^14 possible sequences to investigate.
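
For a rough sense of how the incremental formulation behaves in practice, the following backtracking sketch places one queen per column and only keeps safe placements, which prunes the vast majority of those sequences. It is a generic illustration, not code from the tutorial.

```kotlin
import kotlin.math.abs

// Incremental 8-queens sketch: add one queen per column, only on safe rows.
// queens[i] holds the row of the queen already placed in column i.
fun countSolutions(n: Int = 8, queens: List<Int> = emptyList()): Int {
    if (queens.size == n) return 1                      // goal test: all queens placed safely
    var count = 0
    for (row in 0 until n) {
        val safe = queens.withIndex().none { (col, r) ->
            r == row || abs(r - row) == queens.size - col   // same row or same diagonal
        }
        if (safe) count += countSolutions(n, queens + row)  // action + transition
    }
    return count
}

fun main() {
    println(countSolutions())  // prints 92, the number of distinct 8-queens solutions
}
```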

  • Complete-state formulation: It starts with all 8 queens on the chessboard and moves them around to eliminate attacks.

This formulation involves the following components:

  • States: Arrangements of all 8 queens, one per column, with no queen attacking another.
  • Actions: Move a queen to a square where it is not attacked.

This formulation is better than the incremental formulation because it reduces the state space from 1.8 × 10^14 to 2,057 states, making it much easier to find solutions.

Some Real-world problems

  • Traveling salesperson problem (TSP): A touring problem in which the salesperson may visit each city only once. The objective is to find the shortest tour that visits every city.
  • VLSI Layout problem: Millions of components and connections must be positioned on a chip so as to minimize area, circuit delays, and stray capacitances while maximizing manufacturing yield.

The layout problem is split into two parts:

  • Cell layout: Here, the primitive components of the circuit are grouped into cells, each performing its specific function. Each cell has a fixed shape and size. The task is to place the cells on the chip without overlapping each other.
  • Channel routing: It finds a specific route for each wire through the gaps between the cells.
  • Protein Design: The objective is to find a sequence of amino acids that folds into a three-dimensional protein with properties that can help cure a disease.

Searching for solutions

Having formulated several problems, we now need to search for solutions to them.

In this section, we will understand how an agent can use searching to solve a problem.

To solve different kinds of problems, an agent uses different strategies for exploring the state space and reaching the goal. The way this exploration is carried out is known as the search strategy.

Measuring problem-solving performance

Before discussing different search strategies, we need criteria for evaluating the performance of a search algorithm. There are four common ways to measure it:

Completeness: Whether the algorithm is guaranteed to find a solution whenever one exists.

Optimality: Whether the strategy finds an optimal solution, i.e., one with the lowest path cost.

Time Complexity: The time taken by the algorithm to find a solution.

Space Complexity: The amount of memory required to perform the search.

Complexity is usually expressed in terms of the branching factor b (the maximum number of successors of any node), the depth d of the shallowest goal node (the number of steps from the root), and the maximum length m of any path in the state space.
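
As a rough worked example of why these quantities matter: a search that expands every node down to depth d generates on the order of 1 + b + b² + … + b^d nodes. The following sketch (illustrative numbers only) computes that sum.

```kotlin
// Illustrative only: total nodes generated when expanding a uniform tree with
// branching factor b down to depth d, i.e. 1 + b + b^2 + ... + b^d.
fun nodesUpToDepth(b: Long, d: Int): Long {
    var total = 0L
    var levelNodes = 1L           // one node at depth 0 (the root)
    repeat(d + 1) {
        total += levelNodes
        levelNodes *= b
    }
    return total
}

fun main() {
    println(nodesUpToDepth(b = 10, d = 5))  // 111111 nodes for b = 10, d = 5
}
```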

Search Strategies

There are two broad classes of strategies for finding a solution to a given problem:

Uninformed Search (Blind Search)

This type of search strategy has no information about the states beyond what is provided in the problem definition. Such strategies can only generate successors and distinguish a goal state from a non-goal state. Because they use no domain-specific guidance such as heuristics, they are also known as blind search.

The main uninformed search strategies are the following:

  • Breadth-first search
  • Uniform cost search
  • Depth-first search
  • Depth-limited search
  • Iterative deepening search
  • Bidirectional search
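
A minimal breadth-first search over the illustrative SearchProblem interface sketched earlier might look like the following. It is complete, and it is optimal when all step costs are equal; its time and space complexity are both on the order of b^d.

```kotlin
// Breadth-first search sketch over the illustrative SearchProblem interface above.
// Returns the list of states from the initial state to a goal, or null if none is found.
fun <S, A> breadthFirstSearch(problem: SearchProblem<S, A>): List<S>? {
    val frontier = ArrayDeque(listOf(listOf(problem.initialState)))  // FIFO queue of paths
    val explored = mutableSetOf(problem.initialState)
    while (frontier.isNotEmpty()) {
        val path = frontier.removeFirst()
        val state = path.last()
        if (problem.isGoal(state)) return path
        for (action in problem.actions(state)) {
            val next = problem.result(state, action)
            if (explored.add(next)) frontier.addLast(path + next)    // skip already-seen states
        }
    }
    return null   // no solution found in the explored (finite) state space
}
```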

Informed Search (Heuristic Search)

This type of search strategy uses additional, problem-specific information about states beyond the problem definition to find solutions more efficiently. This knowledge is typically supplied through heuristic functions that estimate how close a state is to the goal, which is why it is also called heuristic search.

Common informed search strategies include the following:

  • Best first search (Greedy search)
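
To illustrate the difference, a greedy best-first search can be sketched by ordering the frontier with a priority queue keyed on the heuristic value alone; an A* search would instead order it by path cost plus heuristic. Again, this is a generic sketch over the assumed SearchProblem interface, not code from the tutorial.

```kotlin
import java.util.PriorityQueue

// Greedy best-first search sketch: always expand the state with the lowest
// heuristic estimate h(state). Uses the illustrative SearchProblem interface above.
fun <S, A> greedyBestFirstSearch(
    problem: SearchProblem<S, A>,
    heuristic: (S) -> Double,
): List<S>? {
    val frontier = PriorityQueue(compareBy<List<S>> { heuristic(it.last()) })
    frontier.add(listOf(problem.initialState))
    val explored = mutableSetOf(problem.initialState)
    while (frontier.isNotEmpty()) {
        val path = frontier.poll()
        val state = path.last()
        if (problem.isGoal(state)) return path
        for (action in problem.actions(state)) {
            val next = problem.result(state, action)
            if (explored.add(next)) frontier.add(path + next)
        }
    }
    return null
}
```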


Problem Solving in Artificial Intelligence


A reflex agent maps states directly to actions. When such an agent cannot operate in an environment because the mapping is too large to store or to learn, the task is handed over to a problem-solving approach, which breaks the large problem into smaller subproblems and resolves them one by one; the integrated result of those actions is the desired outcome.

Depending on the problem and its working domain, different types of problem-solving agents can be defined. They operate at an atomic level, with no internal state visible to the problem-solving algorithm, and work by precisely defining problems and their candidate solutions. In this sense, problem solving is the part of artificial intelligence that encompasses techniques such as tree search and heuristic algorithms.

We can also say that a problem-solving agent is a goal-driven agent that always focuses on satisfying its goals.

There are basically three types of problem in artificial intelligence:

1. Ignorable: In which solution steps can be ignored.

2. Recoverable: In which solution steps can be undone.

3. Irrecoverable: In which solution steps cannot be undone.

Steps of problem solving in AI: AI problems are closely tied to human activities and their nature, so we need a finite number of steps that make solving them manageable.

The following steps are required to solve a problem:

  • Problem definition: A detailed specification of the inputs and of what counts as an acceptable solution.
  • Problem analysis: Analyse the problem thoroughly.
  • Knowledge Representation: Collect detailed information about the problem and define all applicable techniques.
  • Problem-solving: Select the best technique.

Components to formulate the associated problem: 

  • Initial State: The state from which the agent starts working toward the specified goal.
  • Action: The set of all possible actions available to the agent in a given state.
  • Transition: The model describing which state results from performing an action in a given state.
  • Goal test: Determines whether the specified goal has been reached; once it has, the agent stops acting and moves on to determining the cost of achieving the goal.
  • Path costing: Assigns a numeric cost to achieving the goal; in practice this may include hardware, software, and human effort.




Practical performance problem solving in Jetpack Compose

1. Before you begin

In this codelab, you learn how to improve the runtime performance of a Compose app. You follow a scientific approach to measure, debug, and optimize performance. You investigate multiple performance issues with system tracing and change non-performant runtime code in a sample app, which contains several screens that represent different tasks. The screens are each built differently and include the following:

  • The first screen is a two-column list with image items and some tags on top of the item. Here, you optimize heavy composables.
  • The second and third screens contain a frequently recomposing state. Here, you remove unnecessary recompositions to optimize performance.
  • The last screen contains unstable items. Here, you stabilize the items with various techniques.

Prerequisites

  • Knowledge of how to build Compose apps.
  • Basic understanding of testing or running macrobenchmarks .

What you learn

  • How to pinpoint performance issues with system traces and composition tracing .
  • How to write performant Compose apps that render smoothly.

What you need

  • The latest stable version of Android Studio
  • A physical Android device with Android 6 (API level 23) or higher

2. Get set up

To get started, follow these steps:

  • Clone the GitHub repository:

Alternatively, you can download the repository as a zip file:

Download the starting point

  • Open the PerformanceCodelab project, which contains the following branches:
  • main : Contains the starter code for this project, where you make changes to complete the codelab.
  • end : Contains the solution code for this codelab.

We recommend that you begin with the main branch and follow the codelab step-by-step at your own pace.

  • If you want to see the solution code, run this command:

Alternatively, you can download the solution code:

Download the final code

Optional: System traces used in this codelab

You will run several benchmarks that capture system traces during the codelab.

If you're not able to run these benchmarks, here's a list of system traces you can download instead:

step0-AccelerateHeavyScreenBenchmark.perfetto-trace

step1-AccelerateHeavyScreenBenchmark.perfetto-trace

step2-AccelerateHeavyScreenBenchmark.perfetto-trace

step3-AccelerateHeavyScreenBenchmark.perfetto-trace

3. Approach to fixing performance issues

You can often spot slow, non-performant UI just by eye while exploring the app. But before you jump in and start fixing code based on your assumptions, you should measure the performance of your code to understand whether your changes will make a difference.

During development, with a debuggable build of your app, you might notice something is not as performant as needed and you might be tempted to start dealing with this problem. But a debuggable app's performance is not representative of what your users will see, so it's important to verify with a non-debuggable app that it actually is a problem. In a debuggable app, all of the code has to be interpreted by the runtime.

When thinking about performance in Compose, there's no hard rule you should follow to implement a particular functionality. You shouldn't do the following prematurely:

  • Don't chase and fix every unstable parameter that sneaks into your code.
  • Don't remove animations causing recomposition of that composable.
  • Don't do hard-to-read optimizations based on your gut feeling.

All of these modifications should be done in an informed way using the available tools to be sure that they're addressing the performance issue.

When dealing with performance issues, you should follow this scientific approach:

  • Establish the initial performance by measuring.
  • Observe what's causing the problem.
  • Modify the code based on the observations.
  • Measure and compare with initial performance.

If you don't follow any structured method, some of the changes might improve performance, but others might degrade it, and you can end up with the same outcome.

We recommend watching the following video on enhancing app performance with Compose that goes through the journey of fixing performance issues and even shows some tips on how to improve it.

Generate Baseline Profiles

Before you dive into investigating performance issues, generate a Baseline Profile for your app . On Android 6 (API level 23) and higher, apps run code interpreted at runtime and compiled just-in-time (JIT) and ahead-of-time (AOT) at installation. Interpreted and JIT compiled code runs slower than AOT, but takes less space on disk and in memory, which is why not all code should be AOT compiled.

By implementing Baseline Profiles, you can improve your app startup by 30% and reduce the code running in JIT mode at runtime by eight times as shown in the following image based on the Now in Android sample app:

[Image: effect of Baseline Profiles on startup and JIT-compiled code in the Now in Android sample]

For more information about Baseline Profiles, see the following resources:

  • Baseline Profiles documentation
  • Improve app performance with Baseline Profiles codelab

Measure performance

To measure performance, we recommend setting up and writing benchmarks with Jetpack Macrobenchmark . Macrobenchmarks are instrumented tests that interact with your app as a user would while monitoring performance of your app. This means they don't pollute the app code with testing code and thus provide reliable performance information.

In this codelab, we already set up the codebase and wrote the benchmarks to focus directly on fixing performance issues. If you're unsure of how to set up and use Macrobenchmark in your project, see the following resources:

  • Inspect app performance with Macrobenchmark codelab
  • Inspecting Performance–MAD skills
  • Write a Macrobenchmark documentation

With Macrobenchmarks, you can choose one of the following compilation modes :

  • None : Resets the compilation state and runs everything in JIT mode.
  • Partial : Pre-compiles the app with Baseline Profiles and/or warm-up iterations, and runs in JIT mode.
  • Full : Pre-compiles the entire app code, so there's no code running in JIT mode.

In this codelab, you only use the CompilationMode.Full() mode for the benchmarks because you only care about the changes that you make in the code, not the compilation state of the app. This approach lets you reduce the variance that would be caused by the code running in JIT mode, which should be reduced when implementing custom Baseline Profiles. Beware that Full mode can have a negative effect on app startup, so don't use it for benchmarks measuring app startup, but use it only for benchmarks measuring runtime performance improvements.

When you're done with the performance improvements and you want to check the performance to see how it performs when your users install the app, use the CompilationMode.Partial() mode that uses baseline profiles.
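
For orientation, a runtime-performance benchmark using CompilationMode.Full() might look roughly like the following sketch. The class name, iteration count, and scrolling interaction are placeholders rather than the codelab's actual benchmark; the package name matches the app process named later in this codelab.

```kotlin
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class HeavyScreenBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun scrollHeavyScreen() = benchmarkRule.measureRepeated(
        packageName = "com.compose.performance",   // app under test
        metrics = listOf(FrameTimingMetric()),     // produces frameDurationCpuMs / frameOverrunMs
        compilationMode = CompilationMode.Full(),  // pre-compile everything, no JIT noise
        startupMode = StartupMode.WARM,
        iterations = 1,                            // kept low in the codelab to save time
        setupBlock = {
            pressHome()
            startActivityAndWait()
        },
    ) {
        // Placeholder interaction: scroll the visible list as a user would.
        device.findObject(By.scrollable(true))?.fling(Direction.DOWN)
    }
}
```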

In the next section, you learn how to read the traces to find the performance problems.

4. Analyze performance with system tracing

With a debuggable build of your app, you can use the Layout Inspector with composition count to quickly understand when something is recomposing too often.

However, it is only part of the overall performance investigation because you only get proxy measurements and not the actual time those composables took to render. It might not matter much if something recomposes N times if the total duration takes less than a millisecond. But on the other hand, it matters if something is composed just once or twice, and takes 100 milliseconds. Oftentimes, a composable might compose only once, and yet take too long to do that and slow your screen.

To reliably investigate performance issues, and give you insight into what your app is doing and whether it takes longer than it should, you can use system tracing with composition tracing.

System tracing gives you timing information of anything that happens in your app. It doesn't add any overhead to your app and therefore you can keep it in the production app without needing to worry about performance negative effects.

Set up composition tracing

Compose automatically populates some information on its runtime phases like when something is recomposing or when a lazy layout prefetches items. However, it's not enough information to actually figure out what might be a problematic section. You can improve the amount of information by setting up the composition tracing , which gives you the name of every single composable that was composed during the trace. This lets you start investigating performance problems without needing to add many custom trace("label") sections.

To enable Composition tracing, follow these steps:

  • Add the runtime-tracing dependency to your :app module:

At this point, you could record a system trace with Android Studio profiler and it would include all the information, but we will use the Macrobenchmark for performance measurements and system traces recording.

  • Add additional dependencies to the :measure module to enable composition tracing with Macrobenchmark:
  • Add the androidx.benchmark.fullTracing.enable=true instrumentation argument to the build.gradle file of the :measure module:
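
The dependency snippets referenced above are not reproduced on this page; in Gradle's Kotlin DSL they would look roughly like the following. The artifact coordinates are the ones documented for composition tracing, but treat the exact versions and module placement as assumptions and double-check them against the current documentation.

```kotlin
// :app module build.gradle.kts — enable composition tracing at runtime (assumed coordinates)
dependencies {
    implementation("androidx.compose.runtime:runtime-tracing:<latest-version>")
}

// :measure module build.gradle.kts — let Macrobenchmark record full composition traces
dependencies {
    implementation("androidx.tracing:tracing-perfetto:<latest-version>")
    implementation("androidx.tracing:tracing-perfetto-binary:<latest-version>")
}

android {
    defaultConfig {
        // Instrumentation argument named in the step above.
        testInstrumentationRunnerArguments["androidx.benchmark.fullTracing.enable"] = "true"
    }
}
```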

For more information about how to set up composition tracing, such as how to use it from terminal, see the documentation .

Capture initial performance with Macrobenchmark

There are several ways that you can retrieve a system trace file. For example, you could record with Android Studio profiler , capture it on device , or retrieve a system trace recorded with the Macrobenchmark. In this codelab, you use the traces taken by the Macrobenchmark library.

This project contains benchmarks in the :measure module that you can run to get the performance measurements. The benchmarks in this project are set to only run one iteration to save time during this codelab. In the real app, it's recommended to have at least ten iterations if the output variance is high.

To capture the initial performance, use the AccelerateHeavyScreenBenchmark test, which scrolls the first task screen, and follow these steps:

  • Open the AccelerateHeavyScreenBenchmark.kt file.
  • Run the benchmark with the gutter action next to the benchmark class:

[Screenshot: run gutter action next to the benchmark class]

This benchmark scrolls the Task 1 screen and captures frame timing and custom trace sections.

After the benchmark finishes, you should see the results in the Android Studio output pane:

The important metrics in the output are the following:

  • frameDurationCpuMs : Tells you how long it took to render frames. The shorter, the better.
  • frameOverrunMs : Tells you how much time was over the frame limit, including the work on GPU. A negative number is good because it means that there was still time.

The other metrics, such as ImagePlaceholderMs, are based on custom trace sections: the benchmark outputs the summed duration of all matching sections in the trace file, and the ImagePlaceholderCount metric reports how many times they occurred.

All of these metrics can help us understand if the changes we make to our codebase are improving the performance.

Read the trace file

You can read the system trace from either Android Studio or with the web-based tool Perfetto .

While Android Studio profiler is a good way to quickly open a trace and show the process of your app, Perfetto provides more in-depth investigation capabilities for all processes running on a system with powerful SQL queries and more. In this codelab, you use Perfetto to analyze system traces.

  • Open the Perfetto website, which loads the tool's dashboard.
  • Locate the system traces captured by Macrobenchmark on your hosting file system, which are saved in [module]/outputs/connected_android_test_additional_output/benchmarkRelease/connected/[device]/ folder. Every benchmark iteration records a separate trace file, each containing the same interactions with your app.

[Screenshot: folder containing the recorded benchmark traces]

  • Drag the AccelerateHeavyScreenBenchmark_...iter000...perfetto-trace file to the Perfetto UI and wait until it loads the trace file.
  • Optional: If you're not able to run the benchmark and generate the trace file, download our trace file and drag it to Perfetto:
  • Find the process of your app, which is called com.compose.performance . Usually the foreground app is below the hardware information lanes and a couple of system lanes.
  • Open the drop-down menu with the app's process name. You see the list of threads running in your app. Keep the trace file opened because you need it in the next step.

To find a performance problem in your app, you can leverage the Expected and Actual timelines on top of your app's thread list:

[Screenshot: Expected and Actual timelines above the app's thread list]

The Expected Timeline tells you when the system expects each frame produced by your app to be ready in order to show a fluid, performant UI; in this case, that is 16.6 ms (1000 ms / 60). The Actual Timeline shows the real duration of the frames produced by your app, including GPU work.

You might see different colors, which indicate the following:

  • Green frame : The frame produced on time.
  • Red frame : The janky frame took longer than expected. You should investigate the work done in these frames to prevent performance issues.
  • Light-green frame : The frame was produced within the time limit, but presented late, resulting in an increased input latency.
  • Yellow frame : The frame was janky, but the app wasn't the reason.

When the UI is rendered on screen, the changes are required to be faster than the duration your device expects a frame to be created. Historically this was approximately 16.6ms given that the display refresh rate was 60Hz, but for modern Android devices, it may be approximately 11ms or less because the display refresh rate is 90Hz or faster. It can also be different for each frame due to variable refresh rates .

For example, if your UI is composed of 16 items, then each item has roughly 1ms to be created to prevent any skipped frames. On the other hand, if you only have one item, such as a video player, it can take up to 16ms to compose it without jank.

Understand the system-tracing call chart

In the following image is an example of a simplified version of a system trace showing recomposition.

[Image: simplified system-trace call chart showing recomposition]

Each bar, from the top down, represents the total time of the bars below it; the bars correspond to sections of code from the functions that were called. Compose calls recompose on your composition hierarchy. The first composable is MaterialTheme. Inside MaterialTheme, a composition local provides the theming information. From there, the HomeScreen composable is called, and the home screen calls the MyImage and MyButton composables as part of its composition.

Gaps in system traces come from untraced code being run, because system traces only show code that is marked for tracing. Here, the untraced code runs after MyImage is called and before MyButton is called, and it takes up the amount of time the gap spans.

In the next step, you analyze the trace you took in the previous step.

5. Accelerate heavy composables

As a first task when trying to optimize performance of your app, you should seek any heavy composables or a long-running task on the main thread. The long-running work might mean different things depending on how complicated your UI is and how much time there is to compose the UI.

So if a frame is dropped, you need to find which composables are taking too long and make them faster, either by offloading work from the main thread or by skipping some of the work they do on the main thread.

To analyze the trace taken from the AccelerateHeavyScreenBenchmark test, follow these steps:

  • Open the system trace that you took in the previous step.
  • Zoom in on the first long frame, which contains the UI initialization after the data is loaded. The content of the frame looks similar to the following image:

[Screenshot: the long frame in Perfetto]

In the trace, you can see there are many things happening inside one frame, which can be found under Choreographer#doFrame section. You can see from the image that the biggest chunk of work comes from the composable that contains the ImagePlaceholder section, which loads a big image.

Don't load big images on the main thread

It might be obvious to load images asynchronously from a network using one of the convenience libraries like Coil or Glide , but what if you have a big image locally in your app that you need to show?

The common painterResource composable function that loads an image from resources loads the image on the main thread during composition. This means that if your image is big, it can block the main thread with some work.

In your case, you can see the problem as part of the asynchronous image placeholder. The painterResource composable loads a placeholder image that takes approximately 23ms to load.

[Screenshot: ImagePlaceholder trace section taking about 23 ms]

There are several ways that you can improve this problem, including the following:

  • Load the image asynchronously.
  • Make the image smaller so that it loads faster.
  • Use a vector drawable that scales based on required size.

To fix this performance problem, follow these steps:

  • Navigate to the AccelerateHeavyScreen.kt file.
  • Locate the imagePlaceholder() composable that loads the image. The placeholder image has dimensions of 1600x1600px, which is clearly too big for what it shows.

[Screenshot: the 1600x1600 px placeholder drawable]

  • Change the drawable to R.drawable.placeholder_vector :
  • Rerun the AccelerateHeavyScreenBenchmark test, which rebuilds the app and takes the system trace again.
  • Drag the system trace to the Perfetto dashboard.

Alternatively, you can download the trace:

  • Search for the ImagePlaceholder trace section, which shows you directly the improved part.

[Screenshot: improved ImagePlaceholder trace section]

  • Observe that the ImagePlaceholder function doesn't block the main thread that much anymore.

[Screenshot: main thread no longer blocked by ImagePlaceholder]

As an alternative solution in a real app, the troublesome image might not be a placeholder but some actual artwork. In that case, you can use Coil's AsyncImage (or rememberAsyncImagePainter) composable, which loads the image asynchronously. This solution shows empty space until the image is loaded, so you might need a lightweight placeholder for these kinds of images.
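
For example, a hedged sketch with Coil might look like the following; the AsyncImage model URL is made up, and only the placeholder_vector drawable comes from this codelab.

```kotlin
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.res.painterResource
import coil.compose.AsyncImage

// Hedged sketch: Coil decodes the image off the main thread and recomposes when ready.
// The URL below is illustrative, not from the codelab.
@Composable
fun ItemArtwork(modifier: Modifier = Modifier) {
    AsyncImage(
        model = "https://example.com/artwork.jpg",                     // any ImageRequest-compatible model
        placeholder = painterResource(R.drawable.placeholder_vector),  // lightweight vector while loading
        contentDescription = null,
        modifier = modifier,
    )
}
```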

There are still some other things that don't perform well, which you tackle in the next step.

6. Offload a heavy operation to a background thread

If you keep investigating the same frame for additional problems, you will encounter sections named binder transaction, which take approximately 1 ms each.

[Screenshot: binder transaction sections in the trace]

Sections called binder transaction show that there was an interprocess communication happening between your process and some system process. It is a normal way of retrieving some information from the system, such as retrieving a system service.

These transactions are included in many of the APIs communicating with the system. For example, when retrieving a system service with getSystemService , registering a broadcast receiver, or requesting a ConnectivityManager .

Unfortunately, these transactions don't provide much information about what is being requested, so you have to check your code for usages of the APIs mentioned above and then add a custom trace section to confirm which call is the problematic one.

To improve the binder transactions, follow these steps:

  • Open the AccelerateHeavyScreen.kt file.
  • Locate the PublishedText composable. This composable formats a datetime in the current timezone and registers a BroadcastReceiver object that keeps track of timezone changes. It contains a currentTimeZone state variable initialized with the default system timezone, and then a DisposableEffect that registers a broadcast receiver for timezone changes. Finally, the composable shows the formatted datetime with Text. DisposableEffect is a good choice in this scenario because you need a way to unregister the broadcast receiver, which is done in the onDispose lambda. The problematic part, though, is that the code inside the DisposableEffect blocks the main thread:
  • Wrap the context.registerReceiver with a trace call to ensure that this is indeed what's causing all the binder transactions :
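
A hedged sketch of that wrapping, using the trace helper from androidx.tracing; the section label, receiver, and intent filter are placeholders that are assumed to exist inside the DisposableEffect described above.

```kotlin
import android.content.Intent
import android.content.IntentFilter
import androidx.tracing.trace

// Inside the DisposableEffect described above; `context` and `timeZoneReceiver`
// are assumed to be defined by the surrounding composable.
trace("PublishedText.registerReceiver") {
    context.registerReceiver(
        timeZoneReceiver,
        IntentFilter(Intent.ACTION_TIMEZONE_CHANGED),
    )
}
```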

In general, code running for that long on the main thread might not cause much trouble, but the fact that this transaction runs for every single item visible on screen can cause problems. Assuming there are six items visible on screen, they all need to be composed for the first frame. These calls alone can take 12 ms, which is almost the entire deadline for one frame.

To fix this, you need to offload the broadcast registration to a different thread. You can do so with coroutines.

  • Get a coroutine scope tied to the composable's lifecycle with val scope = rememberCoroutineScope().
  • Inside the effect, launch a coroutine on a dispatcher that isn't Dispatchers.Main . For example, Dispatchers.IO in this case. This way, the broadcast registration doesn't block the main thread, but the actual state currentTimeZone is kept in the main thread.
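
Put together, the offloaded registration might look roughly like the following sketch. It reorganizes the logic into a helper composable for readability, so names such as rememberCurrentTimeZone are assumptions rather than the codelab's exact code.

```kotlin
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.content.IntentFilter
import androidx.compose.runtime.Composable
import androidx.compose.runtime.DisposableEffect
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.rememberCoroutineScope
import androidx.compose.runtime.setValue
import androidx.compose.ui.platform.LocalContext
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import java.util.TimeZone

// Hedged sketch: register the receiver on Dispatchers.IO so composition isn't blocked,
// while the currentTimeZone state itself stays on the main thread.
@Composable
fun rememberCurrentTimeZone(): TimeZone {
    val context = LocalContext.current
    val scope = rememberCoroutineScope()
    var currentTimeZone by remember { mutableStateOf(TimeZone.getDefault()) }

    DisposableEffect(Unit) {
        val receiver = object : BroadcastReceiver() {
            override fun onReceive(ctx: Context, intent: Intent) {
                currentTimeZone = TimeZone.getDefault()
            }
        }
        scope.launch(Dispatchers.IO) {
            // The binder transaction now happens off the main thread.
            context.registerReceiver(receiver, IntentFilter(Intent.ACTION_TIMEZONE_CHANGED))
        }
        // In a real app, also guard against disposal happening before registration completes.
        onDispose { context.unregisterReceiver(receiver) }
    }
    return currentTimeZone
}
```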

There's one more step to optimize this. You don't need a broadcast receiver for each item in the list, but only one. You should hoist it!

You can either hoist it and pass the timezone parameter down the tree of composables or, given it's not used in many places in your UI, you can use a composition local.

For the purpose of this codelab, you keep the broadcast receiver as part of the composables tree. However, in the real app, it might be beneficial to separate it into a data layer to prevent polluting your UI code.

  • Define the composition local with the default system timezone:
  • Update the ProvideCurrentTimeZone composable that takes a content lambda to provide the current time zone:
  • Move the DisposableEffect out of the PublishedText composable into the new one to hoist it there, and replace the currentTimeZone with the state and side effect:
  • Wrap a composable in which you want the composition local to be valid with the ProvideCurrentTimeZone . You can wrap the entire AccelerateHeavyScreen as shown in the following snippet:
  • Change the PublishedText composable to only contain the basic formatting functionality and read the current value of the composition local through LocalTimeZone.current :
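
Taken together, the hoisting steps might look roughly like this sketch; the codelab's actual solution may differ in detail, and rememberCurrentTimeZone refers to the sketch from the previous step.

```kotlin
import androidx.compose.runtime.Composable
import androidx.compose.runtime.CompositionLocalProvider
import androidx.compose.runtime.compositionLocalOf
import java.util.TimeZone

// Hedged sketch of the hoisted time zone; details may differ from the codelab's solution.
val LocalTimeZone = compositionLocalOf<TimeZone> { TimeZone.getDefault() }

@Composable
fun ProvideCurrentTimeZone(content: @Composable () -> Unit) {
    // One receiver for the whole screen, reusing the sketch from the previous step.
    val currentTimeZone = rememberCurrentTimeZone()
    CompositionLocalProvider(LocalTimeZone provides currentTimeZone, content = content)
}

// Inside PublishedText, each item now only reads the hoisted value:
// val timeZone = LocalTimeZone.current
```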
  • Rerun the benchmark, which builds the app.

Alternatively, you can download the system trace with corrected code:

  • Drag the trace file to the Perfetto dashboard. All of the binder transactions sections are gone from the main thread.
  • Search for the section name that's similar to the previous step. You can find it in one of the other threads created by coroutines ( DefaultDispatch ):

[Screenshot: the registration trace section on a coroutine dispatcher thread]

7. Remove unnecessary subcompositions

You moved the heavy code from the main thread, so it's not blocking composition anymore. There's still potential for improvement. You can remove some unnecessary overhead in the form of a LazyRow composable in each item.

In the example, each of the items contain a row of tags as highlighted in the following image:

[Screenshot: list item with a row of tags]

This row is implemented with a LazyRow composable because it's easy to write it this way. Pass the items to the LazyRow composable and it takes care of the rest:
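The original implementation (not reproduced on this page) looks roughly like the following sketch; ItemTag and the tags parameter are assumed names.

```kotlin
import androidx.compose.foundation.lazy.LazyRow
import androidx.compose.foundation.lazy.items
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier

// Each list item builds its tag row with a LazyRow, which subcomposes its children.
// ItemTag is the codelab's tag composable (assumed name).
@Composable
fun ItemTags(tags: List<String>, modifier: Modifier = Modifier) {
    LazyRow(modifier = modifier) {
        items(tags) { tag ->
            ItemTag(tag)
        }
    }
}
```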

The problem is that while Lazy layouts excel when there are many more items than fit into the constrained size, they incur some additional cost, which is unnecessary when lazy composition is not actually required.

Because Lazy composables are built on a SubcomposeLayout composable, their work always appears as multiple chunks: first the container, and then, as a second chunk, the items that are currently visible on screen. You can also find a compose:lazylist:prefetch trace section in the system trace, which indicates that additional items are about to enter the viewport and are therefore prefetched so they're ready in advance.

[Screenshot: lazy list prefetch sections in the trace]

To determine roughly how much time this takes in your case, open the same trace file. You can see that there are sections detached from the parent item: each item consists of the actual item content being composed and then the tag items. In this way, each item results in roughly 2.5 milliseconds of composition time which, multiplied by the number of visible items, adds up to another big chunk of work.

[Screenshot: per-item composition time including the tag row]

To fix this, follow these steps:

  • Navigate to the AccelerateHeavyScreen.kt file and locate the ItemTags composable.
  • Change the LazyRow implementation to a Row composable that iterates over the tags list, as in the sketch after this list.
  • Rerun the benchmark, which also builds the app.
  • Optional: Download the system trace with the corrected code.
  • Find the ItemTag sections and observe that they take less time and run within the same Compose:recompose root section.
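A sketch of the replacement under the same assumed names; because each item only has a handful of tags, composing them all up front in a plain Row is cheaper than subcomposing them lazily.

    // Sketch (standard Compose imports omitted) of the Row-based tag row.
    @Composable
    fun ItemTags(tags: List<String>, modifier: Modifier = Modifier) {
        Row(
            modifier = modifier.horizontalScroll(rememberScrollState()),
            horizontalArrangement = Arrangement.spacedBy(4.dp)
        ) {
            tags.forEach { tag ->
                ItemTag(tag) // hypothetical single-tag composable
            }
        }
    }

The horizontalScroll modifier is optional here; it simply keeps the row scrollable if the tags ever overflow the item width.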


A similar situation might occur with other containers that use a SubcomposeLayout composable, for example a BoxWithConstraints composable. It can spread the creation of items across Compose:recompose sections, which might not show up directly as a janky frame but can still be visible to the user. If you can, avoid a BoxWithConstraints composable in each item; it's usually only needed when you compose different UI based on the available space.

In this section you learned how to fix compositions that take too long.

8. Compare results with the initial benchmark

Now that you have finished optimizing the screen for performance, you should compare the benchmark results to the initial results.


  • Select the oldest run that relates to the initial benchmark without any changes and compare the frameDurationCpuMs and frameOverrunMs metrics. You should see results similar to the following table:
  • Select the newest run that relates to the benchmark with all the optimizations. You should see results similar to the following table:

If you specifically check the frameOverrunMs row, you can see that all of the percentiles improved.

In the next section, you learn how to fix a composition that happens too often.

9. Prevent unnecessary recompositions

Compose has 3 phases:

  • Composition determines what to show by building a tree of composables.
  • Layout takes that tree and determines where the composables will appear on screen.
  • Drawing draws the composables on the screen.

The order of these phases is generally the same, allowing data to flow in one direction from composition to layout to drawing to produce a UI frame.


BoxWithConstraints , lazy layouts (for example LazyColumn or LazyVerticalGrid ), and all layouts based on the SubcomposeLayout composable are notable exceptions, where the composition of children depends on the parent's layout phase.

Generally, composition is the most expensive phase to run as there is the most work to do and you may also cause other unrelated composables to recompose.

Most frames contain all three phases, but Compose can actually skip a phase entirely if there's no work to do. You can take advantage of this capability to increase the performance of your app.

Defer composition phases with lambda modifiers

Composable functions are run in the composition phase. To allow code to be run at a different time, you can provide it as a lambda function.

To do so, follow these steps:

  • Open the PhasesComposeLogo.kt file
  • Navigate to the Task 2 screen within the app. You see a logo that bounces off the edge of the screen.
  • Open the Layout Inspector and inspect Recomposition counts . You see a rapidly increasing number of recompositions.


  • Optional: Locate the PhasesComposeLogoBenchmark.kt file and run it to retrieve a system trace, then look for the PhasesComposeLogo trace section, which occurs on every frame. Recompositions show up in a trace as repeating sections with the same name.


  • If necessary, close the profiler and Layout Inspector, and then return to the code and look at the PhasesComposeLogo composable.

The logoPosition composable contains logic that changes its state on every frame.

The state is read in the PhasesComposeLogo composable through the Modifier.offset(x.dp, y.dp) modifier, which means it's read during composition.

This is why the app recomposes on every frame of the animation. In this case, there's a simple alternative: the lambda-based offset modifier.

  • Update the Image composable to use the Modifier.offset overload that accepts a lambda returning an IntOffset , as in the sketch after this list.
  • Rerun the app and check the Layout Inspector. You see that the animation no longer causes any recompositions.
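A minimal sketch of the change, assuming x and y are the state values driven by the animation and an illustrative drawable name:

    // Sketch (standard Compose imports omitted). The position is now read inside the
    // offset lambda, during layout, instead of during composition.
    Image(
        painter = painterResource(id = R.drawable.compose_logo), // assumed resource
        contentDescription = null,
        modifier = Modifier.offset { IntOffset(x.dp.roundToPx(), y.dp.roundToPx()) }
    )

Because the lambda runs with a Density receiver, the dp-to-pixel conversion can happen inside it.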

Remember, you shouldn't have to recompose only to adjust the layout of a screen. Recomposition that occurs during scroll is almost always unnecessary and should be avoided, because it leads to janky frames.

Other lambda modifiers

The Modifier.offset modifier isn't the only modifier with a lambda version. Other commonly used modifiers that would otherwise recompose on every change of a frequently updated state value also have deferred alternatives: for example, the Modifier.graphicsLayer lambda can replace the value-based Modifier.alpha, Modifier.rotate, and Modifier.scale modifiers, and Modifier.drawBehind can replace a frequently changing Modifier.background color.
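As one hedged illustration of the pattern, a frequently animated alpha can be read inside the graphicsLayer lambda instead of being passed to the value-based alpha modifier:

    // Sketch (standard Compose imports omitted); animatedAlpha is assumed to come
    // from something like animateFloatAsState.
    Box(
        modifier = Modifier
            .size(48.dp)
            // Deferred read: alpha is set inside the graphicsLayer lambda, so the
            // change never invalidates composition.
            .graphicsLayer { alpha = animatedAlpha.value }
            // The value-based alternative, .alpha(animatedAlpha.value), would read
            // the state during composition instead.
    )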

10. Defer Compose phases with custom layout

Using a lambda-based modifier is often the easiest way to avoid invalidating the composition, but sometimes there isn't a lambda-based modifier that does what you need. In these cases, you can directly implement a custom layout or even a Canvas composable to go straight to the draw phase. Compose state reads done inside a custom layout only invalidate layout and skip recomposition. As a general guideline, if you only want to adjust the layout or size, but not add or remove composables, you can often achieve the effect without invalidating composition at all.

  • Open the PhasesAnimatedShape.kt file and then run the app.
  • Navigate to the Task 3 screen. This screen contains a shape that changes size when you click a button. The size value is animated with the animateDpAsState Compose animation API.
  • Open the Layout Inspector.
  • Click Toggle size .
  • Observe that the shape recomposes on every frame of the animation.


The MyShape composable takes the size value as a parameter, which is a state read. This means that when size changes, the PhasesAnimatedShape composable (the nearest recomposition scope) recomposes, and the MyShape composable then recomposes as well because its input has changed.

To skip recomposition, follow these steps:

  • Change the size parameter to a lambda function so that size changes don't directly recompose the MyShape composable.
  • Update the call site in the PhasesAnimatedShape composable to pass the lambda (a combined sketch follows the next step).

Changing the size parameter to a lambda delays the state read. Now it occurs when the lambda is invoked.

  • Change the body of the MyShape composable so that it reads the size inside a layout modifier, as in the sketch that follows.
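A combined sketch of these changes, keeping the MyShape name but with illustrative styling; the codelab's exact body isn't reproduced here.

    // Sketch (standard Compose imports omitted). size is now a lambda, read only
    // inside Modifier.layout, so its changes invalidate layout but not composition.
    @Composable
    fun MyShape(size: () -> Dp, modifier: Modifier = Modifier) {
        Box(
            modifier = modifier
                .background(color = Color.Blue, shape = CircleShape) // illustrative styling
                .layout { measurable, _ ->
                    val sizePx = size().roundToPx() // deferred state read
                    val placeable = measurable.measure(Constraints.fixed(sizePx, sizePx))
                    layout(sizePx, sizePx) { placeable.place(0, 0) }
                }
        )
    }

At the call site in PhasesAnimatedShape, you would then pass the animated value as MyShape(size = { size }) (or { size.value }, depending on whether the animateDpAsState result is delegated with by).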

On the first line of the layout modifier measure lambda, you can see that the size lambda is invoked. This is inside the layout modifier, so it only invalidates layout, not composition.

  • Rerun the app, navigate to the Task 3 screen, and then open the Layout Inspector.
  • Click Toggle Size and then observe that the size of the shape animates the same as before, but the MyShape composable doesn't recompose.

11. Prevent recompositions with stable classes

Compose generates code that can skip a composable's execution if all of its input parameters are stable and haven't changed since the previous composition. A type is stable if it's immutable, or if the Compose compiler can otherwise know whether its value has changed between recompositions.

If the Compose compiler isn't sure that a parameter's type is stable, it treats the type as unstable and doesn't generate the skipping logic, which means the composable recomposes every time. This can occur when a class isn't a primitive type and one of the following situations applies:

  • It's a mutable class. For example, it contains a mutable property.
  • It's a class defined in a Gradle module that doesn't use Compose, so it has no dependency on the Compose compiler.
  • It's a class that contains an unstable property.

This behavior can be undesirable when it causes performance issues. You can change it by doing one of the following:

  • Enable the strong skipping mode
  • Annotate the parameter's class with the @Immutable or @Stable annotation.
  • Add the class to the stability configuration file.

For more information on stability, read the documentation .

In this task, you have a list of items that can be added, removed, or checked, and you need to make sure that the items don't recompose when recomposition is unnecessary. There are two types of items, alternating between ones that are recreated every time and ones that aren't.

The items that are recreated every time simulate the real-world case where data comes from a local database (for example Room or sqlDelight) or a remote data source (such as API requests or Firestore entities) that returns a new instance of the object every time there's a change.

Several composables have a Modifier.recomposeHighlighter() modifier attached, which you can find in our GitHub repository . This modifier shows a border whenever a composable recomposes and can serve as a temporary alternative to the Layout Inspector.

Enable strong skipping mode

Jetpack Compose compiler 1.5.4 and higher comes with an option to enable strong skipping mode, which means that even composables with unstable parameters can generate skipping code. This mode is expected to radically reduce the number of unskippable composables in your project, improving performance without any code changes.

For unstable parameters, the skipping logic compares instance equality, which means the parameter is considered unchanged if the same instance is passed to the composable as in the previous composition. Stable parameters, in contrast, use structural equality (by calling the equals() method) to determine the skipping logic.

In addition to the skipping logic, strong skipping mode also automatically remembers lambdas that are defined inside a composable function. This means you don't need a remember call to wrap a lambda function, for example one that calls a ViewModel method.

The strong skipping mode can be enabled on a Gradle module basis.

To enable it, follow these steps:

  • Open the app build.gradle.kts file.
  • Update the composeCompiler block, as in the sketch after this list.
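The exact snippet isn't reproduced above; the following is a sketch of the change, assuming the Compose Compiler Gradle plugin DSL (the property name has changed across plugin versions, so treat it as an assumption).

    // app/build.gradle.kts — sketch; property name may differ between plugin versions.
    composeCompiler {
        enableStrongSkippingMode = true
    }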

This adds the experimentalStrongSkipping compiler argument to the Gradle module.


  • Rebuild the project.
  • Open the Task 5 screen, and then observe that the items that use structural equality are marked with an EQU icon and don't recompose when you interact with the list of items.

However, the other type of item still recomposes. You fix that in the next step.

Fix stability with annotations

As mentioned previously, with strong skipping mode enabled, a composable skips its execution when a parameter is the same instance as in the previous composition. This doesn't help, however, when a new instance of an unstable class is provided with every change.

In your situation, the StabilityItem class is unstable because it contains an unstable LocalDateTime property.

To fix the stability of this class, follow these steps:

  • Navigate to the StabilityViewModel.kt file.
  • Locate the StabilityItem class and annotate it with the @Immutable annotation, as in the sketch after this list.
  • Rebuild the app.
  • Navigate to the Task 5 screen and observe that none of the list items are recomposed.
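A sketch of the annotated class; the fields are illustrative since the exact shape of StabilityItem isn't shown here, but the LocalDateTime property is the one that made it unstable.

    // Sketch: promising Compose the class is immutable so it's treated as stable.
    @Immutable
    data class StabilityItem(
        val id: Int,                // illustrative field
        val name: String,           // illustrative field
        val checked: Boolean,       // illustrative field
        val created: LocalDateTime  // the unstable property mentioned above
    )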

This class now uses structural equality to check whether it changed since the previous composition, so the items no longer recompose.

There's still one composable, the one that shows the date of the latest change, which keeps recomposing regardless of the previous fixes.

Fix stability with the configuration file

The previous approach works well for classes that are part of your codebase. However, classes that are out of your reach, such as classes from third-party libraries or standard library classes, can't be edited.

You can provide a stability configuration file that lists classes (wildcards allowed) to be treated as stable.

To enable this, follow these steps:

  • Navigate to the app build.gradle.kts file.
  • Add the stabilityConfigurationFile option to the composeCompiler block, as in the sketch after this list.
  • Sync the project with Gradle files.
  • Open the stability_config.conf file in the root folder of the project, next to the README.md file.
  • Add a line that declares the java.time.LocalDateTime class as stable, as shown after this list.
  • Rebuild the app. If the date stays the same, the LocalDateTime class no longer causes the Latest change was YYYY-MM-DD composable to recompose.
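The referenced snippets aren't reproduced above; the following is a sketch of the Gradle change, with the option name taken from the step above (the exact wiring can differ between Compose compiler plugin versions).

    // app/build.gradle.kts — sketch
    composeCompiler {
        stabilityConfigurationFile =
            rootProject.layout.projectDirectory.file("stability_config.conf")
    }

In stability_config.conf itself, the single line to add is java.time.LocalDateTime (or the java.time.* wildcard discussed below).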

In your app, you can extend the file with patterns so that you don't have to list every class that should be treated as stable. In this case, you can use the java.time.* wildcard, which treats all classes in that package as stable, such as Instant , LocalDateTime , ZoneId , and the other java.time classes.

After following these steps, nothing on this screen recomposes except the item that was added or interacted with, which is the expected behavior.

12. Congratulations

Congratulations, you optimized the performance of a Compose app! While this codelab covered only a small portion of the performance issues you might encounter in your app, you learned how to look for other potential problems and how to fix them.

What's next?

If you haven't generated a Baseline Profile for your app, we highly recommend doing so.

You can follow the codelab Improve app performance with Baseline Profiles . If you want more information on setting up benchmarks, see this codelab Inspect app performance with Macrobenchmark .

  • Enhancing Jetpack Compose app performance
  • Performance: System Tracing Basics
  • Debugging recomposition
  • Jetpack Compose Performance documentation
  • Write a Macrobenchmark
  • Android Jank detection with FrameTimeline



Can Generative AI Solve The Data Overwhelm Problem?


Data is arguably the most valuable asset for today’s businesses. This means it’s vital that people across the organization are able to work with data and extract insights – insights that (hopefully) lead to better, more informed decisions across the organization.

But all that’s easier said than done. Because, rather than being empowered by data, many people find themselves intimidated (or even paralyzed) by it.

How Bad Is The Data Overwhelm Problem?

In a world that’s full of data – where everything we do generates data – the sheer volume of data that’s available to the average business can become overwhelming. This phenomenon is described by software leaders Oracle as the “ Decision Dilemma ." You could also call it "decision paralysis" or "data anxiety." Whatever you call it, the basic gist is that more data causes anxiety and lack of action instead of better decisions.

For its Decision Dilemma report, Oracle surveyed more than 14,000 employees and business leaders across 17 countries, and the results were eye-opening:

  • 83 percent agreed that access to data is essential for helping businesses make decisions, BUT…
  • 86 percent said that data makes them feel less confident, and
  • 72 percent said that data has stopped them from being able to make a decision.


At the same time, three-quarters of business leaders say the daily volume of decisions they need to make has increased tenfold over the last three years. More decisions to make, but less confidence in making them despite masses of data at our fingertips? This is a potential crisis for business leaders.

Could generative AI help to solve this crisis? Judging by generative AI’s ability to make sense of data and extract useful information – and the fact that generative AI capabilities are already being built into analytics tools – the answer appears to be yes.

What Can Generative AI Do?

How exactly can generative AI be used to interpret data? Use cases include:

  • Driving faster and better decision making through better insights: Through real-time tracking of data, decision makers can gain a better grasp of what's happening across the business and be presented with actionable insights. And this can be achieved through natural language prompts, such as "What are our top three customer behavior trends this month?"
  • Acting as a decision-making co-pilot: Thanks to generative AI's conversational abilities, these tools can function as virtual advisors – a sounding board to help discuss and generate ideas.
  • Generating summaries of data: Generative AI can sift through vast quantities of data and create executive summaries that pull out the key points, along with best-practice recommendations.
  • Visualizing data: Generative AI can generate analytics reports in an easy-to-digest format – presenting insights from the data not just as text narratives but also in a visual format (graphs, charts, etc.).
  • Automating data analytics: Generative AI can potentially automate the data analysis process and provide automatic notifications for, well, anything you want. Spikes in sales, trending website activity, a drop in factory machine performance, increased sick leave, you name it…
  • Harnessing predictive capabilities: As well as understanding what's going on in the business right now, generative AI can help decision makers pre-empt what might be coming down the line.
  • Using synthetic data to test ideas and scenarios: By creating large amounts of synthetic data that mimic real-world data, leaders can model scenarios that may be difficult to model with real-world data (for example, because an event is a rare but impactful occurrence, or because gathering that much data would be difficult and expensive).
  • Preparing data: Generative AI can also be used to take care of data preparation tasks such as tagging, classification, segmentation, and anonymization.
  • Helping to clean up data for better analysis results: Because generative AI is so good at spotting patterns, it can be used to detect anomalies and inconsistencies in your data – things that could potentially skew results.

Another advantage is that generative AI can, in theory, work with all sorts of messy, unstructured data, including photos and video data, customer feedback comments, and social media posts – meaning, it isn’t just limited to neatly structured data in databases.

Best of all, these incredible capabilities make data much more actionable for decision-makers across the organization – regardless of their data expertise. So, you don't need to be a data expert to harness data in your everyday work. Decision paralysis, begone!

Look Out For Generative AI-Powered Tools

Providers of analytics software and platforms are beginning to build generative AI functions into their tools to enable more intelligent data analytics. For example, tools such as Microsoft Power BI, Teradata VantageCloud, Tableau AI, and Qlik Cloud now incorporate generative AI capabilities. This generally allows for natural language querying of data, easy summaries, tailored reports, and more.

What we’re seeing, then, is a democratization of generative AI and data. This will help to level the playing field between large corporations and smaller enterprises because you no longer need an army of data scientists to gain a competitive advantage.

We urgently need people to become more confident and competent at working with data. I believe generative AI will help to achieve this vision and solve the data overwhelm problem – by giving anyone the ability to analyze vast amounts of data in a more intuitive way. In other words, all you need to do is ask the right questions!

Read more about generative AI and its impact in my new book, Generative AI in Practice, 100+ Amazing Ways Generative Artificial Intelligence Is Changing Business And Society.

Bernard Marr


