
Here are the Most Common Problems Being Solved by Machine Learning

By: MIT xPRO on August 5th, 2020 3 Minute Read


Although machine learning offers important new capabilities for solving today’s complex problems, a growing number of organizations may be tempted to apply machine learning techniques as a one-size-fits-all solution.

To use machine learning effectively, engineers and scientists need a clear understanding of the most common issues that machine learning can solve. In a recent MIT xPRO Machine Learning whitepaper titled “Applications For Machine Learning In Engineering and the Physical Sciences,” Professor Youssef Marzouk and fellow MIT colleagues outlined the potential and limitations of machine learning in STEM.

Here are some common challenges that can be solved by machine learning:

Accelerate processing and increase efficiency. Machine learning can wrap around existing science and engineering models to create fast and accurate surrogates, identify key patterns in model outputs, and help further tune and refine the models. All of this helps predict outcomes at new inputs and design conditions more quickly and accurately.

Quantify and manage risk. Machine learning can be used to model the probability of different outcomes in a process that cannot easily be predicted due to randomness or noise. This is especially valuable for situations where reliability and safety are paramount.

Compensate for missing data. Gaps in a data set can severely limit accurate learning, inference, and prediction. Models trained by machine learning improve with more relevant data. When used correctly, machine learning can also help synthesize missing data that round out incomplete datasets.

Make more accurate predictions or conclusions from your data. You can streamline your data-to-prediction pipeline by tuning how your machine learning model’s parameters are updated and learned during training. Building better models of your data will also improve the accuracy of subsequent predictions.

Solve complex classification and prediction problems. Predicting how an organism’s genome will be expressed or what the climate will be like in fifty years are examples of highly complex problems. Many modern machine learning problems take thousands or even millions of data samples (or far more) across many dimensions to build expressive and powerful predictors, often pushing far beyond traditional statistical methods.

Create new designs. There is often a disconnect between what designers envision and how products are made. It’s costly and time-consuming to simulate every variation of a long list of design variables. Machine learning can identify key variables, automatically generate good options, and help designers identify which best fits their requirements.

Increase yields. Manufacturers aim to overcome inconsistency in equipment performance and predict maintenance needs by applying machine learning to flag defects and quality issues before products ship to customers, improve efficiency on the production line, and increase yields by optimizing the use of manufacturing resources.

Machine learning is undoubtedly hitting its stride, as engineers and physical scientists leverage the competitive advantage of big data across industries — from aerospace, to construction, to pharmaceuticals, transportation, and energy. But it has never been more important to understand the physics-based models, computational science, and engineering paradigms upon which machine learning solutions are built.

The list above details the most common problems that organizations can solve with machine learning. For more specific applications across engineering and the physical sciences, download MIT xPRO’s free Machine Learning whitepaper.


How to Approach Machine Learning Problems

How do you approach machine learning problems? Are neural networks the answer to nearly every challenge you may encounter?

In this article, Toptal Freelance Python Developer Peter Hussami explains the basic approach to machine learning problems and points out where neural networks may fall short.


By Peter Hussami

Peter’s rare math-modeling expertise includes audio and sensor analysis, ID verification, NLP, scheduling, routing, and credit scoring.


One of the main tasks of computers is to automate human tasks. Some of these tasks are simple and repetitive, such as “move X from A to B.” It gets far more interesting when the computer has to make decisions about problems that are far more difficult to formalize. That is where we start to encounter basic machine learning problems.


Historically, such algorithms were built by scientists or experts who had intimate knowledge of their field, and they were largely based on rules. With the explosion of computing power and the availability of large and diverse data sets, the focus has shifted toward a more computational approach.

Most popularized machine learning concepts these days have to do with neural networks, and in my experience, this created the impression in many people that neural networks are some kind of a miracle weapon for all inference problems. Actually, this is quite far from the truth. In the eyes of the statistician, they form one class of inference approaches with their associated strengths and weaknesses, and it completely depends on the problem whether neural networks are going to be the best solution or not.

Quite often, there are better approaches.

In this article, we will outline a structure for attacking machine learning problems. There is no scope for going into too much detail about specific machine learning models, but if this article generates interest, subsequent articles could offer detailed solutions for some interesting machine learning problems.

First, however, let us spend some effort showing why you should be more circumspect than to automatically think “neural network” when faced with a machine learning problem.

Pros and Cons of Neural Networks

With neural networks, the inference is done through a weighted “network.” The weights are calibrated during the so-called “learning” process, and then, subsequently, applied to assign outcomes to inputs.

As simple as this may sound, all the weights are parameters to the calibrated network, and usually, that means too many parameters for a human to make sense of.


So we might as well just consider neural networks as some kind of an inference black box that connects the input to output, with no specific model in between.

Let us take a closer look at the pros and cons of this approach.

Advantages of Neural Networks

  • The input is the data itself. Usable results even with little or no feature engineering.
  • Trainable skill. With no feature engineering, there is no need for such hard-to-develop skills as intuition or domain expertise. Standard tools are available for generic inferences.
  • Accuracy improves with the quantity of data. The more inputs it sees, the better a neural network performs.
  • May outperform classical models when there is not full information about the model. Think of public sentiment, for one.
  • Open-ended inference can discover unknown patterns. If you use a model and leave a consideration out of it, it will not detect the corresponding phenomenon. Neural networks might.

Successful neural network example: Google’s AI found a planet orbiting a distant star—where NASA did not—by analyzing accumulated telescope data.

Disadvantages of Neural Networks

  • They require a lot of (annotated!) data. First, this amount of data is not always available. Convergence is slow. A solid model (say, in physics) can be calibrated after a few observations—with neural networks, this is out of the question. Annotation is a lot of work, not to mention that it, in itself, is not foolproof.
  • No information about the inner structure of the data. Are you interested in what the inference is based on? No luck here. There are situations where manually adjusting the data improves inference by a leap, but a neural network will not be able to help with that.
  • Overfitting issues. It happens often that the network has more parameters than the data justifies, which leads to suboptimal inference.
  • Performance depends on information. If there is full information about a problem, a solid model tends to outperform a neural network.
  • There are sampling problems. Sampling is always a delicate issue, but with a model, one can quickly develop a notion of problematic sampling. Neural networks learn only from the data, so if they get biased data, they will have biased conclusions.

An example of failure: A personal relation told me of a major corporation (that I cannot name) that was working on detecting military vehicles in aerial photos. They had images that contained such vehicles and others that did not. Most images in the former class were taken on a rainy day, while the latter were taken in sunny weather. As a result, the system learned to distinguish light from shadow.

To sum up, neural networks form one class of inference methods that have their pros and cons.

The fact that their popularity outshines all other statistical methods in the eyes of the public has likely more to do with corporate governance than anything else.

Training people to use standard tools and standardized neural network methods is a far more predictable process than hunting for domain experts and artists from various fields. This, however, does not change the fact that using a neural network for a simple, well-defined problem is really just shooting a sparrow with a cannon: It needs a lot of data, requires a lot of annotation work, and in return might just underperform when compared to a solid model. Not the best package.

Still, there is huge power in the fact that they “democratize” statistical knowledge. Once a neural network-based inference solution is viewed as a mere programming tool, it may help even those who don’t feel comfortable with complex algorithms. So, inevitably, a lot of things are now built that would otherwise not exist if we could only operate with sophisticated models.

Approaching Machine Learning Problems

When approaching machine learning problems, these are the steps you will need to go through:

  • Setting acceptance criteria
  • Cleaning your data and maximizing its information content
  • Choosing the optimal inference approach
  • Train, test, repeat

Let us see these items in detail.


Setting Acceptance Criteria

You should form an idea of your target accuracy as early as possible, to the extent that the problem allows. This is going to be the target you work towards.

Cleansing Your Data and Maximizing Its Information Content

This is the most critical step. First of all, your data should have no (or few) errors. Cleansing it of these is an essential first step. Substitute missing values, try to identify patterns that are obviously bogus, eliminate duplicates and any other anomaly you might notice.
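
As a rough illustration of that kind of cleanup, here is a minimal pandas sketch; the DataFrame, its columns, and the cleaning rules are made up for the example, and real cleansing rules depend entirely on your data:

```python
import pandas as pd

# Hypothetical raw table with the usual problems: an exact duplicate,
# missing values, and an obviously bogus entry (a negative age).
df = pd.DataFrame({
    "age":    [34, 34, None, 29, -5, 41],
    "income": [52000, 52000, 61000, None, 48000, 75000],
})

df = df.drop_duplicates()                      # eliminate duplicates
df = df[df["age"].isna() | (df["age"] >= 0)]   # drop rows with impossible values
df = df.fillna(df.median(numeric_only=True))   # substitute missing values with column medians

print(df)
```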

As for information, if your data is very informative (in the linear sense), then practically any inference method will give you good results. If the required information is not in there, then the result will be noise. Maximizing the information means primarily finding any useful non-linear relationships in the data and linearizing them. If that improves the inputs significantly, great. If not, then more variables might need to be added. If all of this does not yield fruit, target accuracy may suffer.

With some luck, there will be single variables that are useful. You can identify useful variables if you—for instance—plot them against the learning target variable(s) and find the plot to be function-like (i.e., narrow range in the input corresponds to narrow range in the output). This variable can then be linearized—for example, if it plots as a parabola, subtract some values and take the square root.
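
For instance, the parabola case mentioned above might look like the following sketch (synthetic data; the offset and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(-3, 3, 500)                    # learning target
v = (y - 1.0) ** 2 + rng.normal(0, 0.05, 500)  # input variable: plots as a parabola against y

# "Subtract some values and take the square root": shift the variable so its
# minimum sits near zero, then the square root is roughly linear in |y - 1|.
v_linear = np.sqrt(np.clip(v - v.min(), 0.0, None))
```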

For variables that are noisy—narrow range in input corresponds to a broad range in the output—we may try combining them with other variables.

To have an idea of the accuracy, you may want to measure conditional class probabilities for each of your variables (for classification problems) or to apply some very simple form of regression, such as linear regression (for prediction problems). If the information content of the input improves, then so will your inference, and you simply don’t want to waste too much time at this stage calibrating a model when the data is not yet ready. So keep testing as simple as possible.
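
A simple way to keep that test lightweight is to cross-validate a plain linear model and treat its score as a rough lower bound on how informative the inputs are; a sketch with scikit-learn, using a built-in dataset as a stand-in for your own:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # stand-in for your cleaned data

# A deliberately simple baseline: if even plain logistic regression scores well,
# the inputs already carry useful (roughly linear) information about the target.
baseline = LogisticRegression(max_iter=5000)
scores = cross_val_score(baseline, X, y, cv=5)
print(f"baseline accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```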

Choosing the Optimal Inference Approach

Once your data is in decent shape, you can go for the inference method (the data might still be polished later, if necessary).

Should you use a model? Well, if you have good reason to believe that you can build a good model for the task, then you probably should. If you don’t think so, but there is ample data with good annotations, then you may go hands-free with a neural network. In practical machine learning applications, however, there is often not enough data for that.

Playing accuracy against coverage often pays off tremendously. Hybrid approaches are usually completely fine. Suppose the data is such that you can get near-100% accuracy on 80% of it with a simple model. This means you can demonstrate results quickly, and if your system can identify when it’s operating in that friendly 80% of the territory, you’ve basically covered most of the problem. Your client may not yet be fully happy, but this will earn you their trust quickly. And there is nothing to prevent you from doing something similar on the remaining data: with reasonable effort you now cover, say, 92% of the data with 97% accuracy. True, on the rest of the data it’s a coin flip, but you have already produced something useful.
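
One way to wire up such a hybrid is to let the simple model answer only where it is confident and route everything else to a second stage (another model, a human, or a later iteration). A sketch on synthetic data, with a made-up confidence threshold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

simple = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Answer only where the simple model is confident; everything else is routed
# to a second model, a human, or a later iteration.
confident = simple.predict_proba(X_test).max(axis=1) >= 0.9   # made-up threshold

coverage = confident.mean()
accuracy = (simple.predict(X_test)[confident] == y_test[confident]).mean()
print(f"covered {coverage:.0%} of cases at {accuracy:.0%} accuracy")
```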

For most practical applications, this is very useful. Say you’re in the lending business and want to decide whom to give a loan, and all you know is that your algorithm is very accurate on 70% of the clients. Great—true, the other 30% of your applicants will require more processing, but 70% can be fully automated. Or suppose you’re trying to automate operator work for call centers: you can do a good (quick and dirty) job on only the simplest tasks, but those tasks cover 50% of the calls. Great—the call center saves money if it can automate 50% of its calls reliably.

To sum up: If the data is not informative enough, or the problem is too complex to handle in its entirety, think outside the box. Identify useful and easy-to-solve sub-problems until you have a better idea.

Once you have your system ready, learn, test and loop it until you’re happy with the results.

Train, Test, Repeat

After the previous steps, little of interest is left. You have the data, you have the machine learning method, so it’s time to extract parameters via learning and then test the inference on the test set. Literature suggests 70% of the records should be used for training, and 30% for testing.
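
A minimal sketch of that split with scikit-learn (the dataset and model are placeholders for your own):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # stand-in for your prepared features and labels

# 70% of the records for training, 30% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```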

If you’re happy with the results, the task is finished. But, more likely, you developed some new ideas during the procedure, and these could help you notch up in accuracy. Perhaps you need more data? Or just more data cleansing? Or another model? Either way, chances are you’ll be busy for quite a while.

So, good luck and enjoy the work ahead!


Understanding the basics

Machine learning vs. deep learning: What's the difference?

Machine learning includes all inference techniques while deep learning aims at uncovering meaningful non-linear relationships in the data. So deep learning is a subset of machine learning and also a means of automated feature engineering applied to a machine learning problem.

Which language is best for machine learning?

The ideal choice is a language that has both broad programming library support and allows you to focus on the math rather than infrastructure. The most popular language is Python, but algorithmic languages such as Matlab or R or mainstreamers like C++ and Java are all valid choices as well.

Machine learning vs. neural networks: What's the difference?

Neural networks represent just one approach within machine learning with its pros and cons as detailed above.

What is the best way to learn machine learning?

There are some good online courses and summary pages. It all depends on one’s skills and tastes. My personal advice: Think of machine learning as statistical programming. Beef up on your math and avoid all sources that equate machine learning with neural networks.

What are the advantages and disadvantages of artificial neural networks?

Some advantages: no math, feature engineering, or artisan skills required; easy to train; may uncover aspects of the problem not originally considered. Some disadvantages: requires relatively more data; tedious preparation work; leaves no explanation as to why it decides the way it does; prone to overfitting.


Machine Learning: Algorithms, Real-World Applications and Research Directions

  • Review Article
  • Published: 22 March 2021
  • Volume 2, article number 160 (2021)


  • Iqbal H. Sarker (ORCID: orcid.org/0000-0003-1740-5517)


In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, knowledge of artificial intelligence (AI), particularly machine learning (ML), is the key. Various types of machine learning algorithms, such as supervised, unsupervised, semi-supervised, and reinforcement learning, exist in the area. Besides, deep learning, which is part of a broader family of machine learning methods, can intelligently analyze data on a large scale. In this paper, we present a comprehensive view of these machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, this study’s key contribution is explaining the principles of different machine learning techniques and their applicability in various real-world application domains, such as cybersecurity systems, smart cities, healthcare, e-commerce, agriculture, and many more. We also highlight the challenges and potential research directions based on our study. Overall, this paper aims to serve as a reference point for both academia and industry professionals as well as for decision-makers in various real-world situations and application areas, particularly from the technical point of view.


Introduction

We live in the age of data, where everything around us is connected to a data source, and everything in our lives is digitally recorded [ 21 , 103 ]. For instance, the current electronic world has a wealth of various kinds of data, such as Internet of Things (IoT) data, cybersecurity data, smart city data, business data, smartphone data, social media data, health data, COVID-19 data, and many more. The data can be structured, semi-structured, or unstructured, discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”, and is increasing day by day. Extracting insights from these data can be used to build various intelligent applications in the relevant domains. For instance, to build a data-driven automated and intelligent cybersecurity system, the relevant cybersecurity data can be used [ 105 ]; to build personalized context-aware smart mobile applications, the relevant mobile data can be used [ 103 ], and so on. Thus, data management tools and techniques capable of extracting insights or useful knowledge from data in a timely and intelligent way, on which real-world applications are based, are urgently needed.

Figure 1: The worldwide popularity score of various types of ML algorithms (supervised, unsupervised, semi-supervised, and reinforcement), in a range of 0 (min) to 100 (max), over time; the x-axis represents the timestamp and the y-axis the corresponding score.

Artificial intelligence (AI), particularly machine learning (ML), has grown rapidly in recent years in the context of data analysis and computing, typically allowing applications to function in an intelligent manner [ 95 ]. ML usually provides systems with the ability to learn and improve from experience automatically without being specifically programmed, and is generally regarded as one of the most popular of the latest technologies in the fourth industrial revolution (4IR or Industry 4.0) [ 103 , 105 ]. “Industry 4.0” [ 114 ] is typically the ongoing automation of conventional manufacturing and industrial practices, including exploratory data processing, using new smart technologies such as machine learning automation. Thus, to intelligently analyze these data and to develop the corresponding real-world applications, machine learning algorithms are the key. The learning algorithms can be categorized into four major types: supervised, unsupervised, semi-supervised, and reinforcement learning [ 75 ], discussed briefly in Sect. “ Types of Real-World Data and Machine Learning Techniques ”. The popularity of these learning approaches is increasing day by day, as shown in Fig. 1, based on data collected from Google Trends [ 4 ] over the last five years. The x-axis of the figure indicates the specific dates, and the corresponding popularity score, within the range of \(0\) (minimum) to \(100\) (maximum), is shown on the y-axis. According to Fig. 1, the popularity values for these learning types were low in 2015 and have been increasing ever since. These statistics motivate us to study machine learning in this paper, which can play an important role in the real world through Industry 4.0 automation.

In general, the effectiveness and the efficiency of a machine learning solution depend on the nature and characteristics of the data and the performance of the learning algorithms. In the area of machine learning algorithms, classification analysis, regression, data clustering, feature engineering and dimensionality reduction, association rule learning, and reinforcement learning techniques exist to effectively build data-driven systems [ 41 , 125 ]. Besides, deep learning, which originated from the artificial neural network and is part of a wider family of machine learning approaches, can be used to intelligently analyze data [ 96 ]. Thus, selecting a proper learning algorithm that is suitable for the target application in a particular domain is challenging. The reason is that the purpose of different learning algorithms is different, and even the outcome of different learning algorithms in a similar category may vary depending on the data characteristics [ 106 ]. Thus, it is important to understand the principles of various machine learning algorithms and their applicability in various real-world application areas, such as IoT systems, cybersecurity services, business and recommendation systems, smart cities, healthcare and COVID-19, context-aware systems, sustainable agriculture, and many more, which are explained briefly in Sect. “ Applications of Machine Learning ”.

Based on the importance and potentiality of “Machine Learning” to analyze the data mentioned above, in this paper, we provide a comprehensive view of various types of machine learning algorithms that can be applied to enhance the intelligence and the capabilities of an application. Thus, the key contribution of this study is explaining the principles and potentiality of different machine learning techniques, and their applicability in the various real-world application areas mentioned earlier. The purpose of this paper is, therefore, to provide a basic guide for those in academia and industry who want to study, research, and develop data-driven automated and intelligent systems in the relevant areas based on machine learning techniques.

The key contributions of this paper are listed as follows:

  • To define the scope of our study by taking into account the nature and characteristics of various types of real-world data and the capabilities of various learning techniques.
  • To provide a comprehensive view on machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.
  • To discuss the applicability of machine learning-based solutions in various real-world application domains.
  • To highlight and summarize the potential research directions within the scope of our study for intelligent data analysis and services.

The rest of the paper is organized as follows. The next section presents the types of data and machine learning algorithms in a broader sense and defines the scope of our study. We briefly discuss and explain different machine learning algorithms in the subsequent section followed by which various real-world application areas based on machine learning algorithms are discussed and summarized. In the penultimate section, we highlight several research issues and potential future directions, and the final section concludes this paper.

Types of Real-World Data and Machine Learning Techniques

Machine learning algorithms typically consume and process data to learn the related patterns about individuals, business processes, transactions, events, and so on. In the following, we discuss various types of real-world data as well as categories of machine learning algorithms.

Types of Real-World Data

Usually, the availability of data is considered as the key to construct a machine learning model or data-driven real-world systems [ 103 , 105 ]. Data can be of various forms, such as structured, semi-structured, or unstructured [ 41 , 72 ]. Besides, the “metadata” is another type that typically represents data about the data. In the following, we briefly discuss these types of data.

Structured: It has a well-defined structure, conforms to a data model following a standard order, which is highly organized and easily accessed, and used by an entity or a computer program. In well-defined schemes, such as relational databases, structured data are typically stored, i.e., in a tabular format. For instance, names, dates, addresses, credit card numbers, stock information, geolocation, etc. are examples of structured data.

Unstructured: On the other hand, there is no pre-defined format or organization for unstructured data, making it much more difficult to capture, process, and analyze, mostly containing text and multimedia material. For example, sensor data, emails, blog entries, wikis, and word processing documents, PDF files, audio files, videos, images, presentations, web pages, and many other types of business documents can be considered as unstructured data.

Semi-structured: Semi-structured data are not stored in a relational database like the structured data mentioned above, but it does have certain organizational properties that make it easier to analyze. HTML, XML, JSON documents, NoSQL databases, etc., are some examples of semi-structured data.

Metadata: It is not the normal form of data, but “data about data”. The primary difference between “data” and “metadata” is that data are simply the material that can classify, measure, or even document something relative to an organization’s data properties. On the other hand, metadata describes the relevant data information, giving it more significance for data users. A basic example of a document’s metadata might be the author, file size, date generated by the document, keywords to define the document, etc.

In the area of machine learning and data science, researchers use various widely used datasets for different purposes. These are, for example, cybersecurity datasets such as NSL-KDD [ 119 ], UNSW-NB15 [ 76 ], ISCX’12 [ 1 ], CIC-DDoS2019 [ 2 ], Bot-IoT [ 59 ], etc., smartphone datasets such as phone call logs [ 84 , 101 ], SMS logs [ 29 ], mobile application usage logs [ 137 ] [ 117 ], mobile phone notification logs [ 73 ], etc., IoT data [ 16 , 57 , 62 ], agriculture and e-commerce data [ 120 , 138 ], health data such as heart disease [ 92 ], diabetes mellitus [ 83 , 134 ], COVID-19 [ 43 , 74 ], etc., and many more in various application domains. The data can be of the different types discussed above, which may vary from application to application in the real world. To analyze such data in a particular problem domain, and to extract the insights or useful knowledge from the data for building real-world intelligent applications, different types of machine learning techniques can be used according to their learning capabilities, which are discussed in the following.

Types of Machine Learning Techniques

Machine Learning algorithms are mainly divided into four categories: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning [ 75 ], as shown in Fig. 2 . In the following, we briefly discuss each type of learning technique with the scope of their applicability to solve real-world problems.

Figure 2: Various types of machine learning techniques.

Supervised: Supervised learning is typically the task of machine learning to learn a function that maps an input to an output based on sample input-output pairs [ 41 ]. It uses labeled training data and a collection of training examples to infer a function. Supervised learning is carried out when certain goals are identified to be accomplished from a certain set of inputs [ 105 ], i.e., a task-driven approach . The most common supervised tasks are “classification” that separates the data, and “regression” that fits the data. For instance, predicting the class label or sentiment of a piece of text, like a tweet or a product review, i.e., text classification, is an example of supervised learning.

Unsupervised: Unsupervised learning analyzes unlabeled datasets without the need for human interference, i.e., a data-driven process [ 41 ]. This is widely used for extracting generative features, identifying meaningful trends and structures, groupings in results, and exploratory purposes. The most common unsupervised learning tasks are clustering, density estimation, feature learning, dimensionality reduction, finding association rules, anomaly detection, etc.

Semi-supervised: Semi-supervised learning can be defined as a hybridization of the above-mentioned supervised and unsupervised methods, as it operates on both labeled and unlabeled data [ 41 , 105 ]. Thus, it falls between learning “without supervision” and learning “with supervision”. In the real world, labeled data could be rare in several contexts, and unlabeled data are numerous, where semi-supervised learning is useful [ 75 ]. The ultimate goal of a semi-supervised learning model is to provide a better outcome for prediction than that produced using the labeled data alone from the model. Some application areas where semi-supervised learning is used include machine translation, fraud detection, labeling data and text classification.

Reinforcement: Reinforcement learning is a type of machine learning algorithm that enables software agents and machines to automatically evaluate the optimal behavior in a particular context or environment to improve their efficiency [ 52 ], i.e., an environment-driven approach . This type of learning is based on reward or penalty, and its ultimate goal is to use insights obtained from the environment to take action that increases the reward or minimizes the risk [ 75 ]. It is a powerful tool for training AI models that can help increase automation or optimize the operational efficiency of sophisticated systems such as robotics, autonomous driving tasks, manufacturing, and supply chain logistics; however, it is not preferable for solving basic or straightforward problems.

Thus, to build effective models in various application areas different types of machine learning techniques can play a significant role according to their learning capabilities, depending on the nature of the data discussed earlier, and the target outcome. In Table 1 , we summarize various types of machine learning techniques with examples. In the following, we provide a comprehensive view of machine learning algorithms that can be applied to enhance the intelligence and capabilities of a data-driven application.

Machine Learning Tasks and Algorithms

In this section, we discuss various machine learning algorithms that include classification analysis, regression analysis, data clustering, association rule learning, feature engineering for dimensionality reduction, as well as deep learning methods. A general structure of a machine learning-based predictive model has been shown in Fig. 3 , where the model is trained from historical data in phase 1 and the outcome is generated in phase 2 for the new test data.

Figure 3: A general structure of a machine learning-based predictive model, considering both the training and testing phases.

Classification Analysis

Classification is regarded as a supervised learning method in machine learning, referring to a problem of predictive modeling, where a class label is predicted for a given example [ 41 ]. Mathematically, it learns a mapping function ( f ) from input variables ( X ) to output variables ( Y ), i.e., targets, labels, or categories. Classification can be carried out on structured or unstructured data to predict the class of given data points. For example, spam detection, with classes such as “spam” and “not spam,” in email service providers can be a classification problem. In the following, we summarize the common classification problems.

Binary classification: It refers to the classification tasks having two class labels such as “true and false” or “yes and no” [ 41 ]. In such binary classification tasks, one class could be the normal state, while the abnormal state could be another class. For instance, “cancer not detected” is the normal state of a task that involves a medical test, and “cancer detected” could be considered as the abnormal state. Similarly, “spam” and “not spam” in the above example of email service providers are considered as binary classification.

Multiclass classification: Traditionally, this refers to those classification tasks having more than two class labels [ 41 ]. The multiclass classification does not have the principle of normal and abnormal outcomes, unlike binary classification tasks. Instead, within a range of specified classes, examples are classified as belonging to one. For example, it can be a multiclass classification task to classify various types of network attacks in the NSL-KDD [ 119 ] dataset, where the attack categories are classified into four class labels, such as DoS (Denial of Service Attack), U2R (User to Root Attack), R2L (Root to Local Attack), and Probing Attack.

Multi-label classification: In machine learning, multi-label classification is an important consideration where an example is associated with several classes or labels. Thus, it is a generalization of multiclass classification, where the classes involved in the problem are hierarchically structured, and each example may simultaneously belong to more than one class in each hierarchical level, e.g., multi-level text classification. For instance, Google news can be presented under the categories of a “city name”, “technology”, or “latest news”, etc. Multi-label classification includes advanced machine learning algorithms that support predicting various mutually non-exclusive classes or labels, unlike traditional classification tasks where class labels are mutually exclusive [ 82 ].

Many classification algorithms have been proposed in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the most common and popular methods that are used widely in various application areas.

Naive Bayes (NB): The naive Bayes algorithm is based on Bayes’ theorem with the assumption of independence between each pair of features [ 51 ]. It works well and can be used for both binary and multi-class categories in many real-world situations, such as document or text classification, spam filtering, etc. The NB classifier can be used to effectively classify noisy instances in the data and to construct a robust prediction model [ 94 ]. The key benefit is that, compared to more sophisticated approaches, it needs only a small amount of training data to estimate the necessary parameters quickly [ 82 ]. However, its performance may suffer due to its strong assumption of feature independence. Gaussian, Multinomial, Complement, Bernoulli, and Categorical are the common variants of the NB classifier [ 82 ].
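
As a minimal illustration of the text-classification use mentioned above, here is a scikit-learn sketch of the Multinomial variant on a toy spam-filtering example (the documents and labels are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts  = ["win a free prize now", "meeting at 10am tomorrow",
          "free offer click now", "project update attached"]
labels = ["spam", "not spam", "spam", "not spam"]

# Bag-of-words counts feed the Multinomial NB variant.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free prize inside"]))   # expected to lean toward 'spam'
```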

Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA) is a linear decision boundary classifier created by fitting class conditional densities to data and applying Bayes’ rule [ 51 , 82 ]. This method is also known as a generalization of Fisher’s linear discriminant, which projects a given dataset into a lower-dimensional space, i.e., a reduction of dimensionality that minimizes the complexity of the model or reduces the resulting model’s computational costs. The standard LDA model usually fits each class with a Gaussian density, assuming that all classes share the same covariance matrix [ 82 ]. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which seek to express one dependent variable as a linear combination of other features or measurements.

Logistic regression (LR): Another common probabilistic statistical model used to solve classification problems in machine learning is Logistic Regression (LR) [ 64 ]. Logistic regression typically uses a logistic function to estimate the probabilities, also referred to as the mathematically defined sigmoid function in Eq. 1. It can overfit high-dimensional datasets and works well when the dataset can be separated linearly. The regularization (L1 and L2) techniques [ 82 ] can be used to avoid over-fitting in such scenarios. The assumption of linearity between the dependent and independent variables is considered a major drawback of Logistic Regression. It can be used for both classification and regression problems, but it is more commonly used for classification.
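
Eq. 1 is not reproduced in this excerpt; the standard form of the sigmoid used by logistic regression is:

\[ \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad P(y = 1 \mid x) = \sigma(w^{\top} x + b) \]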

K-nearest neighbors (KNN): K-Nearest Neighbors (KNN) [ 9 ] is an “instance-based learning” or non-generalizing learning, also known as a “lazy learning” algorithm. It does not focus on constructing a general internal model; instead, it stores all instances corresponding to training data in n -dimensional space. KNN uses data and classifies new data points based on similarity measures (e.g., Euclidean distance function) [ 82 ]. Classification is computed from a simple majority vote of the k nearest neighbors of each point. It is quite robust to noisy training data, and accuracy depends on the data quality. The biggest issue with KNN is to choose the optimal number of neighbors to be considered. KNN can be used both for classification as well as regression.

Support vector machine (SVM): In machine learning, another common technique that can be used for classification, regression, or other tasks is a support vector machine (SVM) [ 56 ]. In high- or infinite-dimensional space, a support vector machine constructs a hyper-plane or set of hyper-planes. Intuitively, the hyper-plane, which has the greatest distance from the nearest training data points in any class, achieves a strong separation since, in general, the greater the margin, the lower the classifier’s generalization error. It is effective in high-dimensional spaces and can behave differently based on different mathematical functions known as the kernel. Linear, polynomial, radial basis function (RBF), sigmoid, etc., are the popular kernel functions used in SVM classifier [ 82 ]. However, when the data set contains more noise, such as overlapping target classes, SVM does not perform well.

Decision tree (DT): Decision tree (DT) [ 88 ] is a well-known non-parametric supervised learning method. DT learning methods are used for both the classification and regression tasks [ 82 ]. ID3 [ 87 ], C4.5 [ 88 ], and CART [ 20 ] are well known for DT algorithms. Moreover, recently proposed BehavDT [ 100 ], and IntrudTree [ 97 ] by Sarker et al. are effective in the relevant application domains, such as user behavior analytics and cybersecurity analytics, respectively. By sorting down the tree from the root to some leaf nodes, as shown in Fig. 4 , DT classifies the instances. Instances are classified by checking the attribute defined by that node, starting at the root node of the tree, and then moving down the tree branch corresponding to the attribute value. For splitting, the most popular criteria are “gini” for the Gini impurity and “entropy” for the information gain that can be expressed mathematically as [ 82 ].
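
The splitting criteria referenced above are not reproduced in this excerpt; for a node with class proportions \(p_i\), the standard definitions are:

\[ \mathrm{Gini} = 1 - \sum_{i} p_i^{2}, \qquad \mathrm{Entropy} = - \sum_{i} p_i \log_{2} p_i \]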

Figure 4: An example of a decision tree structure.

Figure 5: An example of a random forest structure considering multiple decision trees.

Random forest (RF): A random forest classifier [ 19 ] is well known as an ensemble classification technique that is used in the field of machine learning and data science in various application areas. This method uses “parallel ensembling” which fits several decision tree classifiers in parallel, as shown in Fig. 5 , on different data set sub-samples and uses majority voting or averages for the outcome or final result. It thus minimizes the over-fitting problem and increases the prediction accuracy and control [ 82 ]. Therefore, the RF learning model with multiple decision trees is typically more accurate than a single decision tree based model [ 106 ]. To build a series of decision trees with controlled variation, it combines bootstrap aggregation (bagging) [ 18 ] and random feature selection [ 11 ]. It is adaptable to both classification and regression problems and fits well for both categorical and continuous values.
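
A minimal scikit-learn sketch of the parallel-ensemble idea on synthetic data (parameters are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# 100 decision trees fit on bootstrap sub-samples; the forest majority-votes their predictions.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("random forest accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```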

Adaptive Boosting (AdaBoost): Adaptive Boosting (AdaBoost) is an ensemble learning process that employs an iterative approach to improve poor classifiers by learning from their errors. It was developed by Yoav Freund et al. [ 35 ] and is also known as “meta-learning”. Unlike the random forest, which uses parallel ensembling, AdaBoost uses “sequential ensembling”. It creates a powerful classifier by combining many poorly performing classifiers to obtain a good classifier of high accuracy. In that sense, AdaBoost is called an adaptive classifier because it significantly improves the efficiency of the classifier, but in some instances it can trigger overfitting. AdaBoost is best used to boost the performance of decision trees (the base estimator [ 82 ]) on binary classification problems; however, it is sensitive to noisy data and outliers.

Extreme gradient boosting (XGBoost): Gradient Boosting, like Random Forests [ 19 ] above, is an ensemble learning algorithm that generates a final model based on a series of individual models, typically decision trees. The gradient is used to minimize the loss function, similar to how neural networks [ 41 ] use gradient descent to optimize weights. Extreme Gradient Boosting (XGBoost) is a form of gradient boosting that takes more detailed approximations into account when determining the best model [ 82 ]. It computes second-order gradients of the loss function to minimize loss and advanced regularization (L1 and L2) [ 82 ], which reduces over-fitting, and improves model generalization and performance. XGBoost is fast to interpret and can handle large-sized datasets well.
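
Gradient boosting is available directly in scikit-learn, and XGBoost itself ships as the separate xgboost package with a very similar, scikit-learn-compatible interface. A sketch using the scikit-learn implementation as a stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Sequential ensembling: each new tree fits the gradient of the loss left by the previous trees.
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
gb.fit(X_train, y_train)
print("gradient boosting accuracy:", accuracy_score(y_test, gb.predict(X_test)))
```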

Stochastic gradient descent (SGD): Stochastic gradient descent (SGD) [ 41 ] is an iterative method for optimizing an objective function with appropriate smoothness properties, where the word ‘stochastic’ refers to random probability. This reduces the computational burden, particularly in high-dimensional optimization problems, allowing for faster iterations in exchange for a lower convergence rate. A gradient is the slope of a function that calculates a variable’s degree of change in response to another variable’s changes. Mathematically, the Gradient Descent is a convex function whose output is a partial derivative of a set of its input parameters. Let \(\alpha\) be the learning rate and \(J_i\) the cost of the \(i\mathrm{th}\) training example; then Eq. ( 4 ) represents the stochastic gradient descent weight update at the \(j\mathrm{th}\) iteration. In large-scale and sparse machine learning, SGD has been successfully applied to problems often encountered in text classification and natural language processing [ 82 ]. However, SGD is sensitive to feature scaling and needs a range of hyperparameters, such as the regularization parameter and the number of iterations.
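
Eq. (4) is not reproduced in this excerpt; the standard per-example SGD update, with learning rate \(\alpha\) and per-example cost \(J_i\), takes the form:

\[ w_{j+1} = w_{j} - \alpha \, \frac{\partial J_i(w_j)}{\partial w} \]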

Rule-based classification: The term rule-based classification can be used to refer to any classification scheme that makes use of IF-THEN rules for class prediction. Several classification algorithms, such as Zero-R [ 125 ], One-R [ 47 ], decision trees [ 87 , 88 ], DTNB [ 110 ], Ripple Down Rule learner (RIDOR) [ 125 ], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [ 126 ], exist with the ability to generate rules. The decision tree is one of the most common rule-based classification algorithms among these techniques because it has several advantages, such as being easier to interpret; the ability to handle high-dimensional data; simplicity and speed; good accuracy; and the capability to produce rules that are clear and understandable to humans [ 127 , 128 ]. The decision tree-based rules also provide significant accuracy in a prediction model for unseen test cases [ 106 ]. Since the rules are easily interpretable, these rule-based classifiers are often used to produce descriptive models that can describe a system, including its entities and their relationships.

Figure 6: Classification vs. regression. In classification, the dotted line represents a linear boundary that separates the two classes; in regression, the dotted line models the linear relationship between the two variables.

Regression Analysis

Regression analysis includes several methods of machine learning that allow one to predict a continuous ( y ) result variable based on the value of one or more ( x ) predictor variables [ 41 ]. The most significant distinction between classification and regression is that classification predicts distinct class labels, while regression facilitates the prediction of a continuous quantity. Figure 6 shows an example of how classification differs from regression. Some overlaps are often found between the two types of machine learning algorithms. Regression models are now widely used in a variety of fields, including financial forecasting or prediction, cost estimation, trend analysis, marketing, time series estimation, drug response modeling, and many more. Some of the familiar types of regression algorithms are linear, polynomial, lasso, and ridge regression, etc., which are explained briefly in the following.

Simple and multiple linear regression: This is one of the most popular ML modeling techniques as well as a well-known regression technique. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the form of the regression line is linear. Linear regression creates a relationship between the dependent variable ( Y ) and one or more independent variables ( X ) (also known as the regression line) using the best-fit straight line [ 41 ]. It is defined by the following equations:

\[ y = a + bx + e \qquad (5) \]

\[ y = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + e \qquad (6) \]

where a is the intercept, b is the slope of the line, and e is the error term. This equation can be used to predict the value of the target variable based on the given predictor variable(s). Multiple linear regression is an extension of simple linear regression that allows two or more predictor variables to model a response variable, y, as a linear function [ 41 ], defined in Eq. 6, whereas simple linear regression has only one independent variable, defined in Eq. 5.

Polynomial regression: Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is not linear, but is modeled as an \(n^\mathrm{th}\)-degree polynomial in x [ 82 ]. The equation for polynomial regression is also derived from the linear regression (polynomial regression of degree 1) equation and is defined as below:

\[ y = b_0 + b_1 x + b_2 x^{2} + \cdots + b_n x^{n} \]

Here, y is the predicted/target output and \(b_0, b_1,... b_n\) are the regression coefficients, while x is an independent/input variable. In simple words, if the data are not distributed linearly but instead follow an \(n^\mathrm{th}\)-degree polynomial, then we use polynomial regression to get the desired output.

LASSO and ridge regression: LASSO and ridge regression are well known as powerful techniques that are typically used for building learning models in the presence of a large number of features, due to their capability to prevent over-fitting and reduce the complexity of the model. The LASSO (least absolute shrinkage and selection operator) regression model uses the L 1 regularization technique [ 82 ], which applies shrinkage that penalizes the “absolute value of magnitude of coefficients” ( L 1 penalty). As a result, LASSO tends to drive some coefficients to exactly zero. Thus, LASSO regression aims to find the subset of predictors that minimizes the prediction error for a quantitative response variable. On the other hand, ridge regression uses L 2 regularization [ 82 ], which penalizes the “squared magnitude of coefficients” ( L 2 penalty). Thus, ridge regression forces the weights to be small but never sets a coefficient value to zero, producing a non-sparse solution. Overall, LASSO regression is useful for obtaining a subset of predictors by eliminating less important features, and ridge regression is useful when a data set has “multicollinearity”, which refers to predictors that are correlated with other predictors.
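
A minimal scikit-learn sketch contrasting the two penalties on synthetic data with many uninformative features (the alpha values are illustrative); note how LASSO drives some coefficients exactly to zero while ridge only shrinks them:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Regression problem with many features, only a few of which are informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5, noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: sparse solution
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: small but non-zero weights

print("LASSO coefficients set to zero:", int(np.sum(lasso.coef_ == 0)))
print("ridge coefficients set to zero:", int(np.sum(ridge.coef_ == 0)))
```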

Cluster Analysis

Cluster analysis, also known as clustering, is an unsupervised machine learning technique for identifying and grouping related data points in large datasets without concern for the specific outcome. It groups a collection of objects in such a way that objects in the same category, called a cluster, are in some sense more similar to each other than objects in other groups [ 41 ]. It is often used as a data analysis technique to discover interesting trends or patterns in data, e.g., groups of consumers based on their behavior. Clustering can be used in a broad range of application areas, such as cybersecurity, e-commerce, mobile data processing, health analytics, user modeling, and behavioral analytics. In the following, we briefly discuss and summarize various types of clustering methods.

Partitioning methods: Based on the features and similarities in the data, this clustering approach categorizes the data into multiple groups or clusters. Data scientists or analysts typically determine the number of clusters to produce, either dynamically or statically, depending on the nature of the target application. The most common clustering algorithms based on partitioning methods are K-means [ 69 ], K-Medoids [ 80 ], CLARA [ 55 ], etc.

Density-based methods: To identify distinct groups or clusters, it uses the concept that a cluster in the data space is a contiguous region of high point density isolated from other such clusters by contiguous regions of low point density. Points that are not part of a cluster are considered as noise. The typical clustering algorithms based on density are DBSCAN [ 32 ], OPTICS [ 12 ] etc. The density-based methods typically struggle with clusters of similar density and high dimensionality data.

Hierarchical-based methods: Hierarchical clustering typically seeks to construct a hierarchy of clusters, i.e., a tree structure. Strategies for hierarchical clustering generally fall into two types: (i) Agglomerative—a “bottom-up” approach in which each observation begins in its own cluster and pairs of clusters are merged as one moves up the hierarchy, and (ii) Divisive—a “top-down” approach in which all observations begin in one cluster and splits are performed recursively as one moves down the hierarchy, as shown in Fig. 7 . Our earlier proposed BOTS technique [ 102 ] is an example of a hierarchical, particularly bottom-up, clustering algorithm.

Grid-based methods: To deal with massive datasets, grid-based clustering is especially suitable. To obtain clusters, the principle is first to summarize the dataset with a grid representation and then to combine grid cells. STING [ 122 ], CLIQUE [ 6 ], etc. are the standard algorithms of grid-based clustering.

Model-based methods: There are mainly two types of model-based clustering algorithms: one that uses statistical learning, and the other based on a method of neural network learning [ 130 ]. For instance, GMM [ 89 ] is an example of a statistical learning method, and SOM [ 22 ] [ 96 ] is an example of a neural network learning method.

Constraint-based methods: Constraint-based clustering is a semi-supervised approach to data clustering that uses constraints to incorporate domain knowledge. Application- or user-oriented constraints are incorporated to perform the clustering. The typical algorithms of this kind of clustering are COP K-means [ 121 ], CMWK-Means [ 27 ], etc.

Figure 7: A graphical interpretation of the widely used hierarchical clustering (bottom-up and top-down) technique.

Many clustering algorithms have been proposed in the machine learning and data science literature, with the ability to group data [ 41 , 125 ]. In the following, we summarize the popular methods that are used widely in various application areas.

K-means clustering: K-means clustering [ 69 ] is a fast, robust, and simple algorithm that provides reliable results when data sets are well-separated from each other. The data points are allocated to a cluster in this algorithm in such a way that the sum of the squared distances between the data points and the centroid is as small as possible. In other words, the K-means algorithm identifies k centroids and then assigns each data point to the nearest cluster while keeping the within-cluster distances as small as possible. Since it begins with a random selection of cluster centers, the results can be inconsistent. Since extreme values can easily affect a mean, the K-means clustering algorithm is sensitive to outliers. K-medoids clustering [ 91 ] is a variant of K-means that is more robust to noise and outliers.
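
A minimal scikit-learn sketch on synthetic, well-separated blobs, the setting in which k-means behaves best (all parameters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated blobs of points in 2-D.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
print("centroids:")
print(kmeans.cluster_centers_)
```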

Mean-shift clustering: Mean-shift clustering [ 37 ] is a nonparametric clustering technique that does not require prior knowledge of the number of clusters or constraints on cluster shape. Mean-shift clustering aims to discover "blobs" in a smooth distribution or density of samples [ 82 ]. It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points within a given region. To form the final set of centroids, these candidates are filtered in a post-processing stage to remove near-duplicates. Cluster analysis in computer vision and image processing are examples of application domains. Mean shift has the disadvantage of being computationally expensive. Moreover, in high-dimensional cases, where the number of clusters shifts abruptly, the mean-shift algorithm does not work well.

DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) [ 32 ] is a base algorithm for density-based clustering that is widely used in data mining and machine learning. It is a non-parametric, density-based clustering technique for separating high-density clusters from low-density clusters in model building. DBSCAN's main idea is that a point belongs to a cluster if it is close to many points from that cluster. It can find clusters of various shapes and sizes in a vast volume of data that is noisy and contains outliers. Unlike k-means, DBSCAN does not require a priori specification of the number of clusters in the data and can find arbitrarily shaped clusters. Although k-means is much faster than DBSCAN, DBSCAN is efficient at finding high-density regions and is robust to outliers.
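
A minimal sketch (not from the paper) of DBSCAN with scikit-learn is shown below; the two-moons dataset and the parameter values are assumptions used only to illustrate how arbitrarily shaped clusters and noise points are handled:

```python
# Hypothetical example: DBSCAN on the two-moons dataset, which k-means handles poorly.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps is the neighborhood radius and min_samples the minimum number of neighbors
# for a core point; points that fit no cluster are labeled -1 (noise).
db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X)
print(sorted(set(labels)))
```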

GMM clustering: Gaussian mixture models (GMMs) are often used for data clustering and represent a distribution-based clustering approach. A Gaussian mixture model is a probabilistic model in which all the data points are assumed to be generated by a mixture of a finite number of Gaussian distributions with unknown parameters [ 82 ]. To find the Gaussian parameters for each cluster, an optimization algorithm called expectation-maximization (EM) [ 82 ] can be used. EM is an iterative method that uses a statistical model to estimate the parameters. In contrast to k-means, Gaussian mixture models account for uncertainty and return the probability that a data point belongs to each of the k clusters. GMM clustering is more robust than k-means and works well even with non-linear data distributions.
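
A minimal sketch (an assumption for illustration, not taken from the cited works) of fitting a GMM with scikit-learn and reading its soft cluster assignments:

```python
# Hypothetical example: fitting a Gaussian mixture with EM and reading soft assignments.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42)
gmm.fit(X)                      # parameters are estimated with the EM algorithm
soft = gmm.predict_proba(X)     # probability of each point belonging to each component
hard = gmm.predict(X)           # most likely component per point
print(soft[:3].round(3))
```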

Agglomerative hierarchical clustering: The most common method of hierarchical clustering used to group objects in clusters based on their similarity is agglomerative clustering. This technique uses a bottom-up approach, where each object is first treated as a singleton cluster by the algorithm. Following that, pairs of clusters are merged one by one until all clusters have been merged into a single large cluster containing all objects. The result is a dendrogram, which is a tree-based representation of the elements. Single linkage [ 115 ], complete linkage [ 116 ], BOTS [ 102 ], etc. are some examples of such techniques. The main advantage of agglomerative hierarchical clustering over k-means is that the tree-structured hierarchy generated by agglomerative clustering is more informative than the unstructured collection of flat clusters returned by k-means, which can help to make better decisions in the relevant application areas.
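
As an illustrative sketch (an assumption, not from the paper), bottom-up clustering with different linkage criteria can be run with scikit-learn as follows; the random data and the chosen linkage are placeholders:

```python
# Hypothetical example: bottom-up (agglomerative) clustering with a chosen linkage criterion.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
X = rng.normal(size=(30, 2))

# 'single', 'complete', 'average', or 'ward' select how cluster distances are measured
# when pairs of clusters are merged on the way up the hierarchy.
agg = AgglomerativeClustering(n_clusters=3, linkage="complete")
labels = agg.fit_predict(X)
print(np.bincount(labels))
```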

Dimensionality Reduction and Feature Learning

In machine learning and data science, high-dimensional data processing is a challenging task for both researchers and application developers. Thus, dimensionality reduction, which is an unsupervised learning technique, is important because it leads to better human interpretation, lower computational cost, and avoids overfitting and redundancy by simplifying models. Both feature selection and feature extraction can be used for dimensionality reduction. The primary distinction between the two is that "feature selection" keeps a subset of the original features [ 97 ], while "feature extraction" creates brand-new ones [ 98 ]. In the following, we briefly discuss these techniques.

Feature selection: The selection of features, also known as the selection of variables or attributes in the data, is the process of choosing a subset of unique features (variables, predictors) to use in building a machine learning and data science model. It decreases a model's complexity by eliminating irrelevant or less important features and allows for faster training of machine learning algorithms. A right and optimal subset of selected features in a problem domain can minimize the overfitting problem by simplifying and generalizing the model, as well as increase the model's accuracy [ 97 ]. Thus, "feature selection" [ 66 , 99 ] is considered one of the primary concepts in machine learning, greatly affecting the effectiveness and efficiency of the target machine learning model. The chi-squared test, analysis of variance (ANOVA) test, Pearson's correlation coefficient, and recursive feature elimination are some popular techniques that can be used for feature selection.

Feature extraction: In a machine learning-based model or system, feature extraction techniques usually provide a better understanding of the data, a way to improve prediction accuracy, and a way to reduce computational cost or training time. The aim of "feature extraction" [ 66 , 99 ] is to reduce the number of features in a dataset by generating new ones from the existing ones and then discarding the original features. The majority of the information found in the original set of features can then be summarized using this new, reduced set of features. For instance, principal component analysis (PCA) is often used as a dimensionality-reduction technique to extract a lower-dimensional space by creating brand-new components from the existing features in a dataset [ 98 ].

Many algorithms have been proposed to reduce data dimensions in the machine learning and data science literature [ 41 , 125 ]. In the following, we summarize the popular methods that are widely used in various application areas.

Variance threshold: A simple baseline approach to feature selection is the variance threshold [ 82 ]. This excludes all features of low variance, i.e., all features whose variance does not exceed a given threshold. By default, it eliminates all zero-variance features, i.e., features that have the same value in all samples. This feature selection algorithm looks only at the features \(X\), not the desired outputs \(y\), and can, therefore, be used for unsupervised learning.
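
A minimal sketch (an illustration, not from the paper) using scikit-learn's VarianceThreshold; the tiny array and threshold value are assumptions for demonstration:

```python
# Hypothetical example: dropping a constant (zero-variance) feature.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0.0, 2.0, 0.1],
              [0.0, 1.0, 0.2],
              [0.0, 3.0, 0.1]])   # the first column never varies

selector = VarianceThreshold(threshold=0.0)   # default threshold removes constants only
X_reduced = selector.fit_transform(X)
print(X.shape, "->", X_reduced.shape)          # (3, 3) -> (3, 2)
```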

Pearson correlation: Pearson's correlation is another method to understand a feature's relation to the response variable and can be used for feature selection [ 99 ]. This method is also used for finding the association between the features in a dataset. The resulting value lies in \([-1, 1]\), where \(-1\) means perfect negative correlation, \(+1\) means perfect positive correlation, and 0 means that the two variables do not have a linear correlation. If X and Y are two random variables, then the correlation coefficient between X and Y is defined as [ 41 ]

\[ r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\;\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}, \]

where \(\bar{X}\) and \(\bar{Y}\) denote the sample means and \(n\) is the number of samples.
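
A minimal sketch (illustrative only, with made-up numbers) of computing the sample correlation coefficient in NumPy:

```python
# Hypothetical example: the sample Pearson correlation coefficient computed with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]   # off-diagonal entry of the 2x2 correlation matrix
print(round(r, 3))            # close to +1: strong positive linear correlation
```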

ANOVA: Analysis of variance (ANOVA) is a statistical tool used to verify whether the mean values of two or more groups differ significantly from each other. ANOVA assumes a linear relationship between the variables and the target, as well as normally distributed variables. To statistically test the equality of means, the ANOVA method utilizes F-tests. For feature selection, the resulting 'ANOVA F-value' [ 82 ] of this test can be used, so that features that are independent of the target variable can be omitted.

Chi square: The chi-square \({\chi }^2\) [ 82 ] statistic estimates the difference between the observed and expected frequencies of a series of events or variables. The value of \({\chi }^2\) depends on the magnitude of the difference between the observed and expected values, the degrees of freedom, and the sample size. The chi-square \({\chi }^2\) test is commonly used for testing relationships between categorical variables. If \(O_i\) represents the observed value and \(E_i\) the expected value, then

\[ {\chi}^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}. \]
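
As a minimal sketch (an assumption for illustration, not part of the cited works), both the ANOVA F-test and the chi-square score described above can be used for univariate feature selection with scikit-learn's SelectKBest:

```python
# Hypothetical example: univariate feature selection with ANOVA F and chi-square scores.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

X, y = load_iris(return_X_y=True)   # iris features are non-negative, as chi2 requires

X_anova = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)
X_chi2 = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)
print(X.shape, X_anova.shape, X_chi2.shape)
```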

Recursive feature elimination (RFE): Recursive feature elimination (RFE) is a brute-force approach to feature selection. RFE [ 82 ] repeatedly fits the model and removes the weakest feature until the specified number of features is reached. Features are ranked by the model's coefficients or feature importances. RFE aims to remove dependencies and collinearity in the model by recursively removing a small number of features per iteration.
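
A minimal sketch (illustrative only; the dataset, estimator, and target feature count are assumptions) of RFE with scikit-learn:

```python
# Hypothetical example: recursively eliminating the weakest features with a linear model.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# step=1 removes one feature per iteration until 10 features remain;
# ranking_ records the order in which features were eliminated.
rfe = RFE(estimator=LogisticRegression(max_iter=5000), n_features_to_select=10, step=1)
rfe.fit(X, y)
print(rfe.support_.sum(), rfe.ranking_[:5])
```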

Model-based selection: To reduce the dimensionality of the data, linear models penalized with the L1 regularization can be used. Least absolute shrinkage and selection operator (Lasso) regression is a type of linear regression that has the property of shrinking some of the coefficients to exactly zero [ 82 ]; such features can then be removed from the model. Thus, the penalized lasso regression method is often used in machine learning to select a subset of variables. The Extra Trees classifier [ 82 ] is an example of a tree-based estimator that can be used to compute impurity-based feature importances, which can then be used to discard irrelevant features.
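
A minimal sketch (an assumption for illustration) of L1-based model selection with scikit-learn; the dataset and alpha value are placeholders:

```python
# Hypothetical example: L1 (lasso) regularization as an embedded feature selector.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

lasso = Lasso(alpha=0.1).fit(X, y)              # some coefficients shrink to exactly zero
selector = SelectFromModel(lasso, prefit=True)  # keep only features with non-zero weight
X_selected = selector.transform(X)
print(X.shape, "->", X_selected.shape)
```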

Principal component analysis (PCA): Principal component analysis (PCA) is a well-known unsupervised learning approach in the field of machine learning and data science. PCA is a mathematical technique that transforms a set of correlated variables into a set of uncorrelated variables known as principal components [ 48 , 81 ]. Figure 8 shows an example of the effect of PCA on spaces of various dimensions, where Fig. 8a shows the original features in 3D space, and Fig. 8b shows the created principal components PC1 and PC2 projected onto a 2D plane and onto a 1D line with the principal component PC1, respectively. Thus, PCA can be used as a feature extraction technique that reduces the dimensionality of a dataset and helps to build an effective machine learning model [ 98 ]. Technically, PCA identifies the eigenvectors of a covariance matrix with the highest eigenvalues and then uses those to project the data into a new subspace of equal or fewer dimensions [ 82 ].
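
A minimal sketch (not from the paper; dataset and component count are assumptions) of PCA as a feature extraction step with scikit-learn:

```python
# Hypothetical example: projecting standardized 4-D iris data onto two principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # so no single feature dominates the covariance

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)           # variance captured by PC1 and PC2
```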

Fig. 8: An example of principal component analysis (PCA) and the created principal components PC1 and PC2 in different dimension spaces

Association Rule Learning

Association rule learning is a rule-based machine learning approach for discovering interesting relationships, expressed as "IF-THEN" statements, between variables in large datasets [ 7 ]. One example is that "if a customer buys a computer or laptop (an item), s/he is likely to also buy anti-virus software (another item) at the same time". Association rules are employed today in many application areas, including IoT services, medical diagnosis, usage behavior analytics, web usage mining, smartphone applications, cybersecurity applications, and bioinformatics. In comparison to sequence mining, association rule learning does not usually take into account the order of items within or across transactions. A common way of measuring the usefulness of association rules is through their parameters, 'support' and 'confidence', which were introduced in [ 7 ].

In the data mining literature, many association rule learning methods have been proposed, such as logic dependent [ 34 ], frequent pattern based [ 8 , 49 , 68 ], and tree-based [ 42 ]. The most popular association rule learning algorithms are summarized below.

AIS and SETM: AIS is the first algorithm proposed by Agrawal et al. [ 7 ] for association rule mining. The AIS algorithm's main downside is that too many candidate itemsets are generated, requiring more space and wasting a great deal of effort. This algorithm also calls for too many passes over the entire dataset to produce the rules. Another approach, SETM [ 49 ], exhibits good performance and stable behavior in terms of execution time; however, it suffers from the same flaw as the AIS algorithm.

Apriori: For generating association rules for a given dataset, Agrawal et al. [ 8 ] proposed the Apriori, Apriori-TID, and Apriori-Hybrid algorithms. These later algorithms outperform the AIS and SETM mentioned above due to the Apriori property of frequent itemsets [ 8 ]. The term 'Apriori' usually refers to having prior knowledge of frequent itemset properties. Apriori uses a "bottom-up" approach, where it generates candidate itemsets level by level. To reduce the search space, Apriori uses the property that "all subsets of a frequent itemset must be frequent, and if an itemset is infrequent, then all its supersets must also be infrequent". Another approach, predictive Apriori [ 108 ], can also generate rules; however, it can produce unexpected results as it combines both support and confidence. Apriori [ 8 ] is the most widely applicable technique in mining association rules.
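
As a minimal sketch (assuming the third-party mlxtend library, which is not part of the cited works), Apriori-style frequent itemsets and rules can be mined as follows; the toy transactions, support, and confidence thresholds are placeholders:

```python
# Hypothetical example using the mlxtend library for Apriori-style rule mining.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["laptop", "anti-virus"],
                ["laptop", "mouse", "anti-virus"],
                ["mouse", "keyboard"],
                ["laptop", "anti-virus", "keyboard"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(onehot, min_support=0.5, use_colnames=True)               # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)  # IF-THEN rules
print(rules[["antecedents", "consequents", "support", "confidence"]])
```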

ECLAT: This technique was proposed by Zaki et al. [ 131 ] and stands for Equivalence Class Clustering and bottom-up Lattice Traversal. ECLAT uses a depth-first search to find frequent itemsets. In contrast to the Apriori [ 8 ] algorithm, which represents data in a horizontal pattern, it represents data vertically. Hence, the ECLAT algorithm is more efficient and scalable in the area of association rule learning. This algorithm is better suited for small and medium datasets whereas the Apriori algorithm is used for large datasets.

FP-Growth: Another common association rule learning technique, based on the frequent-pattern tree (FP-tree) proposed by Han et al. [ 42 ], is frequent pattern growth, known as FP-Growth. The key difference from Apriori is that while generating rules, the Apriori algorithm [ 8 ] generates frequent candidate itemsets; the FP-Growth algorithm [ 42 ], on the other hand, avoids candidate generation and instead builds a tree using a 'divide and conquer' strategy. Due to its sophistication, however, the FP-tree is challenging to use in an interactive mining environment [ 133 ]. Moreover, the FP-tree may not fit into memory for massive data sets, making it challenging to process big data as well. Another solution is RARM (Rapid Association Rule Mining), proposed by Das et al. [ 26 ], but it faces a related FP-tree issue [ 133 ].

ABC-RuleMiner: ABC-RuleMiner is a rule-based machine learning method, recently proposed in our earlier paper, Sarker et al. [ 104 ], to discover interesting, non-redundant rules that provide real-world intelligent services. This algorithm effectively identifies redundancy in associations by taking into account the impact or precedence of the related contextual features and discovers a set of non-redundant association rules. It first constructs an association generation tree (AGT) in a top-down fashion and then extracts the association rules by traversing the tree. Thus, ABC-RuleMiner is more potent than traditional rule-based methods in terms of both non-redundant rule generation and intelligent decision-making, particularly in a context-aware smart computing environment, where human or user preferences are involved.

Among the association rule learning techniques discussed above, Apriori [ 8 ] is the most widely used algorithm for discovering association rules from a given dataset [ 133 ]. The main strength of the association learning technique is its comprehensiveness, as it generates all associations that satisfy the user-specified constraints, such as minimum support and confidence value. The ABC-RuleMiner approach [ 104 ] discussed earlier could give significant results in terms of non-redundant rule generation and intelligent decision-making for the relevant application areas in the real world.

Reinforcement Learning

Reinforcement learning (RL) is a machine learning technique that allows an agent to learn by trial and error in an interactive environment, using feedback from its own actions and experiences. Unlike supervised learning, which is based on given sample data or examples, the RL method is based on interacting with the environment. The problem to be solved in reinforcement learning (RL) is defined as a Markov decision process (MDP) [ 86 ], i.e., it is all about making decisions sequentially. An RL problem typically includes four elements: agent, environment, rewards, and policy.

RL can be split roughly into model-based and model-free techniques. Model-based RL is the process of inferring optimal behavior from a model of the environment by performing actions and observing the results, which include the next state and the immediate reward [ 85 ]. AlphaZero and AlphaGo [ 113 ] are examples of model-based approaches. On the other hand, a model-free approach does not use the transition probability distribution and the reward function associated with the MDP. Q-learning, deep Q-networks, Monte Carlo control, SARSA (state-action-reward-state-action), etc. are some examples of model-free algorithms [ 52 ]. The key difference between model-based and model-free learning is whether the agent learns or is given an explicit model of the environment's dynamics. In the following, we discuss the popular RL algorithms.

Monte Carlo methods: Monte Carlo techniques, or Monte Carlo experiments, are a wide category of computational algorithms that rely on repeated random sampling to obtain numerical results [ 52 ]. The underlying concept is to use randomness to solve problems that are deterministic in principle. Optimization, numerical integration, and generating samples from a probability distribution are the three problem classes where Monte Carlo techniques are most commonly used.

Q-learning: Q-learning is a model-free reinforcement learning algorithm for learning the quality of actions, telling an agent what action to take under what circumstances [ 52 ]. It does not need a model of the environment (hence the term "model-free"), and it can deal with stochastic transitions and rewards without the need for adaptations. The 'Q' in Q-learning usually stands for quality, as the algorithm calculates the maximum expected reward for a given action in a given state.
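
A minimal sketch of the tabular Q-learning update is shown below; the environment itself (states, actions, rewards), the table size, and the hyperparameter values are assumptions for illustration only:

```python
# Hypothetical sketch of the tabular Q-learning update rule; the environment is not shown.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

def choose_action(state):
    """Epsilon-greedy action selection."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
```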

Deep Q-learning: The basic working step in deep Q-learning [ 52 ] is that the initial state is fed into a neural network, which returns the Q-value of every possible action as output. Q-learning works well when the setting to overcome is reasonably simple; however, when the number of states and actions becomes large, deep learning can be used as a function approximator.

Reinforcement learning, along with supervised and unsupervised learning, is one of the basic machine learning paradigms. RL can be used to solve numerous real-world problems in various fields, such as game theory, control theory, operations analysis, information theory, simulation-based optimization, manufacturing, supply chain logistics, multi-agent systems, swarm intelligence, aircraft control, robot motion control, and many more.

Artificial Neural Network and Deep Learning

Deep learning is part of a wider family of artificial neural network (ANN)-based machine learning approaches with representation learning. Deep learning provides a computational architecture by combining several processing layers, such as input, hidden, and output layers, to learn from data [ 41 ]. The main advantage of deep learning over traditional machine learning methods is its better performance in several cases, particularly when learning from large datasets [ 105 , 129 ]. Figure 9 shows the general performance of deep learning compared to machine learning as the amount of data increases; however, it may vary depending on the data characteristics and experimental setup.

Fig. 9: Machine learning and deep learning performance in general with the amount of data

The most common deep learning algorithms are: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN, or ConvNet), Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) [ 96 ]. In the following, we discuss various types of deep learning methods that can be used to build effective data-driven models for various purposes.

Fig. 10: A structure of an artificial neural network modeling with multiple processing layers

MLP: The base architecture of deep learning, which is also known as the feed-forward artificial neural network, is called a multilayer perceptron (MLP) [ 82 ]. A typical MLP is a fully connected network consisting of an input layer, one or more hidden layers, and an output layer, as shown in Fig. 10. Each node in one layer connects to each node in the following layer with a certain weight. MLP utilizes the "backpropagation" technique [ 41 ], the most "fundamental building block" of a neural network, to adjust the weight values internally while building the model. MLP is sensitive to feature scaling and allows a variety of hyperparameters to be tuned, such as the number of hidden layers, neurons, and iterations, which can result in a computationally costly model.
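
A minimal sketch (an assumption for illustration, not from the paper) of a small MLP with scikit-learn; the dataset, layer sizes, and iteration count are placeholders:

```python
# Hypothetical example: a small fully connected network on the scikit-learn digits data.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)          # MLPs are sensitive to feature scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=42)
mlp.fit(X_train, y_train)                      # weights adjusted via backpropagation
print(mlp.score(X_test, y_test))
```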

CNN or ConvNet: The convolutional neural network (CNN) [ 65 ] enhances the design of the standard ANN, consisting of convolutional layers, pooling layers, as well as fully connected layers, as shown in Fig. 11. As it takes advantage of the two-dimensional (2D) structure of the input data, it is typically used broadly in several areas such as image and video recognition, image processing and classification, medical image analysis, natural language processing, etc. Although a CNN has a greater computational burden, it has the advantage of automatically detecting the important features without any manual intervention, and hence CNN is considered to be more powerful than a conventional ANN. A number of advanced deep learning models based on CNN can be used in the field, such as AlexNet [ 60 ], Xception [ 24 ], Inception [ 118 ], Visual Geometry Group (VGG) [ 44 ], ResNet [ 45 ], etc.
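
As a minimal sketch (assuming Keras via TensorFlow; the layer sizes and input shape are illustrative, not taken from any cited model), a small CNN with convolutional, pooling, and fully connected layers can be defined as follows:

```python
# Hypothetical example: a small CNN in Keras with convolution, pooling, and dense layers.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                       # e.g., grayscale images
    layers.Conv2D(32, kernel_size=3, activation="relu"),   # convolutional layer
    layers.MaxPooling2D(pool_size=2),                      # pooling layer
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                  # fully connected layer
    layers.Dense(10, activation="softmax"),                # 10-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```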

LSTM-RNN: Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the area of deep learning [ 38 ]. Unlike normal feed-forward neural networks, LSTM has feedback connections. LSTM networks are well suited for analyzing and learning sequential data, such as classifying, processing, and predicting data based on time series, which differentiates them from other conventional networks. Thus, LSTM can be used when the data are in a sequential format, such as time series or sentences, and is commonly applied in the areas of time-series analysis, natural language processing, speech recognition, etc.
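
A minimal sketch (assuming Keras via TensorFlow; the input shape and layer sizes are placeholders) of a many-to-one LSTM for a univariate time series:

```python
# Hypothetical example: an LSTM that maps 30 past time steps to one predicted value.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(30, 1)),   # (time steps, features per step)
    layers.LSTM(64),               # recurrent layer with feedback connections
    layers.Dense(1),               # single-value forecast
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```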

Fig. 11: An example of a convolutional neural network (CNN or ConvNet) including multiple convolution and pooling layers

In addition to the most common deep learning methods discussed above, several other deep learning approaches [ 96 ] exist for various purposes. For instance, the self-organizing map (SOM) [ 58 ] uses unsupervised learning to represent high-dimensional data by a 2D grid map, thus achieving dimensionality reduction. The autoencoder (AE) [ 15 ] is another learning technique that is widely used for dimensionality reduction as well as feature extraction in unsupervised learning tasks. Restricted Boltzmann machines (RBM) [ 46 ] can be used for dimensionality reduction, classification, regression, collaborative filtering, feature learning, and topic modeling. A deep belief network (DBN) is typically composed of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders, and a backpropagation neural network (BPNN) [ 123 ]. A generative adversarial network (GAN) [ 39 ] is a form of deep learning network that can generate data with characteristics close to the actual input data. Transfer learning, which typically re-uses a model pre-trained on one problem for a new problem, is currently very common because it allows deep neural networks to be trained with comparatively little data [ 124 ]. A brief discussion of these artificial neural network (ANN) and deep learning (DL) models is given in our earlier paper, Sarker et al. [ 96 ].

Overall, based on the learning techniques discussed above, we can conclude that various types of machine learning techniques, such as classification analysis, regression, data clustering, feature selection and extraction, dimensionality reduction, association rule learning, reinforcement learning, and deep learning, can play a significant role for various purposes according to their capabilities. In the following section, we discuss several application areas based on machine learning algorithms.

Applications of Machine Learning

In the current age of the Fourth Industrial Revolution (4IR), machine learning has become popular in various application areas because of its capability to learn from past data and make intelligent decisions. In the following, we summarize and discuss ten popular application areas of machine learning technology.

Predictive analytics and intelligent decision-making: A major application field of machine learning is intelligent decision-making through data-driven predictive analytics [ 21 , 70 ]. The basis of predictive analytics is capturing and exploiting relationships between explanatory variables and predicted variables from previous events to predict an unknown outcome [ 41 ]. Examples include identifying suspects or criminals after a crime has been committed, or detecting credit card fraud as it happens. In another application, machine learning algorithms can assist retailers in better understanding consumer preferences and behavior, better managing inventory, avoiding out-of-stock situations, and optimizing logistics and warehousing in e-commerce. Various machine learning algorithms such as decision trees, support vector machines, artificial neural networks, etc. [ 106 , 125 ] are commonly used in this area. Since accurate predictions provide insight into the unknown, they can improve the decisions of industries, businesses, and almost any organization, including government agencies, e-commerce, telecommunications, banking and financial services, healthcare, sales and marketing, transportation, social networking, and many others.

Cybersecurity and threat intelligence: Cybersecurity is one of the most essential areas of Industry 4.0 [ 114 ]; it is typically the practice of protecting networks, systems, hardware, and data from digital attacks [ 114 ]. Machine learning has become a crucial cybersecurity technology that constantly learns by analyzing data to identify patterns, better detect malware in encrypted traffic, find insider threats, predict where "bad neighborhoods" are online, keep people safe while browsing, and secure data in the cloud by uncovering suspicious activity. For instance, clustering techniques can be used to identify cyber-anomalies, policy violations, etc. Machine learning classification models that take into account the impact of security features are useful for detecting various types of cyber-attacks or intrusions [ 97 ]. Various deep learning-based security models can also be used on large-scale security datasets [ 96 , 129 ]. Moreover, security policy rules generated by association rule learning techniques can play a significant role in building rule-based security systems [ 105 ]. Thus, the various learning techniques discussed in Sect. "Machine Learning Tasks and Algorithms" can enable cybersecurity professionals to be more proactive in efficiently preventing threats and cyber-attacks.

Internet of things (IoT) and smart cities: The Internet of Things (IoT) is another essential area of Industry 4.0 [ 114 ], which turns everyday objects into smart objects by allowing them to transmit data and automate tasks without the need for human interaction. IoT is, therefore, considered to be the big frontier that can enhance almost all activities in our lives, such as smart governance, smart home, education, communication, transportation, retail, agriculture, health care, business, and many more [ 70 ]. The smart city is one of IoT's core fields of application, using technologies to enhance city services and residents' living experiences [ 132 , 135 ]. As machine learning utilizes experience to recognize trends and create models that help predict future behavior and events, it has become a crucial technology for IoT applications [ 103 ]. For example, predicting traffic in smart cities, predicting parking availability, estimating citizens' total energy usage for a particular period, and making context-aware and timely decisions for people are some tasks that can be solved using machine learning techniques according to people's current needs.

Traffic prediction and transportation: Transportation systems have become a crucial component of every country's economic development. Nonetheless, several cities around the world are experiencing an excessive rise in traffic volume, resulting in serious issues such as delays, traffic congestion, higher fuel prices, increased CO \(_2\) pollution, accidents, emergencies, and a decline in modern society's quality of life [ 40 ]. Thus, an intelligent transportation system that predicts future traffic is an indispensable part of a smart city. Accurate traffic prediction based on machine and deep learning modeling can help to minimize these issues [ 17 , 30 , 31 ]. For example, based on travel history and trends of traveling through various routes, machine learning can assist transportation companies in predicting possible issues that may occur on specific routes and recommending that their customers take a different path. Ultimately, these learning-based data-driven models help improve traffic flow, increase the usage and efficiency of sustainable modes of transportation, and limit real-world disruption by modeling and visualizing future changes.

Healthcare and COVID-19 pandemic: Machine learning can help to solve diagnostic and prognostic problems in a variety of medical domains, such as disease prediction, medical knowledge extraction, detecting regularities in data, patient management, etc. [ 33 , 77 , 112 ]. Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus, according to the World Health Organization (WHO) [ 3 ]. Recently, learning techniques have become popular in the battle against COVID-19 [ 61 , 63 ]. For the COVID-19 pandemic, learning techniques are used to classify patients at high risk, their mortality rate, and other anomalies [ 61 ]. They can also be used to better understand the virus's origin, predict COVID-19 outbreaks, and support disease diagnosis and treatment [ 14 , 50 ]. With the help of machine learning, researchers can forecast where and when COVID-19 is likely to spread and notify those regions to make the required arrangements. Deep learning also provides exciting solutions to the problems of medical image processing and is seen as a crucial technique for potential applications, particularly for the COVID-19 pandemic [ 10 , 78 , 111 ]. Overall, machine and deep learning techniques can help to fight the COVID-19 virus and the pandemic, as well as support intelligent clinical decision-making in the domain of healthcare.

E-commerce and product recommendations: Product recommendation is one of the most well known and widely used applications of machine learning, and it is one of the most prominent features of almost any e-commerce website today. Machine learning technology can assist businesses in analyzing their consumers’ purchasing histories and making customized product suggestions for their next purchase based on their behavior and preferences. E-commerce companies, for example, can easily position product suggestions and offers by analyzing browsing trends and click-through rates of specific items. Using predictive modeling based on machine learning techniques, many online retailers, such as Amazon [ 71 ], can better manage inventory, prevent out-of-stock situations, and optimize logistics and warehousing. The future of sales and marketing is the ability to capture, evaluate, and use consumer data to provide a customized shopping experience. Furthermore, machine learning techniques enable companies to create packages and content that are tailored to the needs of their customers, allowing them to maintain existing customers while attracting new ones.

NLP and sentiment analysis: Natural language processing (NLP) involves the reading and understanding of spoken or written language through the medium of a computer [ 79 , 103 ]. Thus, NLP helps computers, for instance, to read a text, hear speech, interpret it, analyze sentiment, and decide which aspects are significant, and machine learning techniques can be used for these tasks. Virtual personal assistants, chatbots, speech recognition, document description, and language or machine translation are some examples of NLP-related tasks. Sentiment analysis [ 90 ] (also referred to as opinion mining or emotion AI) is an NLP sub-field that seeks to identify and extract public mood and views within a given text through blogs, reviews, social media, forums, news, etc. For instance, businesses and brands use sentiment analysis to understand the social sentiment of their brand, product, or service through social media platforms or the web as a whole. Overall, sentiment analysis is considered a machine learning task that analyzes texts for polarity, such as "positive", "negative", or "neutral", along with more intense emotions such as very happy, happy, sad, very sad, angry, interested, or not interested, etc.

Image, speech and pattern recognition: Image recognition [ 36 ] is a well-known and widespread example of machine learning in the real world, which can identify an object in a digital image. Labeling an x-ray as cancerous or not, character recognition, face detection in an image, and tagging suggestions on social media, e.g., Facebook, are common examples of image recognition. Speech recognition [ 23 ] is also very popular and typically uses sound and linguistic models, e.g., Google Assistant, Cortana, Siri, Alexa, etc. [ 67 ], where machine learning methods are used. Pattern recognition [ 13 ] is defined as the automated recognition of patterns and regularities in data, e.g., image analysis. Several machine learning techniques such as classification, feature selection, clustering, and sequence labeling methods are used in this area.

Sustainable agriculture: Agriculture is essential to the survival of all human activities [ 109 ]. Sustainable agriculture practices help to improve agricultural productivity while also reducing negative impacts on the environment [ 5 , 25 , 109 ]. Sustainable agriculture supply chains are knowledge-intensive and based on information, skills, technologies, etc., where knowledge transfer encourages farmers to improve their decisions to adopt sustainable agriculture practices, utilizing the increasing amount of data captured by emerging technologies, e.g., the Internet of Things (IoT), mobile technologies and devices, etc. [ 5 , 53 , 54 ]. Machine learning can be applied in various phases of sustainable agriculture: in the pre-production phase, for the prediction of crop yield, soil properties, irrigation requirements, etc.; in the production phase, for weather prediction, disease detection, weed detection, soil nutrient management, livestock management, etc.; in the processing phase, for demand estimation, production planning, etc.; and in the distribution phase, for inventory management, consumer analysis, etc.

User behavior analytics and context-aware smartphone applications: Context-awareness is a system's ability to capture knowledge about its surroundings at any moment and modify behaviors accordingly [ 28 , 93 ]. Context-aware computing uses software and hardware to automatically collect and interpret data for direct responses. The mobile app development environment has been changed greatly by the power of AI, particularly machine learning techniques, through their learning capabilities from contextual data [ 103 , 136 ]. Thus, the developers of mobile apps can rely on machine learning to create smart apps that can understand human behavior, support, and entertain users [ 107 , 137 , 140 ]. Machine learning techniques are applicable for building various personalized data-driven context-aware systems, such as smart interruption management, smart mobile recommendation, context-aware smart searching, and decision-making that intelligently assists mobile phone users in a pervasive computing environment. For example, context-aware association rules can be used to build an intelligent phone call application [ 104 ]. Clustering approaches are useful in capturing users' diverse behavioral activities by taking into account data in time series [ 102 ]. To predict future events in various contexts, classification methods can be used [ 106 , 139 ]. Thus, the various learning techniques discussed in Sect. "Machine Learning Tasks and Algorithms" can help to build context-aware, adaptive, and smart applications according to the preferences of mobile phone users.

In addition to these application areas, machine learning-based models can also apply to several other domains such as bioinformatics, cheminformatics, computer networks, DNA sequence classification, economics and banking, robotics, advanced engineering, and many more.

Challenges and Research Directions

Our study on machine learning algorithms for intelligent data analysis and applications opens several research issues in the area. Thus, in this section, we summarize and discuss the challenges faced and the potential research opportunities and future directions.

In general, the effectiveness and the efficiency of a machine learning-based solution depend on the nature and characteristics of the data and the performance of the learning algorithms. Collecting data in relevant domains such as cybersecurity, IoT, healthcare, and agriculture, discussed in Sect. "Applications of Machine Learning", is not straightforward, although the current cyberspace enables the production of a huge amount of data at very high frequency. Thus, collecting useful data for the target machine learning-based applications, e.g., smart city applications, and managing them is important for further analysis. Therefore, a more in-depth investigation of data collection methods is needed while working on real-world data. Moreover, historical data may contain many ambiguous values, missing values, outliers, and meaningless data. The machine learning algorithms discussed in Sect. "Machine Learning Tasks and Algorithms" depend heavily on the quality and availability of the data for training, which in turn affect the resultant model. Thus, accurately cleaning and pre-processing the diverse data collected from diverse sources is a challenging task. Therefore, effectively modifying or enhancing existing pre-processing methods, or proposing new data preparation techniques, is required to use the learning algorithms effectively in the associated application domain.

To analyze the data and extract insights, there exist many machine learning algorithms, summarized in Sect. "Machine Learning Tasks and Algorithms". Thus, selecting a proper learning algorithm that is suitable for the target application is challenging, because the outcome of different learning algorithms may vary depending on the data characteristics [ 106 ]. Selecting the wrong learning algorithm would produce unexpected outcomes, which may lead to a loss of effort as well as a loss of the model's effectiveness and accuracy. In terms of model building, the techniques discussed in Sect. "Machine Learning Tasks and Algorithms" can directly be used to solve many real-world issues in diverse domains, such as cybersecurity, smart cities, and healthcare, summarized in Sect. "Applications of Machine Learning". However, hybrid learning models, e.g., ensembles of methods, modification or enhancement of existing learning techniques, or the design of new learning methods, could be potential future work in the area.

Thus, the ultimate success of a machine learning-based solution and the corresponding applications mainly depends on both the data and the learning algorithms. If the data are unsuitable for learning, such as non-representative, of poor quality, containing irrelevant features, or of insufficient quantity for training, then the machine learning models may become useless or produce lower accuracy. Therefore, effectively processing the data and handling the diverse learning algorithms are important for a machine learning-based solution and, eventually, for building intelligent applications.

Conclusion

In this paper, we have conducted a comprehensive overview of machine learning algorithms for intelligent data analysis and applications. According to our goal, we have briefly discussed how various types of machine learning methods can be used to provide solutions to various real-world issues. A successful machine learning model depends on both the data and the performance of the learning algorithms. The sophisticated learning algorithms then need to be trained on the collected real-world data and knowledge related to the target application before the system can assist with intelligent decision-making. We also discussed several popular application areas based on machine learning techniques to highlight their applicability to various real-world issues. Finally, we have summarized and discussed the challenges faced and the potential research opportunities and future directions in the area. Therefore, the challenges that are identified create promising research opportunities in the field, which must be addressed with effective solutions in various application areas. Overall, we believe that our study on machine learning-based solutions opens up a promising direction and can be used as a reference guide for potential research and applications for both academia and industry professionals, as well as for decision-makers, from a technical point of view.

Canadian Institute for Cybersecurity, University of New Brunswick, ISCX dataset, http://www.unb.ca/cic/datasets/index.html/ (Accessed on 20 October 2019).

CIC-DDoS2019 [online]. Available: https://www.unb.ca/cic/datasets/ddos-2019.html/ (Accessed on 28 March 2020).

World health organization: WHO. http://www.who.int/ .

Google trends. In https://trends.google.com/trends/ , 2019.

Adnan N, Nordin Shahrina Md, Rahman I, Noor A. The effects of knowledge transfer on farmers decision making toward sustainable agriculture practices. World J Sci Technol Sustain Dev. 2018.

Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. 1998; 94–105

Agrawal R, Imieliński T, Swami A. Mining association rules between sets of items in large databases. In: ACM SIGMOD Record. ACM. 1993;22: 207–216

Agrawal R, Gehrke J, Gunopulos D, Raghavan P. Fast algorithms for mining association rules. In: Proceedings of the International Joint Conference on Very Large Data Bases, Santiago Chile. 1994; 1215: 487–499.

Aha DW, Kibler D, Albert M. Instance-based learning algorithms. Mach Learn. 1991;6(1):37–66.

Alakus TB, Turkoglu I. Comparison of deep learning approaches to predict covid-19 infection. Chaos Solit Fract. 2020;140:

Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Comput. 1997;9(7):1545–88.

Ankerst M, Breunig MM, Kriegel H-P, Sander J. Optics: ordering points to identify the clustering structure. ACM Sigmod Record. 1999;28(2):49–60.

Anzai Y. Pattern recognition and machine learning. Elsevier; 2012.

Ardabili SF, Mosavi A, Ghamisi P, Ferdinand F, Varkonyi-Koczy AR, Reuter U, Rabczuk T, Atkinson PM. Covid-19 outbreak prediction with machine learning. Algorithms. 2020;13(10):249.

Baldi P. Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning, 2012; 37–49 .

Balducci F, Impedovo D, Pirlo G. Machine learning applications on agricultural datasets for smart farm enhancement. Machines. 2018;6(3):38.

Boukerche A, Wang J. Machine learning-based traffic prediction models for intelligent transportation systems. Comput Netw. 2020;181

Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.

Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press; 1984.

Cao L. Data science: a comprehensive overview. ACM Comput Surv (CSUR). 2017;50(3):43.

Carpenter GA, Grossberg S. A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Graph Image Process. 1987;37(1):54–115.

Chiu C-C, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A, Weiss RJ, Rao K, Gonina E, et al. State-of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018 pages 4774–4778. IEEE .

Chollet F. Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258, 2017.

Cobuloglu H, Büyüktahtakın IE. A stochastic multi-criteria decision analysis for sustainable biomass crop selection. Expert Syst Appl. 2015;42(15–16):6065–74.

Das A, Ng W-K, Woon Y-K. Rapid association rule mining. In: Proceedings of the tenth international conference on Information and knowledge management, pages 474–481. ACM, 2001.

de Amorim RC. Constrained clustering with minkowski weighted k-means. In: 2012 IEEE 13th International Symposium on Computational Intelligence and Informatics (CINTI), pages 13–17. IEEE, 2012.

Dey AK. Understanding and using context. Person Ubiquit Comput. 2001;5(1):4–7.

Eagle N, Pentland AS. Reality mining: sensing complex social systems. Person Ubiquit Comput. 2006;10(4):255–68.

Essien A, Petrounias I, Sampaio P, Sampaio S. Improving urban traffic speed prediction using data source fusion and deep learning. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE. 2019: 1–8. .

Essien A, Petrounias I, Sampaio P, Sampaio S. A deep-learning model for urban traffic flow prediction with traffic events mined from twitter. In: World Wide Web, 2020: 1–24 .

Ester M, Kriegel H-P, Sander J, Xiaowei X, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd. 1996;96:226–31.

Fatima M, Pasha M, et al. Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl. 2017;9(01):1.

Flach PA, Lachiche N. Confirmation-guided discovery of first-order rules with tertius. Mach Learn. 2001;42(1–2):61–95.

Freund Y, Schapire RE, et al. Experiments with a new boosting algorithm. In: Icml, Citeseer. 1996; 96: 148–156

Fujiyoshi H, Hirakawa T, Yamashita T. Deep learning-based image recognition for autonomous driving. IATSS Res. 2019;43(4):244–52.

Fukunaga K, Hostetler L. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans Inform Theory. 1975;21(1):32–40.

Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning. Cambridge: MIT Press; 2016.

Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems. 2014: 2672–2680.

Guerrero-Ibáñez J, Zeadally S, Contreras-Castillo J. Sensor technologies for intelligent transportation systems. Sensors. 2018;18(4):1212.

Han J, Pei J, Kamber M. Data mining: concepts and techniques. Amsterdam: Elsevier; 2011.

Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In: ACM Sigmod Record, ACM. 2000;29: 1–12.

Harmon SA, Sanford TH, Sheng X, Turkbey EB, Roth H, Ziyue X, Yang D, Myronenko A, Anderson V, Amalou A, et al. Artificial intelligence for the detection of covid-19 pneumonia on chest ct using multinational datasets. Nat Commun. 2020;11(1):1–7.

He K, Zhang X, Ren S, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell. 2015;37(9):1904–16.

He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016: 770–778.

Hinton GE. A practical guide to training restricted boltzmann machines. In: Neural networks: Tricks of the trade. Springer. 2012; 599-619

Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn. 1993;11(1):63–90.

Hotelling H. Analysis of a complex of statistical variables into principal components. J Edu Psychol. 1933;24(6):417.

Houtsma M, Swami A. Set-oriented mining for association rules in relational databases. In: Data Engineering, 1995. Proceedings of the Eleventh International Conference on, IEEE.1995:25–33.

Jamshidi M, Lalbakhsh A, Talla J, Peroutka Z, Hadjilooei F, Lalbakhsh P, Jamshidi M, La Spada L, Mirmozafari M, Dehghani M, et al. Artificial intelligence and covid-19: deep learning approaches for diagnosis and treatment. IEEE Access. 2020;8:109581–95.

John GH, Langley P. Estimating continuous distributions in bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc. 1995; 338–345

Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.

Kamble SS, Gunasekaran A, Gawankar SA. Sustainable industry 4.0 framework: a systematic literature review identifying the current trends and future perspectives. Process Saf Environ Protect. 2018;117:408–25.

Kamble SS, Gunasekaran A, Gawankar SA. Achieving sustainable performance in a data-driven agriculture supply chain: a review for research and applications. Int J Prod Econ. 2020;219:179–94.

Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis, vol. 344. John Wiley & Sons; 2009.

Keerthi SS, Shevade SK, Bhattacharyya C, Radha Krishna MK. Improvements to platt’s smo algorithm for svm classifier design. Neural Comput. 2001;13(3):637–49.

Khadse V, Mahalle PN, Biraris SV. An empirical comparison of supervised machine learning algorithms for internet of things data. In: 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), IEEE. 2018; 1–6

Kohonen T. The self-organizing map. Proc IEEE. 1990;78(9):1464–80.

Koroniotis N, Moustafa N, Sitnikova E, Turnbull B. Towards the development of realistic botnet dataset in the internet of things for network forensic analytics: bot-iot dataset. Fut Gen Comput Syst. 2019;100:779–96.

Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, 2012: 1097–1105

Kushwaha S, Bahl S, Bagha AK, Parmar KS, Javaid M, Haleem A, Singh RP. Significant applications of machine learning for covid-19 pandemic. J Ind Integr Manag. 2020;5(4).

Lade P, Ghosh R, Srinivasan S. Manufacturing analytics and industrial internet of things. IEEE Intell Syst. 2017;32(3):74–9.

Lalmuanawma S, Hussain J, Chhakchhuak L. Applications of machine learning and artificial intelligence for covid-19 (sars-cov-2) pandemic: a review. Chaos Sol Fract. 2020:110059 .

LeCessie S, Van Houwelingen JC. Ridge estimators in logistic regression. J R Stat Soc Ser C (Appl Stat). 1992;41(1):191–201.

LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.

Liu H, Motoda H. Feature extraction, construction and selection: A data mining perspective, vol. 453. Springer Science & Business Media; 1998.

López G, Quesada L, Guerrero LA. Alexa vs. siri vs. cortana vs. google assistant: a comparison of speech-based natural user interfaces. In: International Conference on Applied Human Factors and Ergonomics, Springer. 2017; 241–250.

Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, 1998.

MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967;volume 1, pages 281–297. Oakland, CA, USA.

Mahdavinejad MS, Rezvan M, Barekatain M, Adibi P, Barnaghi P, Sheth AP. Machine learning for internet of things data analysis: a survey. Digit Commun Netw. 2018;4(3):161–75.

Marchand A, Marx P. Automated product recommendations with preference-based explanations. J Retail. 2020;96(3):328–43.

McCallum A. Information extraction: distilling structured data from unstructured text. Queue. 2005;3(9):48–57.

Mehrotra A, Hendley R, Musolesi M. Prefminer: mining user’s preferences for intelligent mobile notification management. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September, 2016; pp. 1223–1234. ACM, New York, USA. .

Mohamadou Y, Halidou A, Kapen PT. A review of mathematical modeling, artificial intelligence and datasets used in the study, prediction and management of covid-19. Appl Intell. 2020;50(11):3913–25.

Mohammed M, Khan MB, Bashier Mohammed BE. Machine learning: algorithms and applications. CRC Press; 2016.

Moustafa N, Slay J. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In: 2015 military communications and information systems conference (MilCIS), 2015;pages 1–6. IEEE .

Nilashi M, Ibrahim OB, Ahmadi H, Shahmoradi L. An analytical method for diseases prediction using machine learning techniques. Comput Chem Eng. 2017;106:212–23.

Yujin O, Park S, Ye JC. Deep learning covid-19 features on cxr using limited training data sets. IEEE Trans Med Imaging. 2020;39(8):2688–700.

Otter DW, Medina JR , Kalita JK. A survey of the usages of deep learning for natural language processing. IEEE Trans Neural Netw Learn Syst. 2020.

Park H-S, Jun C-H. A simple and fast algorithm for k-medoids clustering. Expert Syst Appl. 2009;36(2):3336–41.

Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci. 1901;2(11):559–72.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

Perveen S, Shahbaz M, Keshavjee K, Guergachi A. Metabolic syndrome and development of diabetes mellitus: predictive modeling based on machine learning techniques. IEEE Access. 2018;7:1365–75.

Santi P, Ram D, Rob C, Nathan E. Behavior-based adaptive call predictor. ACM Trans Auton Adapt Syst. 2011;6(3):21:1–21:28.

Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: applications on robotics. J Intell Robot Syst. 2017;86(2):153–73.

Puterman ML. Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons; 2014.

Quinlan JR. Induction of decision trees. Mach Learn. 1986;1:81–106.

Quinlan JR. C4.5: programs for machine learning. Mach Learn. 1993.

Rasmussen C. The infinite gaussian mixture model. Adv Neural Inform Process Syst. 1999;12:554–60.

Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Syst. 2015;89:14–46.

Rokach L. A survey of clustering algorithms. In: Data mining and knowledge discovery handbook, pages 269–298. Springer, 2010.

Safdar S, Zafar S, Zafar N, Khan NF. Machine learning based decision support systems (dss) for heart disease diagnosis: a review. Artif Intell Rev. 2018;50(4):597–623.

Sarker IH. Context-aware rule learning from smartphone data: survey, challenges and future directions. J Big Data. 2019;6(1):1–25.

Sarker IH. A machine learning based robust prediction model for real-life mobile phone data. Internet Things. 2019;5:180–93.

Sarker IH. Ai-driven cybersecurity: an overview, security intelligence modeling and research directions. SN Comput Sci. 2021.

Sarker IH. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective. SN Comput Sci. 2021.

Sarker IH, Abushark YB, Alsolami F, Khan A. Intrudtree: a machine learning based cyber security intrusion detection model. Symmetry. 2020;12(5):754.

Sarker IH, Abushark YB, Khan A. Contextpca: predicting context-aware smartphone apps usage based on machine learning techniques. Symmetry. 2020;12(4):499.

Sarker IH, Alqahtani H, Alsolami F, Khan A, Abushark YB, Siddiqui MK. Context pre-modeling: an empirical analysis for classification based user-centric context-aware predictive modeling. J Big Data. 2020;7(1):1–23.

Sarker IH, Alan C, Jun H, Khan AI, Abushark YB, Khaled S. Behavdt: a behavioral decision tree learning to build user-centric context-aware predictive model. Mob Netw Appl. 2019; 1–11.

Sarker IH, Colman A, Kabir MA, Han J. Phone call log as a context source to modeling individual user behavior. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (Ubicomp): Adjunct, Germany, pages 630–634. ACM, 2016.

Sarker IH, Colman A, Kabir MA, Han J. Individualized time-series segmentation for mining mobile phone user behavior. Comput J Oxf Univ UK. 2018;61(3):349–68.

Sarker IH, Hoque MM, MdK Uddin, Tawfeeq A. Mobile data science and intelligent apps: concepts, ai-based modeling and research directions. Mob Netw Appl, pages 1–19, 2020.

Sarker IH, Kayes ASM. Abc-ruleminer: user behavioral rule-based machine learning method for context-aware intelligent services. J Netw Comput Appl. 2020; page 102762

Sarker IH, Kayes ASM, Badsha S, Alqahtani H, Watters P, Ng A. Cybersecurity data science: an overview from machine learning perspective. J Big Data. 2020;7(1):1–29.

Sarker IH, Watters P, Kayes ASM. Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage. J Big Data. 2019;6(1):1–28.

Sarker IH, Salah K. Appspred: predicting context-aware smartphone apps using random forest learning. Internet Things. 2019;8:

Scheffer T. Finding association rules that trade support optimally against confidence. Intell Data Anal. 2005;9(4):381–95.

Sharma R, Kamble SS, Gunasekaran A, Kumar V, Kumar A. A systematic literature review on machine learning applications for sustainable agriculture supply chain performance. Comput Oper Res. 2020;119:

Shengli S, Ling CX. Hybrid cost-sensitive decision tree, knowledge discovery in databases. In: PKDD 2005, Proceedings of 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Lecture Notes in Computer Science, volume 3721, 2005.

Shorten C, Khoshgoftaar TM, Furht B. Deep learning applications for covid-19. J Big Data. 2021;8(1):1–54.

Gökhan S, Nevin Y. Data analysis in health and big data: a machine learning medical diagnosis model based on patients’ complaints. Commun Stat Theory Methods. 2019;1–10

Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, et al. Mastering the game of go with deep neural networks and tree search. nature. 2016;529(7587):484–9.

Ślusarczyk B. Industry 4.0: Are we ready? Polish J Manag Stud. 17, 2018.

Sneath Peter HA. The application of computers to taxonomy. J Gen Microbiol. 1957;17(1).

Sorensen T. Method of establishing groups of equal amplitude in plant sociology based on similarity of species. Biol Skr. 1948; 5.

Srinivasan V, Moghaddam S, Mukherji A. Mobileminer: mining your frequent patterns on your phone. In: Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Seattle, WA, USA, 13-17 September, pp. 389–400. ACM, New York, USA. 2014.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015; pages 1–9.

Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the kdd cup 99 data set. In. IEEE symposium on computational intelligence for security and defense applications. IEEE. 2009;2009:1–6.

Tsagkias M. Tracy HK, Surya K, Vanessa M, de Rijke M. Challenges and research opportunities in ecommerce search and recommendations. In: ACM SIGIR Forum. volume 54. NY, USA: ACM New York; 2021. p. 1–23.

Wagstaff K, Cardie C, Rogers S, Schrödl S, et al. Constrained k-means clustering with background knowledge. Icml. 2001;1:577–84.

Wang W, Yang J, Muntz R, et al. Sting: a statistical information grid approach to spatial data mining. VLDB. 1997;97:186–95.

Wei P, Li Y, Zhang Z, Tao H, Li Z, Liu D. An optimization method for intrusion detection classification model based on deep belief network. IEEE Access. 2019;7:87593–605.

Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big data. 2016;3(1):9.

Witten IH, Frank E. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann; 2005.

Witten IH, Frank E, Trigg LE, Hall MA, Holmes G, Cunningham SJ. Weka: practical machine learning tools and techniques with java implementations. 1999.

Wu C-C, Yen-Liang C, Yi-Hung L, Xiang-Yu Y. Decision tree induction with a constrained number of leaf nodes. Appl Intell. 2016;45(3):673–85.

Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, McLachlan GJ, Ng A, Liu B, Philip SY, et al. Top 10 algorithms in data mining. Knowl Inform Syst. 2008;14(1):1–37.

Xin Y, Kong L, Liu Z, Chen Y, Li Y, Zhu H, Gao M, Hou H, Wang C. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365–81.

Xu D, Yingjie T. A comprehensive survey of clustering algorithms. Ann Data Sci. 2015;2(2):165–93.

Zaki MJ. Scalable algorithms for association mining. IEEE Trans Knowl Data Eng. 2000;12(3):372–90.

Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22–32.

Zhao Q, Bhowmick SS. Association rule mining: a survey. Singapore: Nanyang Technological University; 2003.

Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform. 2017;97:120–7.

Zheng Y, Rajasegarar S, Leckie C. Parking availability prediction for sensor-enabled car parks in smart cities. In: Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2015 IEEE Tenth International Conference on. IEEE, 2015; pages 1–6.

Zhu H, Cao H, Chen E, Xiong H, Tian J. Exploiting enriched contextual information for mobile app classification. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2012; pages 1617–1621

Zhu H, Chen E, Xiong H, Kuifei Y, Cao H, Tian J. Mining mobile user preferences for personalized context-aware recommendation. ACM Trans Intell Syst Technol (TIST). 2014;5(4):58.

Zikang H, Yong Y, Guofeng Y, Xinyu Z. Sentiment analysis of agricultural product ecommerce review data based on deep learning. In: 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), IEEE, 2020; pages 1–7

Zulkernain S, Madiraju P, Ahamed SI. A context aware interruption management system for mobile devices. In: Mobile Wireless Middleware, Operating Systems, and Applications. Springer. 2010; pages 221–234

Zulkernain S, Madiraju P, Ahamed S, Stamm K. A mobile intelligent interruption management system. J UCS. 2010;16(15):2060–80.

Download references

Author information

Authors and Affiliations

Iqbal H. Sarker
Swinburne University of Technology, Melbourne, VIC 3122, Australia
Department of Computer Science and Engineering, Chittagong University of Engineering & Technology, Chattogram 4349, Bangladesh

Corresponding author

Correspondence to Iqbal H. Sarker.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.


About this article

Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput Sci 2, 160 (2021). https://doi.org/10.1007/s42979-021-00592-x

Received: 27 January 2021

Accepted: 12 March 2021

Published: 22 March 2021


Keywords

  • Machine learning
  • Deep learning
  • Artificial intelligence
  • Data science
  • Data-driven decision-making
  • Predictive analytics
  • Intelligent applications


Problem‑Solving with Machine Learning Cornell Course

Key Course Takeaways

  • Define and reframe problems using machine learning (supervised learning) concepts and terminology
  • Identify the applicability, assumptions, and limitations of the k-NN algorithm
  • Simplify and make Python code efficient with matrix operations using NumPy, a library for the Python programming language
  • Build a face recognition system using the k-nearest neighbors algorithm (a minimal k-NN sketch follows this list)
  • Compute the accuracy of an algorithm by implementing loss functions
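The takeaways above center on k-NN and NumPy matrix operations. Purely as an illustration, and not the course's own code, here is a minimal k-NN classifier sketch in NumPy; the toy feature vectors stand in for flattened face-image features and are invented for this example.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    # Euclidean distance from the query to every training example (vectorized).
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the nearest labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: six "faces" as 4-dimensional feature vectors for two people (labels 0 and 1).
X_train = np.array([[0.1, 0.2, 0.1, 0.0],
                    [0.0, 0.1, 0.2, 0.1],
                    [0.2, 0.1, 0.0, 0.1],
                    [0.9, 0.8, 0.9, 1.0],
                    [1.0, 0.9, 0.8, 0.9],
                    [0.8, 1.0, 0.9, 0.8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.85, 0.9, 0.95, 0.9])))  # -> 1
```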


Course Author

Kilian Weinberger


Kilian Weinberger is an Associate Professor in the Department of Computer Science at Cornell University. He received his Ph.D. from the University of Pennsylvania in Machine Learning under the supervision of Lawrence Saul and his undergraduate degree in Mathematics and Computer Science from the University of Oxford. During his career he has won several best paper awards at ICML (2004), CVPR (2004, 2017), AISTATS (2005) and KDD (2014, runner-up award). In 2011 he was awarded the Outstanding AAAI Senior Program Chair Award and in 2012 he received an NSF CAREER award. He was elected co-Program Chair for ICML 2016 and for AAAI 2018. In 2016 he was the recipient of the Daniel M Lazar ’29 Excellence in Teaching Award. Kilian Weinberger’s research focuses on Machine Learning and its applications. In particular, he focuses on learning under resource constraints, metric learning, machine learned web-search ranking, computer vision and deep learning. Before joining Cornell University, he was an Associate Professor at Washington University in St. Louis and before that he worked as a research scientist at Yahoo! Research in Santa Clara.


Who Should Enroll

  • Programmers
  • Data analysts
  • Statisticians
  • Data scientists
  • Software engineers


Basic training loops

In the previous guides, you have learned about tensors, variables, gradient tape, and modules. In this guide, you will fit these together to train models.

TensorFlow also includes the tf.keras API, a high-level neural network API that provides useful abstractions to reduce boilerplate. However, in this guide, you will use basic classes.

Solving machine learning problems

Solving a machine learning problem usually consists of the following steps:

  • Obtain training data.
  • Define the model.
  • Define a loss function.
  • Run through the training data, calculating loss from the ideal value
  • Calculate gradients for that loss and use an optimizer to adjust the variables to fit the data.
  • Evaluate your results.

For illustration purposes, in this guide you'll develop a simple linear model, \(f(x) = x * W + b\), which has two variables: \(W\) (weights) and \(b\) (bias).

This is the most basic of machine learning problems: Given \(x\) and \(y\), try to find the slope and offset of a line via simple linear regression .

Supervised learning uses inputs (usually denoted as x) and outputs (denoted y, often called labels). The goal is to learn from paired inputs and outputs so that you can predict the value of an output from an input.

Each input of your data, in TensorFlow, is almost always represented by a tensor, and is often a vector. In supervised training, the output (or value you'd like to predict) is also a tensor.

Here is some data synthesized by adding Gaussian (Normal) noise to points along a line.
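The guide's original code listings are not reproduced here, so the following is a minimal sketch of what that synthetic data might look like; TRUE_W, TRUE_B, and the example count are illustrative choices (1,000 examples so that the single-batch note later in this guide applies).

```python
import tensorflow as tf

# Illustrative ground-truth parameters the training loop should recover.
TRUE_W = 3.0
TRUE_B = 2.0
NUM_EXAMPLES = 1000

# Inputs drawn from a normal distribution; outputs lie on the line
# y = x * TRUE_W + TRUE_B with added Gaussian noise.
x = tf.random.normal(shape=[NUM_EXAMPLES])
noise = tf.random.normal(shape=[NUM_EXAMPLES])
y = x * TRUE_W + TRUE_B + noise
```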

Tensors are usually gathered together in batches , or groups of inputs and outputs stacked together. Batching can confer some training benefits and works well with accelerators and vectorized computation. Given how small this dataset is, you can treat the entire dataset as a single batch.

Define the model

Use tf.Variable to represent all weights in a model. A tf.Variable stores a value and provides this in tensor form as needed. See the variable guide for more details.

Use tf.Module to encapsulate the variables and the computation. You could use any Python object, but this way it can be easily saved.

Here, you define both w and b as variables.
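As a sketch, one way to write such a module (assuming the fixed starting values below) is:

```python
class MyModel(tf.Module):
  def __init__(self, **kwargs):
    super().__init__(**kwargs)
    # Fixed, arbitrary starting values for the weight and bias.
    self.w = tf.Variable(5.0)
    self.b = tf.Variable(0.0)

  def __call__(self, x):
    # The simple linear model: f(x) = x * w + b
    return self.w * x + self.b

model = MyModel()
```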

The initial variables are set here in a fixed way, but Keras comes with a number of initializers you could use, with or without the rest of Keras.

Define a loss function

A loss function measures how well the output of a model for a given input matches the target output. The goal is to minimize this difference during training. Define the standard L2 loss, also known as the "mean squared" error:
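A minimal sketch of that loss, written with TensorFlow ops:

```python
def loss(target_y, predicted_y):
  # Mean of the squared differences between targets and predictions (L2 / mean squared error).
  return tf.reduce_mean(tf.square(target_y - predicted_y))
```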

Before training the model, you can visualize the loss value by plotting the model's predictions in red and the training data in blue:
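A small plotting sketch, using matplotlib, to compare the untrained predictions against the synthesized data:

```python
import matplotlib.pyplot as plt

plt.scatter(x.numpy(), y.numpy(), c="b", label="data")
plt.scatter(x.numpy(), model(x).numpy(), c="r", label="untrained predictions")
plt.legend()
plt.show()

print("Current loss: %1.6f" % loss(y, model(x)).numpy())
```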

Define a training loop

The training loop consists of repeatedly doing the following tasks in order:

  • Sending a batch of inputs through the model to generate outputs
  • Calculating the loss by comparing the outputs to the target output (or label)
  • Using gradient tape to find the gradients
  • Optimizing the variables with those gradients

For this example, you can train the model using gradient descent .

There are many variants of the gradient descent scheme that are captured in tf.keras.optimizers. But in the spirit of building from first principles, here you will implement the basic math yourself with the help of tf.GradientTape for automatic differentiation and tf.assign_sub for decrementing a value (which combines tf.assign and tf.sub):
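A minimal training-step sketch along those lines (the learning rate passed in later is an illustrative choice):

```python
def train(model, x, y, learning_rate):
  with tf.GradientTape() as tape:
    current_loss = loss(y, model(x))

  # d(loss)/dw and d(loss)/db
  dw, db = tape.gradient(current_loss, [model.w, model.b])

  # Plain gradient descent: subtract the scaled gradient from each variable.
  model.w.assign_sub(learning_rate * dw)
  model.b.assign_sub(learning_rate * db)
```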

For a look at training, you can send the same batch of x and y through the training loop, and see how W and b evolve.

Do the training

Plot the evolution of the weights over time:
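A sketch of such a training driver, recording w and b each epoch and plotting them against the values used to synthesize the data (the epoch count and learning rate are illustrative):

```python
weights, biases = [], []
epochs = range(10)

for epoch in epochs:
  train(model, x, y, learning_rate=0.1)
  weights.append(model.w.numpy())
  biases.append(model.b.numpy())
  print(f"Epoch {epoch:2d}: w = {weights[-1]:.2f}, b = {biases[-1]:.2f}, "
        f"loss = {loss(y, model(x)).numpy():.5f}")

# Plot the evolution of w and b against the ground-truth values.
plt.plot(epochs, weights, label="w")
plt.plot(epochs, biases, label="b")
plt.axhline(TRUE_W, linestyle="--", color="gray", label="true w")
plt.axhline(TRUE_B, linestyle="--", color="lightgray", label="true b")
plt.legend()
plt.show()
```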

Visualize how the trained model performs

The same solution, but with Keras

It's useful to contrast the code above with the equivalent in Keras.

Defining the model looks exactly the same if you subclass tf.keras.Model. Remember that Keras models ultimately inherit from tf.Module.

Rather than write new training loops each time you create a model, you can use the built-in features of Keras as a shortcut. This can be useful when you do not want to write or debug Python training loops.

If you do, you will need to use model.compile() to set the parameters, and model.fit() to train. It can be less code to use Keras implementations of L2 loss and gradient descent, again as a shortcut. Keras losses and optimizers can be used outside of these convenience functions, too, and the previous example could have used them.

Keras fit expects batched data or a complete dataset as a NumPy array. NumPy arrays are chopped into batches and default to a batch size of 32.

In this case, to match the behavior of the hand-written loop, you should pass x in as a single batch of size 1000.
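A sketch of the Keras version under the same assumptions, with SGD and mean squared error standing in for the hand-written gradient descent and L2 loss:

```python
class MyKerasModel(tf.keras.Model):
  def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self.w = tf.Variable(5.0)
    self.b = tf.Variable(0.0)

  def call(self, x):
    return self.w * x + self.b

keras_model = MyKerasModel()

# Keras-provided gradient descent and L2 loss replace the hand-written loop.
keras_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
    loss=tf.keras.losses.MeanSquaredError(),
)

# Pass the whole data set as one batch to mirror the hand-written loop above.
keras_model.fit(x, y, epochs=10, batch_size=1000)
```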

Note that Keras prints out the loss after training, not before, so the first loss appears lower, but otherwise this shows essentially the same training performance.

In this guide, you have seen how to use the core classes of tensors, variables, modules, and gradient tape to build and train a model, and further how those ideas map to Keras.

This is, however, an extremely simple problem. For a more practical introduction, see Custom training walkthrough .

For more on using built-in Keras training loops, see this guide . For more on training loops and Keras, see this guide . For writing custom distributed training loops, see this guide .


Machine Learning Crash Course

This page lists the exercises in Machine Learning Crash Course.

The majority of the Programming Exercises use the California housing data set .

Programming exercises run directly in your browser (no setup required!) using the Colaboratory platform. Colaboratory is supported on most major browsers, and is most thoroughly tested on desktop versions of Chrome and Firefox. If you'd prefer to download and run the exercises offline, see these instructions for setting up a local environment.

In March 2020, this course began using Programming Exercises coded with tf.keras. If you'd prefer to use the legacy Estimators Programming Exercises, you can find them on GitHub.

Framing

  • Check Your Understanding: Supervised Learning, Features and Labels

Descending into ML

  • Check Your Understanding: Mean Squared Error

Reducing Loss

  • Optimizing Learning Rate
  • Check Your Understanding: Batch Size
  • Playground: Learning Rate and Convergence

First Steps with TensorFlow

  • Programming Exercise: NumPy Ultraquick Tutorial
  • Programming Exercise: pandas UltraQuick Tutorial
  • Programming Exercise: Linear Regression with Synthetic Data
  • Programming Exercise: Linear Regression with a Real Dataset

Training and Test Sets

  • Playground: Training Sets and Test Sets
  • Check Your Intuition: Validation
  • Programming Exercise: Validation Sets and Test Sets

Feature Crosses

  • Playground: Introducing Feature Crosses, More Complex Feature Crosses
  • Check Your Understanding: Feature Crosses
  • Programming Exercise: Representation with Feature Crosses

Regularization for Simplicity

  • Playground: Overcrossing?
  • Check Your Understanding: L2 Regularization, L2 Regularization and Correlated Features
  • Playground: Examining L2 Regularization

Classification

  • Check Your Understanding: Accuracy, Precision, Recall, Precision and Recall
  • Check Your Understanding: ROC and AUC
  • Programming Exercise: Binary Classification

Regularization for Sparsity

  • Check Your Understanding: L1 Regularization, L1 vs. L2 Regularization
  • Playground: Examining L1 Regularization

Intro to Neural Nets

  • Playground: A First Neural Network, Neural Net Initialization, Neural Net Spiral
  • Programming Exercise: Intro to Neural Networks

Training Neural Nets

  • Backpropagation algorithm visual explanation

Multi-Class Neural Nets

  • Programming Exercise: Multi-Class Classification with MNIST

Fairness

  • Check Your Understanding: Fairness
  • Programming Exercise: Intro to Fairness

Static vs. Dynamic Training

  • Check Your Understanding: Online Training, Offline Training

Static vs. Dynamic Inference

  • Check Your Understanding: Offline Inference, Online Inference

Data Dependencies

  • Check Your Understanding: Data Dependencies




Computer Science > Machine Learning

Title: Measuring Mathematical Problem Solving with the MATH Dataset

Abstract: Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community.


Machine Learning for Problem Solving

Description

The main premise of the course is to equip students with an intuitive understanding of machine learning concepts grounded in real-world applications. The course is designed to deliver the practical knowledge and experience necessary for recognizing and formulating machine learning problems in the real world, as well as the best practices and tools for effectively applying machine learning in practice. The emphasis will be on learning and practicing the machine learning process, involving the cycle of feature design, modeling, and evaluation. Visit the Heinz College website for a more detailed description of the course.

Students are expected to have the following background:

  • Basic knowledge of probability
  • Basic knowledge of linear algebra
  • Working knowledge of basic computing principles
  • Basic programming skills at a level sufficient to write a reasonably non-trivial computer program in Python

Learning Outcomes

see course website https://www.andrew.cmu.edu/user/lakoglu/courses/95828/index.htm

Prerequisites Description

see https://www.andrew.cmu.edu/user/lakoglu/courses/95828/policy.htm

  • Syllabus (Leman Akoglu - S24)

10 Real World Problems That Machine Learning Can Solve


Machine learning is one of those modern technologies that was just an idea a few years ago. Few of us believed our vision could evolve to the point where it starts to feel like science fiction, yet we keep narrowing the gap, getting closer to automation by building intelligent machines capable of performing tasks without human intervention.

It all started around 1940, when the brilliant mathematician Alan Turing designed "the Bombe," an electromechanical machine that helped Bletchley Park decrypt tens of thousands of Enigma messages each month. Not only did it help the Allied forces win World War II, it also raised a simple question: can machines figure things out on their own?

We strongly believe that the introduction of artificial intelligence and its subfields, such as machine learning, is the answer to that question. This is not one of those trending technologies that is hot today and forgotten tomorrow; machine learning is here to stay and to make lives easier on both ends. You might know little about machine learning now, but after reading this blog, you will see how it shapes and streamlines the way we work, live, and communicate; in short, our future.

What is Machine Learning?

Machine learning is a subfield of artificial intelligence in which computer algorithms learn from data and information without explicit human intervention. It is the practice of using built-in algorithms to analyze data and learn from it.

The objective of machine learning is to adapt to new data independently and to make decisions and suggestions based on thousands of calculations and analyses. Because it is a subfield of artificial intelligence, this is often accomplished through deep learning applications and AI systems that learn from the data they are fed.

How Do Problem-Solving and Machine Learning Correlate?

The basic definition of problem-solving is finding the most accurate process for arriving at solutions to complex problems. Machine learning itself is a vast field and is expanding rapidly. So how do problem-solving and machine learning correlate?

In 1997, Tom Mitchell gave a well-known formal definition of machine learning, which we believe still captures how ML is used to make things easier for businesses around the globe. He wrote: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

So, suppose you want your program to predict something, for example, identifying a person from a group of individuals based on how they interact (task T). You can feed each individual's interaction data (experience E) to a machine learning algorithm, and it will learn to predict the individual from their pattern of interaction, with prediction quality as the yardstick (performance measure P). That puts you one step ahead, because you already have a good idea of what they might say or do.
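To make the T/E/P framing concrete, here is a hedged, minimal sketch using scikit-learn; the interaction features, the number of people, and the data are all invented for illustration and are not from the original post.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Experience E: past interactions, each row a vector of invented interaction
# features (e.g. message length, emoji count, reply delay), labeled with the person.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(75, 3)) for i in range(4)])
y_person = np.repeat(np.arange(4), 75)  # four hypothetical people

X_train, X_test, y_train, y_test = train_test_split(X, y_person, random_state=0)

# Task T: predict which person produced a new interaction.
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Performance measure P: accuracy on interactions the model has not seen.
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```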

By bringing machine learning into your business, you can use the data your industry has already gathered to develop better solutions than the ones you have today. The objective of ML is never to make "perfect" predictions; it deals with domains where that is not possible. Its goal is to produce estimates that are good enough to be genuinely useful.

Who’s Using It?

The principal ingredient of machine learning is data. All the major industries that work with large amounts of data have already acknowledged the value of machine learning technology. With more data come more possibilities, and many practical ways to scale a business into new domains. By gathering valuable insights from this data, organizations are finding more efficient ways to work and gaining an edge over their competitors.

Some of the top industries in the world that are currently using Machine Learning are: 

  • Financial Services

Banks and other financial institutions in Singapore are using ML for two key purposes: preventing fraud and identifying essential insights in data.

  • Transportation

ML has already become a crucial asset for delivery companies and other transportation enterprises. They use it to analyze data to identify patterns and trends, making routes more efficient and even predicting potential problems, all of which increases profitability.

  • Health and Care

Health Care is also reaping the benefit of ML as it is helping medical experts analyze data to identify patterns to offer better diagnoses and treatment to the patients. 

  • Retail and Ecommerce

Ecommerce has already become part of our daily lives. Retail businesses are using ML to recommend items that users might like by analyzing their previous purchases.

  • Oil & Gas

The use of machine learning in the oil and gas industry is already extensive and will expand into more areas very soon. It is currently being used to find new energy sources, analyze minerals in the ground, and even anticipate refinery sensor failures.

10 Real-World Problems That Machine Learning Can Solve

1. Recommending Products After Collecting Previous Data

Recommendation systems are one of the most common machine learning use cases in day-to-day life. These systems are used mainly by search engines like Google and Bing and the top eCommerce platforms like Amazon and eBay.

The ML-integrated systems show a list of recommended products tailored to each consumer. These suggestions are based on data such as previous purchases, wish lists, searches, clicks, inquiries, and browsing history. This data is fed to a comprehensive ML algorithm so that the recommendation reaches the user at the right moment and enhances customer engagement.
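As a purely illustrative sketch of one common approach (item-to-item similarity over a purchase history), with an invented purchase matrix:

```python
import numpy as np

# Rows are users, columns are products; 1 means the user bought the product.
purchases = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

# Cosine similarity between product columns.
norms = np.linalg.norm(purchases, axis=0, keepdims=True)
item_sim = (purchases.T @ purchases) / (norms.T @ norms + 1e-9)

# Score products for user 2 by similarity to what they already bought,
# then recommend the highest-scoring product they have not purchased yet.
user = purchases[2]
scores = item_sim @ user
scores[user > 0] = -np.inf
print("recommend product index:", int(np.argmax(scores)))
```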

2. Works as the Best Image and Video Recognition Tool

If you have come across features like face recognition, text detection, object detection, and landmark detection, that is the result of deep learning applied within machine learning. When ML algorithms are trained with modern deep learning frameworks, they can quickly identify and classify objects, making these capabilities accessible even to non-experts.

ML can also be used to recognize handwritten text by segmenting a piece of writing into smaller images, each containing a single character.

3. Your Virtual Assistant

A virtual assistant, also known as an AI assistant, is an application program that comprehends natural-language voice commands and completes tasks for the user, such as searching the web or booking an appointment. If you have ever asked Google Assistant on your Android phone to wake you up at 5 AM, or asked Siri on your iPhone for directions to the nearest restaurant, then ML has made your life easier too.

Some of the principal personal or smart assistants on the market are Siri, Google Assistant, Alexa on Echo devices, and Google Home Mini. These assistants can help you find information by voice command or answer your questions by searching your query on the web.

4. Ingenious Gaming Using ML

With advances in technology, we can improve the graphics of games and give them a mind at the same time. If you have lately been struggling to beat the bot in a chess game, ML is probably the reason. Today's games do not simply analyze your moves; they also learn to play the game better than you by practicing countless times. Pitting your mind against such an intelligent system forces you to sharpen your own play.

5. Devising Superior Health Care Methods

Even hospitals are utilizing machine learning to treat patients. Thanks to wearable devices, doctors can get accurate data on our health from anywhere in the world and intervene when they spot something concerning. Combined with the explosion of computing power, ML integrated into essential tools can provide real-time insights.

It can help doctors diagnose critical conditions faster and more accurately. Beyond that, AI is assisting in the development of new medications and treatments, predicting harmful reactions at early stages, and working towards lowering the cost of healthcare for providers and patients alike.

6. Protecting Environment in the Most Impactful Way

As mentioned, the possibilities with ML are endless, and this is just the beginning. IBM's Green Horizon Project was recently acknowledged by experts worldwide for accurately forecasting weather and pollution. Similar forecasting can be applied locally with the right professional expertise by your side. It is also helping city planners run every kind of scenario, simply by feeding historical data to an ML algorithm, to find the options with the least environmental impact.

7. Real-Time Dynamic Pricing

You may have already encountered this scenario when booking a flight for Christmas travel or hailing a cab at peak hours: there is a big gap between the regular price and the price at that particular time. In these situations, ML and data-analysis techniques help businesses learn more about their users. They answer two critical questions.

First, how are customers reacting to surge prices? And second, is the business losing customers because of surge pricing? The integration of AI and ML helps both businesses and users, as it helps determine when customers are most receptive to promotional and discounted prices.

8. Innovations in the Finance Sector Including Stock Market

The functioning of the finance sector is set to change completely in the coming years. Thanks to technologies like mobile apps and machine learning, stock market activity is at an all-time high.

AI, deep learning, and machine learning make it easier for traders to model market prices from historical data, allowing them to make better, steadier decisions and reduce financial losses. Beyond that, machine learning-based anomaly detection models can monitor every transaction request and alert you to any suspicious activity.

9. Commute Predictions Using Machine Learning

Almost everyone uses GPS services while driving. A programmed GPS helps us navigate to our destination, but it is the integration of ML features such as congestion analysis that tells us which route avoids traffic so we reach our destination on time. It uses data such as our location and the velocities of vehicles on the same route to estimate congestion and judge whether a path is worth taking.

10. Online Video OTT Streaming Applications 

The pioneers of online video streaming, Netflix and Amazon Prime, have together upended the traditional way of watching television. But how do they keep customers engaged on their platforms? First by offering impressive content, and second by personalizing the experience for each user.

They captured mass audiences using machine learning. They integrated ML into their platforms at the right time and fed it user data such as the day and time people watch, the type of content they prefer, their browsing patterns, and whether they like to watch trailers before a movie or show. They use practical machine learning frameworks to engage their audience by delivering a quality streaming experience right to their homes.

The possibilities with machine learning are limitless. All you need to do is find a sensible way to use it in your particular business domain to improve your services. It is essential to understand the problem at hand, because not every ML algorithm suits every business need.

Every problem in machine learning is different from the last. You cannot simply feed some data to a machine learning algorithm with a neural network and pray for results.

Every situation demands a different approach, which is why it is worth engaging professional machine learning experts: people with enough experience in the field to approach your business requirements with an open mind, and then build a machine learning solution that benefits your business without wasting anyone's time.

At ICore Singapore, you get all-inclusive Artificial Intelligence and Machine Learning Development services, redefining the way your business operates.

With the right mix of AI/ML development teams, you can trust us for high-quality solutions that cover all your needs.

Our Writing team believes you had a great time reading this piece. If this helped you in any possible way, a pat on our back would be great!

Share your thoughts via comments or just a thumbs up would do!


What is machine learning?

""

For most of our history, we’ve thought that learning—the ability to adjust our behavior based on collected information—was something only humans did. The past few decades have changed all that. We now know that animals of all kinds learn from experience, teaching, and even play. But it is not only animals that learn: there’s increasing evidence that plants do, too . And if you’ve ever unlocked a phone with facial recognition, or interacted with a virtual assistant, you’ve experienced firsthand that machines, too, are capable of learning.

Get to know and directly engage with McKinsey experts on machine learning.

Michael Chui is a partner at the McKinsey Global Institute and is based in McKinsey’s Bay Area office; Tamim Saleh  is a senior partner in the London office, where Alex Sukharevsky  is a senior partner; and Alex Singla  is a senior partner in the Chicago office.

Machine learning is a form of artificial intelligence  (AI) that can adapt to a wide range of inputs, including large data sets and human instruction. (Some machine learning algorithms are specialized in training themselves to detect patterns; this is called deep learning, which we explore in detail in a separate Explainer .) The term “machine learning” was first coined in 1959 by computer scientist Arthur Samuel, who defined it as “a computer’s ability to learn without being explicitly programmed.” It follows, then, that machine learning algorithms are able to detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction. The algorithms also adapt in response to new data and experiences to improve over time.

Today, the need—and potential—for machine learning is greater than ever. The volume and complexity of data that is now being generated is far too vast for humans to reckon with. In the years since its widespread deployment, machine learning has had impact in a number of industries, including medical-imaging analysis  and high-resolution weather forecasting.

How did machine learning evolve into generative AI?

Machine learning as a discipline was first introduced in 1959, building on formulas and hypotheses dating back to the 1930s . But it wasn’t until the late 1990s that machine learning truly flowered, as steady advances in digitization, computing languages capable of greater nuance, and cheaper computing power and memory enabled data scientists to train machine learning models to independently learn from data sets rather than rely on rules written for them. The broad availability of inexpensive cloud services later accelerated advances in machine learning even further.

Deep learning is a more advanced version of machine learning that is particularly adept at processing a wider range of data resources (text as well as unstructured data including images), requires even less human intervention, and can often produce more accurate results than traditional machine learning. Deep learning uses neural networks—based on the ways neurons interact in the human brain —to ingest and process data through multiple neuron layers that can recognize increasingly complex features of the data. For example, an early neuron layer might recognize something as being in a specific shape; building on this knowledge, a later layer might be able to identify the shape as a stop sign. Similar to machine learning, deep learning uses iteration to self-correct and to improve its prediction capabilities. Once it “learns” what a stop sign looks like, it can recognize a stop sign in a new image.


This technological advancement was foundational to the AI tools emerging today. ChatGPT, released in late 2022, made AI visible—and accessible—to the general public for the first time. ChatGPT, and other language models like it, were trained on deep learning tools called transformer networks to generate  content in response to prompts . Transformer networks allow generative AI (gen AI) tools to weigh different parts of the input sequence differently when making predictions. Transformer networks, comprising encoder and decoder layers, allow gen AI models to learn relationships and dependencies between words in a more flexible way compared with traditional machine and deep learning models. That’s because transformer networks are trained on huge swaths of the internet (for example, all traffic footage ever recorded and uploaded) instead of a specific subset of data (certain images of a stop sign, for instance). Foundation models trained on transformer network architecture—like OpenAI’s ChatGPT or Google’s BERT—are able to transfer what they’ve learned from a specific task to a more generalized set of tasks, including generating content. At this point, you could ask a model to create a video of a car going through a stop sign.

Foundation models can create content, but they don’t know the difference between right and wrong, or even what is and isn’t socially acceptable. When ChatGPT was first created, it required a great deal of human input to learn. OpenAI employed a large number of human workers all over the world to help hone the technology, cleaning and labeling data sets and reviewing and labeling toxic content, then flagging it for removal. This human input is a large part of what has made ChatGPT so revolutionary.

What kinds of neural networks are used in deep learning?

There are three types of neural networks used in deep learning:

  • Feed-forward neural network. In this simple neural network, first proposed in 1958, information moves in only one direction: forward from the model's input layer to its output layer, without ever traveling backward to be reanalyzed by the model. That means you can feed, or input, data into the model, then "train" the model to predict something about different data sets. As just one example, feed-forward neural networks are used in banking, among other industries, to detect fraudulent financial transactions (a minimal code sketch follows this list). Here's how it works: first, you train a model to predict whether a transaction is fraudulent based on a data set you've used to manually label transactions as fraudulent or not. Then you can use the model to predict whether new, incoming transactions are fraudulent so you can flag them for closer study or block them outright.

  • Convolutional neural network (CNN). CNNs are a type of feed-forward neural network whose connectivity pattern is inspired by the organization of the brain's visual cortex, the part of the brain that processes images. As such, CNNs are well suited to perceptual tasks, like being able to identify bird or plant species based on photographs. Business use cases include diagnosing diseases from medical scans or detecting a company logo in social media to manage a brand's reputation or to identify potential joint marketing opportunities.

Here’s how they work:

  • First, the CNN receives an image—for example, of the letter “A”—that it processes as a collection of pixels.
  • In the hidden layers, the CNN identifies unique features—for example, the individual lines that make up the letter “A.”
  • The CNN can then classify a different image as the letter “A” if it finds that the new image has the same unique features previously identified as making up the letter.

  • Recurrent neural network (RNN). RNNs are artificial neural networks whose connections include loops, meaning the model both moves data forward and loops it backward to run again through previous layers. RNNs are helpful for predicting a sentiment or an ending of a sequence, like a large sample of text, speech, or images. They can do this because each individual input is fed into the model by itself as well as in combination with the preceding input.

Continuing with the banking example, RNNs can help detect fraudulent financial transactions just as feed-forward neural networks can, but in a more complex way. Whereas feed-forward neural networks can help predict whether one individual transaction is likely to be fraudulent, recurrent neural networks can “learn” from the financial behavior of an individual—such as a sequence of transactions like a credit card history—and measure each transaction against the person’s record as a whole. It can do this in addition to using the general learnings of the feed-forward neural network model.
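As a minimal, illustrative sketch of the feed-forward fraud-detection idea described in the first item above (not McKinsey's implementation), with invented features, layer sizes, and data:

```python
import numpy as np
import tensorflow as tf

# Each row is a transaction described by a few invented numeric features
# (e.g. amount, time of day, distance from home); label 1 means fraudulent.
X = np.random.rand(1000, 5).astype("float32")
y = np.random.randint(0, 2, size=(1000, 1)).astype("float32")

# A small feed-forward network: inputs flow through one hidden layer to a
# single sigmoid output that estimates the probability of fraud.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)

# Flag incoming transactions whose predicted fraud probability exceeds a threshold.
new_transactions = np.random.rand(3, 5).astype("float32")
flags = model.predict(new_transactions) > 0.5
print(flags.ravel())
```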

For more on deep learning, and neural networks and their use cases, see our executive’s guide to AI . Learn more about McKinsey Digital .

Which sectors can benefit from machine learning?

McKinsey collated more than 400 use cases of machine and deep learning across 19 industries and nine business functions. Based on our analysis, we believe that nearly any industry can benefit  from machine and deep learning. Here are a few examples of use cases that cut across several sectors:

  • Predictive maintenance. This use case is crucial for any industry or business that relies on equipment. Rather than waiting until a piece of equipment breaks down, companies can use predictive maintenance  to project when maintenance will be needed, thereby reducing downtime and lowering operating costs. Machine learning and deep learning have the capacity to analyze large amounts of multifaceted data, which can increase the precision of predictive maintenance. For example, AI practitioners can layer in data from new inputs, like audio and image data, which can add nuance to a neural network’s analysis.
  • Logistics optimization. Using AI to optimize logistics  can reduce costs through real-time forecasts and behavioral coaching. For example, AI can optimize routing of delivery traffic, improving fuel efficiency and reducing delivery times.
  • Customer service. AI techniques in call centers  can help enable a more seamless experience for customers and more efficient processing. The technology goes beyond understanding a caller’s words: deep learning analysis of audio can assess a customer’s tone. If the automated call service detects that a caller is getting upset, the system can reroute to a human operator or manager.

Learn more about McKinsey Digital .

What are some examples of organizations using machine learning?

Canny leaders have been applying machine learning to business problems for years. Here are a few examples :

  • Teams in the National Basketball Association have worked with start-up Second Spectrum, which uses machine learning to digitize teams’ games to create predictive models. The models allow coaches to distinguish between, as CEO Rajiv Maheswaran puts it, “a bad shooter who takes good shots and a good shooter who takes bad shots.”
  • More than a dozen European banks have replaced older statistical-modeling approaches with machine learning techniques. In some cases, they’ve experienced 10 percent increases in sales of new products, 20 percent savings in capital expenditures, 20 percent increases in cash collections, and 20 percent declines in churn.
  • Vistra, a large US-based power producer, built and deployed an AI-powered heat rate optimizer based on a neural network model. The model combed through years of data to help Vistra run a specific power plant at its best possible thermal efficiency.

How can mainstream organizations capture the full potential of machine learning?

To help capture the full potential value of AI and machine learning technologies, mainstream adopters can consider the following actions :

  • Reimagine challenges as machine learning problems . Not all business problems are machine learning problems. But some can be reframed as machine learning problems, which can enable novel approaches to creating solutions. This requires appropriate data sources, as well as clear definitions of ideal outcomes and objectives.
  • Put machine learning at the core of enterprise architecture . Organizations can put machine learning at the core of their enterprise tech platforms, not as an auxiliary to systems architectures built around rules-based logic.
  • Develop a human-centered talent strategy . To capture these possibilities, enterprises need workforces capable of guiding technological adoption and proactively shaping how employees use new AI tools.

Machine learning is here to stay. Gen AI has shone a light on machine learning, making traditional AI visible—and accessible—to the general public for the first time. The efflorescence of gen AI will only accelerate the adoption of broader machine learning and AI. Leaders who take action now can help ensure their organizations are on the machine learning train as it leaves the station.

Learn more about McKinsey Digital . And check out machine learning–related job opportunities if you’re interested in working with McKinsey.

Articles referenced:

  • “ Author Talks: Dr. Fei-Fei Li sees ‘worlds’ of possibilities in a multidisciplinary approach to AI ,” December 11, 2023
  • “ A new and faster machine learning flywheel for enterprises ,” March 10, 2023, Medha Bankhwal  and Roger Roberts
  • “ The state of AI in 2022—and a half decade in review ,” December 6, 2022, Michael Chui , Bryce Hall , Helen Mayhew , Alex Singla , and Alex Sukharevsky
  • “ Operationalizing machine learning in processes ,” September 27, 2021, Rohit Panikkar , Tamim Saleh , Maxime Szybowski, and Rob Whiteman
  • “ An executive’s guide to AI ,” November 17, 2020, Michael Chui , Vishnu Kamalnath, and Brian McCarthy
  • “ An executive’s guide to machine learning ,” June 1, 2015, Dorian Pyle and Cristina San José

""




Challenges and Solutions for Building Machine Learning Systems

May 09, 2024 4 min read

Ben Linders

According to Camilla Montonen, the challenges of building machine learning systems have mostly to do with creating and maintaining the model. MLOps platforms and solutions contain the components needed to build machine learning systems, but MLOps is not about the tools; it is a culture and a set of practices. Montonen suggests that we should bridge the divide between the practices of data science and machine learning engineering.

Camilla Montonen spoke about building machine learning systems at NDC Oslo 2023 .

Challenges that come with deploying machine learning systems to production include how to clean, curate and manage model training data, how to efficiently train and evaluate the model, and how to measure whether or not the model continues to perform well in production, Montonen said. Other challenges are how to calculate and serve the predictions the model makes on new data, how to handle missing and corrupted data and edge cases, how and when to efficiently re-train this model, and how to version control and store these different versions, she added.

There is a set of common components that are usually part of a machine learning system, Montonen explained: a feature store, an experiment tracking system so that data scientists can easily version the various models that they produce, a model registry or model versioning system to keep track of which model is currently deployed to production, and a data quality monitoring system to detect when some issues with data quality might arise. These components are now part of many MLOps platforms and solutions that are available on the market, she added.
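To make one of these components concrete, here is a toy, in-memory sketch of what a model registry does; every name in it is invented for illustration, and production registries bundled in MLOps platforms add persistent storage, access control, and lineage tracking on top:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class ModelRegistry:
    """Toy in-memory registry: tracks model versions and which one serves production."""

    def __init__(self):
        self._versions: dict[str, list[ModelVersion]] = {}
        self._production: dict[str, int] = {}

    def register(self, name: str, metrics: dict) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name=name, version=len(versions) + 1, metrics=metrics)
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int) -> None:
        self._production[name] = version  # mark this version as the production one

    def production_version(self, name: str) -> ModelVersion:
        return self._versions[name][self._production[name] - 1]

registry = ModelRegistry()
registry.register("churn-model", {"auc": 0.81})
registry.register("churn-model", {"auc": 0.84})
registry.promote("churn-model", 2)          # deploy v2; v1 stays tracked for rollback
print(registry.production_version("churn-model").version)  # -> 2
```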

Montonen argued that the tools and components do solve the problems for the systems they were designed for, but often fail to account for the fact that in a typical company, the evolution of a machine learning system is governed by factors that are often far outside of the realm of technical issues.

MLOps is not about the tools, it’s about the culture, Montonen claimed. It is not about just adding a model registry or a feature store to your stack, but about how the people who build and maintain your system interact with it, and reducing any and all friction points to a minimum, as she explained:

This can involve everything from thinking about git hygiene in your ML code repositories and designing how individual components of pipelines should be tested, to keeping feedback loops tight between data science experimentation environments and production environments and maintaining a high standard of engineering throughout the code base.

We should strive towards bridging the divide between the practice of data science, which prioritizes rapid experimentation and iteration over robust production quality code, and the practice of machine learning engineering, which prioritizes version control, controlled delivery and deployment to production via CI/CD pipelines, automated testing and more thoughtfully crafted production code that is designed to be maintained over a longer period of time, Montonen said.

Instead of immediately adopting a bunch of MLOps tools that are more likely to complicate your problems instead of solving them, Montonen suggested going back to basics:

Begin with an honest diagnosis of why your machine learning team is struggling.

The largest gains in data scientists' development velocity and production reliability come from a few surprisingly basic and simple investments in testing, CI/CD, and git hygiene, Montonen concluded.

InfoQ interviewed Camilla Montonen about building machine learning systems.

InfoQ: How well do the currently available MLOps tools and components solve the problem that software engineers are facing?

Camilla Montonen : Most big MLOps tooling providers grew out of projects started by engineers working on large language model training or computer vision model training, and are great for those use cases. They fail to account for the fact that in most small and medium sized companies that are not Big Tech, we are not training SOTA computer vision models; we’re building models to predict customer churn or help our users find interesting items. In these particular cases, these ready-made components are often not flexible enough to account for the many idiosyncrasies that accumulate in ML systems as time goes on.

InfoQ: What’s your advice to companies that are struggling with deploying their machine learning systems?

Montonen : Find out what your machine learning team is struggling with before introducing any tools or solutions. Is the code base complex? Are data scientists deploying ML pipeline code into production from their local machines, making it hard to keep track of which code changes are running in production? Is it hard to pinpoint what code changes are responsible for bugs that arise in production? Perhaps you need to invest in some refactoring and a proper CI/CD process and tooling. Are your new models performing worse in online A/B tests compared to your production models, but you have no insight into why this happens? Perhaps you need to invest in a simple dashboard that tracks key metrics. Having a diagnosis of your current problems will help you identify what tools will actually solve them and help you reason about tradeoffs. Most MLOps tools require some learning/maintenance/integration efforts so it is good to know that the problem you are solving with them is worth these tradeoffs.


H. Milton Stewart School of Industrial and Systems Engineering, College of Engineering

Galactic Jedi: Fusing Star Wars Passion with Problem-Solving in Machine Learning Advancements

The Force of ML awakens    

"This application problem is indeed significant and worthy of serious consideration. While you may assert that I lack the capability to resolve it, it remains undeniable that this issue holds considerable importance."  

With this wisdom imparted by his advisor, Professor Jianjun Shi, Shancong Mou began his academic journey developing Artificial Intelligence (AI)- and Machine Learning (ML)-enabled data fusion methodologies aimed at addressing real and significant engineering challenges.

As a fan of both the epic Star Wars saga and machine learning, Mou expressed his enthusiasm for leveraging the force of ML for quality and productivity improvement in advanced manufacturing systems.

"If a problem remains unsolved, it highlights both its complexity and the urgent need for creative solutions." 

Jedi-Level Precision 

From the iconic lightsaber to the X-wing aircraft, Mou shared his fascination with Star Wars’ depiction of advanced control systems and aircraft maneuverability.  

“The spacecraft in those films maneuver with remarkable precision, navigating the narrow corridors of the Death Star effortlessly. Crafting such aircraft would entail integrating millions of intricate parts manufactured to the highest standards. Achieving such precision in manufacturing, coupled with stringent quality control, would indeed be challenging but groundbreaking.” 

Much like a Jedi guided by a problem-solving philosophy and mindset, Mou explores, in one of his research projects, physics-informed machine learning for the control and design optimization of complex engineering systems.

One major application is the reduction of variation in fuselage assembly processes, a critical step in the manufacturing process of modern airplanes, such as the Boeing 787 . [1] 

In another research avenue, Mou innovates with generative models, specifically generative adversarial networks (GANs) , to learn and interpret the underlying patterns of normal signals.  

This approach, termed 'robust GAN inversion', transcends traditional statistical methods to reconstruct signals from corruption, offering a distributional-assumption-free perspective, which provides a tool for unsupervised fine-grained anomaly detection.  

These elements are crucial in high-value and safety-critical industrial applications, such as personal electronic manufacturing process quality monitoring.  
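As a rough illustration of the idea (not the exact formulation in Mou's work), GAN inversion searches the latent space of a generator trained on normal signals for the code that best reconstructs a test signal; a large reconstruction residual then flags an anomaly. A minimal PyTorch-style sketch, assuming a pretrained generator G:

```python
import torch
import torch.nn.functional as F

def gan_invert(G, x, latent_dim=64, steps=500, lr=1e-2):
    """Invert a generator G (pretrained on normal signals): find the latent z
    whose generation G(z) best reconstructs the observed signal x."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(G(z), x)
        loss.backward()
        opt.step()
    with torch.no_grad():
        residual = (G(z) - x).abs()  # large residuals flag anomalous regions
    return z.detach(), residual
```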

Advanced Sensors & ML: A New Hope for Manufacturing  

The synergy between increasingly advanced sensor capabilities and cutting-edge ML methodologies is a crucial factor in achieving unprecedented levels of product detail monitoring and defect detection.

Mou noted, "Quality and productivity improvement is the goal of my research. The development of sensing technology offers new opportunities and challenges for further adopting/developing advanced ML algorithms to solve this problem." 

Mou’s vision for the integration of ML in industrial engineering mirrors the innovative spirit seen in the Star Wars saga, potentially leading toward a future where technology and human expertise converge to create smarter, cleaner, and more efficient manufacturing systems.

[1] Dominic Gates, “Boeing delivers its first 787 Dreamliner in more than a year”  

https://www.seattletimes.com/business/boeing-aerospace/boeing-delivers-its-first-787-dreamliner-in-more-than-a-year/  

Author: Atharva Anand Dave 


Machine Unlearning in 2024

41 minute read

Written by Ken Liu ∙ May 2024

  • 1. A bit of history & motivations for unlearning
  • 2. Forms of unlearning
      • 2.1. Exact unlearning
      • 2.2. "Unlearning" via differential privacy
      • 2.3. Empirical unlearning with known example space
      • 2.4. Empirical unlearning with unknown example space
      • 2.5. Just ask for unlearning?
  • 3. Evaluating unlearning
  • 4. Practice, pitfalls, and prospects of unlearning
      • 4.1. The spectrum of unlearning hardness
      • 4.2. Copyright protection
      • 4.3. Retrieval-based AI systems
      • 4.4. AI safety

As our ML models today become larger and their (pre-)training sets grow to inscrutable sizes, people are increasingly interested in the concept of machine unlearning to edit away undesired things like private data, stale knowledge, copyrighted materials, toxic/unsafe content, dangerous capabilities, and misinformation, without retraining models from scratch.

Machine unlearning can be broadly described as removing the influences of training data from a trained model. At its core, unlearning on a target model seeks to produce an unlearned model that is equivalent to—or at least “behaves like”—a retrained model trained on the same data as the target model, minus the information to be unlearned.

There’s a lot hidden in the above description. How do we describe the information to be unlearned? Do we always have ground-truth retrained models? If not, how do we actually evaluate the unlearning? Can we even verify and audit the unlearning? Is pretending to unlearn, as humans often do, sufficient? Is unlearning even the right solution? If so, for what problems?

The precise definitions of unlearning, the techniques, the guarantees, and the metrics/evaluations would depend on:

  • The ML task (e.g., binary classification or language modeling);
  • The data to unlearn (e.g., a set of images, news articles, or the knowledge of making napalm );
  • The unlearning algorithm (e.g., heuristic fine-tuning vs deleting model components);
  • The goal of unlearning (e.g., for user privacy or harmfulness removal).

In this educational post, I hope to give a gentle, general-ML-audience introduction to machine unlearning and touch on things like copyright protection, New York Times v. OpenAI, right-to-be-forgotten, the NeurIPS machine unlearning challenge, retrieval-based AI systems, and AI safety, along with some of my thoughts on the field. While unlearning is a broad topic applicable to most ML models, we will focus a lot on foundation models.


People have thought about the unlearning problem for a while now. The initial research explorations, dating to around 2014, were primarily driven by Article 17 of GDPR (the European Union's privacy regulation), often referred to as the “right-to-be-forgotten” (RTBF). RTBF basically says a user has the right to request deletion of their data from a service provider (e.g. deleting your Gmail account).

RTBF was well-intentioned. It was also very actionable when said service providers store user data in a structured way, like how Google removed a bunch of links from its index in response to RTBF requests.

However, RTBF wasn’t really proposed with machine learning in mind. In 2014, policymakers couldn't have predicted that deep learning would become a giant hodgepodge of data & compute, or that separating and interpreting this hodgepodge would turn out to be hard. The hardness of erasing data from ML models has subsequently motivated research on what is later referred to as “data deletion” and “machine unlearning”.

A decade later in 2024, user privacy is no longer the only motivation for unlearning. We’ve gone from training small convolutional nets on face images to training giant language models on pay-walled, copyrighted , toxic , dangerous, and otherwise harmful content, all of which we may want to “erase” from the ML models—sometimes with access to only a handful of examples. The nature of the models has changed too. Instead of using many small specialized models each good at one task, people started using a single giant model that knows just about any task.

Currently, I think the motivations for unlearning fall into two categories:

Access revocation (think unlearning private and copyrighted data). In an ideal world, data should be thought of as “borrowed” (possibly unpermittedly) and thus can be “returned”, and unlearning should enable such revocation.

Unlearning is challenging from this perspective. One key difficulty is that our limited understanding of deep learning itself makes data trained into a model akin to “consumables” (which can’t just be “returned” after consumption). Data may also be non-fungible (e.g. your chat history) and may even be thought of as labor with its own financial and control interests. Another challenge is that access revocation may require a proof of unlearning; as we will explore in the coming sections, this isn’t always possible.

These difficulties suggest that it’s perhaps also worth revising laws like RTBF and thinking about alternatives such as data markets , where data owners are properly compensated so they won’t want to request unlearning in the first place. To illustrate, suppose Bob ate Alice’s cheesecake (data), Alice would much rather Bob pay her or return something equivalent (compensation) than Bob puking to his pre-eating state (unlearning).

In practice, one way to implement access revocation is via some form of periodic re-training of the base model. Many model providers already do this to keep their models competitive and up-to-date. For example, OpenAI can collect a bunch of unlearning requests, and batch-satisfy them during the re-training every year (or, guided by RTBF’s “ undue delay ” period by which the request must be satisfied). More broadly, this suggests socio-technical solutions for unlearning: policymakers can mandate such periodic re-training and set economically viable deadlines to offload the costs to the model owners.

Model correction & editing (think toxicity, bias, stale/dangerous knowledge removal). That is, the model was trained on something undesirable and we’d like to fix it. This is closely related to the model editing literature. The concept of “ corrective machine unlearning ”, where unlearning serves to correct the impact of bad data, was recently proposed to capture this motivation. From this perspective, unlearning may also be viewed as a post-training risk mitigation mechanism for AI safety concerns (discussed further in Section 4).

Unlike access revocation, we could be more lenient with model correction since the edit is more of a desire than a necessity mandated by law, much like model accuracy on image classification or toxicity of generated text. (Of course, these can cause real harm too.) Here, we won’t necessarily need formal guarantees for the unlearning to be practically useful; we have plenty of examples where people would happily deploy models that are deemed “sufficiently safe”. The recent WMDP benchmark, which quizzes a model on hazardous knowledge, is a good example of empirically evaluating unlearning efficacy.

2. Forms of unlearning

Unlearning is trivially satisfied if we can just retrain the model without the undesired data. However, we want something better because (1) retraining can be expensive and (2) it can be a lot of work just to find out what to remove from training data—think finding all Harry Potter references in a trillion tokens. Unlearning techniques essentially seek to mitigate or avoid this retraining cost while producing identical or similar results.

The unlearning literature can roughly be categorized into the following:

  • Exact unlearning
  • “Unlearning” via differential privacy
  • Empirical unlearning, where data to be unlearned are precisely known (training examples)
  • Empirical unlearning, where data to be unlearned are underspecified (think “knowledge”)
  • Just ask for unlearning?

Forms 2-4 are sometimes known as “ approximate unlearning ” in that the unlearned model approximates the behavior of the retrained model. Form 5 is quite new and interesting, and more specific to instruction-following models.


In the following, we will go through what each of these types roughly looks like, along with what I think are the promises, caveats, and questions to ask looking forward.

2.1. Exact unlearning

Exact unlearning roughly asks that the unlearned model and the retrained model be distributionally identical; that is, they can be exactly the same under fixed randomness.

Techniques for exact unlearning are characterized by the early work of Cao & Yang and SISA. In SISA, a very simple scheme, the training set is split into $N$ non-overlapping subsets, and a separate model is trained for each subset. Unlearning involves retraining only the model whose shard contains the data points to be unlearned, now without those points. This reduces the cost to $1/N$ of vanilla retraining (cheaper still if we keep model checkpoints). Inference then involves model ensembling. 1


More generally, the essence of exact unlearning of this form is that we want modular components in the learning algorithm to correspond to different (potentially disjoint) sets of the training examples.
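A minimal sketch of the SISA recipe, using a scikit-learn classifier as the per-shard constituent model (assumptions: binary 0/1 labels and shards that retain examples of both classes; real implementations also checkpoint intermediate training states):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class SISA:
    """Shard the data, train one model per shard, ensemble at inference,
    and retrain only the affected shard when a deletion request arrives."""

    def __init__(self, X, y, n_shards=5, seed=0):
        rng = np.random.default_rng(seed)
        self.X, self.y = X, y
        self.shards = list(np.array_split(rng.permutation(len(X)), n_shards))
        self.models = [self._fit(idx) for idx in self.shards]

    def _fit(self, idx):
        return LogisticRegression(max_iter=1000).fit(self.X[idx], self.y[idx])

    def unlearn(self, i):
        # Only the shard containing example i is retrained: ~1/N the cost.
        for s, idx in enumerate(self.shards):
            if i in idx:
                self.shards[s] = idx[idx != i]
                self.models[s] = self._fit(self.shards[s])
                return

    def predict(self, X_new):
        votes = np.stack([m.predict(X_new) for m in self.models])
        return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote over shards
```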

There are several benefits of exact unlearning:

  • The algorithm is the proof . If we implement something like SISA, we know by design that the unlearned data never contributed to other components. As it turns out, formally proving the model has unlearned something is quite challenging otherwise.
  • It turns the unlearning problem into an accuracy/efficiency problem. Given the messiness of unlearning evaluation and the lack of benchmarks, this makes exact unlearning considerably more approachable.
  • Interpretability by design . By providing a structure to learning, we also have better understanding of how certain data points contribute to performance.

The main drawback seems obvious: modern scaling laws of large models argue against excessive data & model sharding as done in SISA. Or do they? I think it would be very interesting to revisit sharding in the context of large models, in light of the recent model merging literature that suggests the feasibility of weight-space merging between large models. As we’ll learn in the coming sections, the messiness of approximate unlearning and its evaluation, especially in the context of large models, makes exact unlearning very appealing.

2.2. “Unlearning” via differential privacy

This line of work roughly says: if the model behaves more or less the same with or without any particular data point, then there’s nothing we need to unlearn from that data point. More broadly, we are asking for distributional closeness between the unlearned and the retrained models.

For readers unfamiliar with differential privacy (DP) in machine learning, DP defines a quantifiable indistinguishability guarantee between two models $M$, $M’$ trained on datasets $X$, $X’$ that differ in any single training example. The canonical procedure, DP-SGD, works by clipping the L2-norm of the per-example gradients and injecting some per-coordinate Gaussian noise into the gradients. The idea is that the noise would mask or obscure the contribution of any single gradient (example), such that the final model isn’t sensitive to any single example. It is usually denoted by ($\varepsilon, \delta$)-DP; the stronger the noise, the smaller the scalars ($\varepsilon, \delta$), and the more private.
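A naive per-example sketch of one DP-SGD step (real implementations vectorize the per-example gradients; hyperparameter values here are illustrative):

```python
import torch

def dp_sgd_step(model, loss_fn, xb, yb, clip_norm=1.0, noise_mult=1.0, lr=0.1):
    """One DP-SGD step: clip each per-example gradient to L2 norm clip_norm,
    sum, add Gaussian noise of scale noise_mult * clip_norm, then average."""
    grads = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):  # naive per-example loop, for clarity
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        g = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum((gi ** 2).sum() for gi in g))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # L2 clipping
        for acc, gi in zip(grads, g):
            acc.add_(gi * scale)
    with torch.no_grad():
        for p, acc in zip(model.parameters(), grads):
            noise = torch.randn_like(acc) * noise_mult * clip_norm  # masks any one example
            p.add_(-(lr / len(xb)) * (acc + noise))
```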

The intuition is that if an adversary cannot (reliably) tell apart the models, then it is as if this data point has never been learned—thus no need to unlearn. DP can be used to achieve this form of unlearning, but due to the one-sidedness of unlearning (where we only care about data removal, not addition), DP is a strictly stronger definition . This notion of unlearning is sometimes known as “ ($\alpha, \beta$)-unlearning ” where ($\alpha, \beta$) serve similar roles as ($\varepsilon, \delta$) to measure distributional closeness.

Example techniques along this direction include: (1) storing checkpoints of (DP) convex models, where unlearning means retraining from those checkpoints; and (2) on top of the previous technique, adding SISA for adaptive unlearning requests (i.e. those that come in after observing the published model).

DP-based unlearning is good in that it gives some form of a statistical guarantee. However, there are some important considerations that limit its applicability to large models :

  • Many such unlearning results apply only to convex models or losses .
  • What levels of unlearning (values of $(\varepsilon, \delta)$-DP or $(\alpha, \beta)$-unlearning) are sufficient? Who decides?
  • For large models, current ML systems don’t fit well with the per-example workloads of DP-like procedures. The memory overhead will also be prohibitive.
  • Moreover, like DP, the guarantees can fall off quickly with more unlearning requests (at best at the rate of $O(\sqrt{k})$ with $k$ requests, following DP composition theorems).
  • DP-like definitions implicitly assume we care about all data points equally . But some examples are more likely to receive unlearning request, and some examples would not have contributed to the learning at all.
  • DP-like procedures may also just hurt model accuracy a lot, sometimes in an unfair way.

For large models in particular, it’s also worth distinguishing the cases of unlearning pre-training data vs unlearning fine-tuning data . The latter is a lot more tractable; for example, we could indeed fine-tune large models with differential privacy but not so much with pre-training.

2.2.1. Forging and its implications on DP-like unlearning definitions

An unlearning procedure may sometimes require an external audit , meaning that we’d like to prove that the unlearning procedure has actually happened.

The main idea of “forging” is that there exist two distinct datasets that, when trained on, would produce the same gradients and (thus) the same models. This is true intuitively (a small numeric demonstration follows the bullets):

  • Think linear regression of points on a perfect line; removing any 1 point doesn’t change the fitted line;
  • Think mini-batch GD, where replacing one example gradient with the sum of several “fake” gradients would give the same batch gradient.
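Here is the promised numeric demonstration of the second bullet: replacing a removed example's gradient with two “fake” gradients that sum to it leaves the batch gradient, and hence the resulting update, unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
real_grads = rng.normal(size=(4, 3))        # per-example gradients of a mini-batch
g_removed = real_grads[0]                   # example we pretend to delete
split = rng.normal(size=3)
fakes = np.vstack([split, g_removed - split])  # two fakes summing to the original
forged_grads = np.vstack([fakes, real_grads[1:]])

# Identical batch gradients => identical SGD steps => indistinguishable models.
assert np.allclose(real_grads.sum(axis=0), forged_grads.sum(axis=0))
```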

Forging implies that DP-based approximate unlearning may not be auditable —that is, the unlearning service provider cannot formally prove that the forget set is really forgotten. In fact, if we only look at the model weights, even exact unlearning may not be auditable.

While one can brush this off as a theoretical result, it does mean that policymakers should think carefully about what a future version of “right-to-be-forgotten” (if any) should look like and whether similar policies are legally and technically enforceable.

Indeed, what qualifies as an “audit” could very well be definition and application dependent. If the auditor only cares that the unlearned model performs poorly on a specified set of inputs (say on a set of face images), then even empirical unlearning is “auditable” (see next section).

2.3. Empirical unlearning with known example space (“example unlearning”)

This line of work is essentially “training to unlearn” or “unlearning via fine-tuning”: just take a few more heuristically chosen gradient steps to shape the original model’s behavior into what we think the retrained model would do (while also optionally resetting some parameters in the model). It may also be referred to as “example unlearning”, since the training, retain, and forget sets are often clearly defined.

The NeurIPS 2023 Machine Unlearning Challenge collected many methods along this direction. The challenge roughly runs as follows:

  • You are given a face image dataset with designated retain/forget example splits for the training set, a target model trained on everything, and a secret model trained only on the retain set.
  • You are asked to design an unlearning algorithm that produces unlearned model(s) from the target model that “match” the secretly kept model.
  • The “match” or evaluation metric uses a DP-like output-space similarity over 512 seeds: for each forget example, compute an “empirical $\varepsilon$” over 512 unlearned models based on true/false positive rates of an adversary (also provided by the organizer), and aggregate across examples.
  • All models are a small ConvNet.

To give an intuition about how well empirical unlearning is doing without fully explaining the metric: the ground-truth retrained model gets about ~0.19, the winning submission gets to ~0.12, and the baseline (simple gradient ascent on forget set) is ~0.06. 2
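The challenge's precise metric is more involved, but one common recipe in the auditing literature, paraphrased here (for $\delta = 0$) rather than quoted from the organizers, converts an attack's true/false positive rates at distinguishing unlearned from retrained models into an empirical privacy parameter:

$$\hat{\varepsilon} = \max\left( \ln \frac{\mathrm{TPR}}{\mathrm{FPR}},\; \ln \frac{1 - \mathrm{FPR}}{1 - \mathrm{TPR}} \right)$$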

So what do the winning ideas look like? Something along the lines of the following (a minimal sketch combining a few of them follows the list):

  • Gradient ascent on the forget set;
  • Gradient descent on the retain set (and hope that catastrophic forgetting takes care of unlearning);
  • Gradient descent on the forget set, but with uniformly random labels (to “confuse” the model);
  • Minimize KL divergence on outputs between unlearned model and original model on the retain set (to regularize unlearned model performance on unrelated data);
  • Re-initialize weights that had similar gradients on the retain set and forget sets, and finetune these weights on the retain set;
  • Prune 99% of weights by L1-norm and fine-tune on the retain set;
  • Reset first/last $k$ layers and fine-tune on the retain set; and
  • Heuristic/arbitrary combinations of the above.
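A minimal sketch combining three of the ideas above: descent on the retain set, ascent on the forget set, and a KL term anchoring the unlearned model to the original on retained data (coefficients are illustrative):

```python
import torch
import torch.nn.functional as F

def unlearn_step(model, frozen_orig, retain_batch, forget_batch, opt,
                 ascent_coef=1.0, kl_coef=1.0):
    """One heuristic unlearning step: keep retain performance, push the model
    away from the forget data, and stay close to the original model elsewhere."""
    xr, yr = retain_batch
    xf, yf = forget_batch
    opt.zero_grad()
    retain_loss = F.cross_entropy(model(xr), yr)                 # descend on retain
    forget_loss = -ascent_coef * F.cross_entropy(model(xf), yf)  # ascend on forget
    with torch.no_grad():
        ref_logp = F.log_softmax(frozen_orig(xr), dim=-1)
    kl = F.kl_div(F.log_softmax(model(xr), dim=-1), ref_logp,
                  log_target=True, reduction="batchmean")        # anchor to original
    (retain_loss + forget_loss + kl_coef * kl).backward()
    opt.step()
```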

Indeed, despite the heuristic nature of these approaches, these are what most empirical unlearning algorithms , especially those on large (language) models , are doing these days.

People explore empirical approaches because theoretical tools are usually impractical; for example, enforcing DP simply hurts accuracy and efficiency too much, even for the GPU rich. On the flip side, empirical methods are often fast and easy to implement, and their effects are often qualitatively visible.

Another key motivation for empirical unlearning is that counterfactuals are unclear, especially on LLMs. In deep learning, we often don’t know how the retrained model would behave on unseen data. Who should the LLM think Biden is, if not a politician? Should image classifiers give uniformly random predictions for unlearned images? Do they generalize? Or are they confidently wrong? Any of these is possible and it can be up to the practitioner to decide. It also means that behaviors that are equally plausible can lead to wildly different measurements (e.g., KL divergence between output distributions of the unlearned & retrained models), complicating theoretical guarantees.

2.4. Empirical unlearning with unknown example space (“concept/knowledge unlearning”)

What if the train, retain, or forget sets are poorly specified or just not specified at all? Foundation models that train on internet-scale data may get requests to unlearn a “ concept ”, a “ fact ”, or a piece of “ knowledge ”, all of which we cannot easily associate a set of examples. The terms “ model editing ”, “ concept editing ”, “ model surgery ”, and “ knowledge unlearning ” are closely related to this notion of unlearning. 3

The underspecification of the unlearning requests means that we now have to deal with the notions of “ unlearning scope ” (or “ editing scope ”) and “ entailment ”. That is, unlearning requests may provide canonical examples to indicate what to unlearn, but the same information can manifest in the (pre-)training set in many different forms with many different downstream implications such that simply achieving unlearning on these examples—even exactly —would not suffice.

For example:

  • The association “Biden is the US president” is dispersed throughout various forms of text from news articles, books, casual text messages, or this very blog post. Can we ever unlearn all occurrences? Moreover, does unlearning Joe Biden also entail unlearning the color of Biden’s cat ?
  • Artists may request to unlearn art style by providing art samples, but they won’t be able to collect everything they have on the internet and their adaptations .
  • New York Times may request to unlearn news articles, but they cannot enumerate quotes and secondary transformations of these articles.

Such vagueness also suggests that unlearning pre-training data from large models is perhaps necessarily empirical: it is unlikely we can derive formal guarantees if we can’t clearly specify what to (and what not to) unlearn in the trillions of tokens and establish clear information boundaries between different entities. An interesting implication of achieving unlearning empirically is that the unlearning itself can be unlearned.

What does existing work do, then, with underspecified unlearning requests? Most techniques are more or less the same as before, except now we also need to find the examples to fine-tune on. For example, attempting to unlearn Harry Potter involves asking GPT-4 to come up with plausible alternative text completions (e.g. that Mr. Potter studies baking instead of magic); and attempting to unlearn harmful behavior involves collecting examples of hate speech.

Another set of techniques involves training the desired behavior (or its opposite) into task / control vectors and harnessing the capability of large models to undergo weight-space merging or activation steering. The fundamental approach is nevertheless more or less the same—obtaining these edit vectors involves (heuristically) designing what gradients to take and what data on which to take them. One could also frame the unlearning problem as an alignment problem and apply a DPO-like objective to the forget examples.

2.5. Just ask for unlearning?

It turns out that powerful, instruction-following LLMs like GPT-4 are smart enough to pretend to unlearn. This means crafting prompts to induce a (sufficiently) safe behavior for the target unlearning application.

This is an interesting approach because no gradients are involved whatsoever (big plus from a systems perspective), and intuitively the end results could very well be as good as existing empirical unlearning techniques. Among different ways we could prompt, past work explored the following two directions.

Literally asking to pretend unlearning. We can ask in the system prompt to, say, pretend to not know who Harry Potter is. By design, this works best for common entities, facts, knowledge, or behaviors (e.g. the ability to utter like Trump) that are well-captured in the pre-training set, since the LLM needs to know it well to pretend not knowing it well. On the other hand, suppose now we’d like to unlearn the address of an obscure person; the pre-training set is so large that we suspect it’s part of the training data. We now face a variant of the Streisand effect: is it even worth asking the model to pretend unlearning by accurately describing the information in-context, and thereby risk leaking it in subsequent model responses?

Few-shot prompting or “in-context unlearning”. Suppose we now have a clearly defined set of forget examples with corresponding labels. We can flip their labels and put them in the prompt, along with more retain examples with correct labels, with the intuition that the model would treat these falsely labelled forget examples as truths and act accordingly—much like one could jailbreak a model this way. 4 Indeed, this works best when the forget examples and the counterfactual labels are clearly defined and (somewhat) finite. It may work for factual associations (e.g. Paris is the capital of France) by enumerating a lot of examples, but is unlikely to work for unlearning toxic behaviors (where the space of possible outputs is much larger).
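A toy sketch of constructing such a prompt for a binary sentiment task; flip() and the label names are invented for illustration:

```python
def flip(label):
    """Toy label flip for a binary sentiment task."""
    return "negative" if label == "positive" else "positive"

def icu_prompt(forget_examples, retain_examples, query):
    """Forget examples appear with flipped labels, retain examples with correct
    ones, so the model treats the flipped labels as ground truth at inference."""
    lines = [f"Review: {text}\nSentiment: {flip(label)}"   # flipped on purpose
             for text, label in forget_examples]
    lines += [f"Review: {text}\nSentiment: {label}"
              for text, label in retain_examples]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)
```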

In a sense, these approaches are complementary as they work for different kinds of unlearning requests.

More broadly, one could imagine a boxed LLM system for unlearning through prompting (a toy sketch follows the list), where:

  • Only the input and output interfaces are exposed (like ChatGPT);
  • Different instances of a powerful LLM are responsible for accurately mimicking different parts of a desired unlearning behavior (for example, one LLM instance specializes in general trivia-style QA while another handles sequence completions);
  • An orchestrator/router LLM decides which unlearning worker instance to call depending on the input; and
  • A composer/summarizer LLM that drafts the final output conforming to the desired unlearning behavior; it may also apply some output filtering.
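Here is the toy sketch of such a boxed system, reusing the post's Harry Potter example; llm(system_prompt, text) stands in for a call to any instruction-following chat model and is a hypothetical helper, not a real API:

```python
def boxed_unlearning_system(user_input, llm):
    """Route, answer, then compose: every stage enforces the pretend-unlearning."""
    route = llm("You are a router. Reply 'qa' for trivia-style questions "
                "and 'completion' for everything else.", user_input).strip()
    if route == "qa":
        draft = llm("Answer the question, but behave as if you have never "
                    "heard of Harry Potter.", user_input)
    else:
        draft = llm("Continue the text, avoiding any Harry Potter content.",
                    user_input)
    # Composer/filter pass over the worker's draft.
    return llm("Rewrite the draft to remove any remaining Harry Potter "
               "references. Return only the rewrite.", draft)
```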

Some readers may grumble about the heuristic nature of such prompting-based techniques; that there is no proof of unlearning whatsoever. We should keep in mind that fine-tuning based empirical unlearning, as most recent approaches do, is perhaps not fundamentally different. I think it ultimately comes down to the following questions:

  • Which of fine-tuning or prompting can better steer model behavior ?
  • Which of them is less susceptible to attacks (exposing fewer attack surfaces and/or requiring more effort for an adversary to revert the unlearning)?

My intuition about our current models says that both questions point to fine-tuning based unlearning, but this is very much up for debate and can change as we get more powerful models and better defense mechanisms. For example, the recent notion of an instruction hierarchy may help make such an LLM system less susceptible to malicious prompts.

It might be useful to note that humans don’t really “unlearn” a piece of knowledge either. 5 In fact, by claiming to have unlearned something, we often have: (1) not only learned it well to be able to make the very claim that we have unlearned it, and (2) consciously decided that it’s no longer useful / beneficial to apply this knowledge to our current world state. Who is to say that unlearning for LLMs should be any different?

3. Evaluating unlearning

Unlearning is messy for many reasons. But one of the biggest broken things about unlearning is evaluation. In general, we care about three aspects:

  • Efficiency : how fast is the algorithm compared to re-training?
  • Model utility : do we harm performance on the retain data or orthogonal tasks?
  • Forgetting quality : how much of the “forget data” is actually unlearned? How fast can we recover (re-learn) them?

Evaluating efficiency and model utility is easier; we already measure them during training. The key challenge is in understanding the forgetting quality. 6

If the forget examples are specified, this feels easy too. For example, unlearning a particular image class intuitively means getting a near-chance accuracy on the images in that class. An evaluation protocol may measure accuracy (high on retain & test set, low on forget set) or the likelihood of the forget text sequences (lower the better).
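A sketch of that protocol for a classifier with a scikit-learn-style predict() method:

```python
import numpy as np

def unlearning_report(model, retain, test, forget):
    """Utility should stay high on retain/test data while forget-set accuracy
    drops toward chance level."""
    acc = lambda X, y: float(np.mean(model.predict(X) == y))
    return {
        "retain_acc": acc(*retain),  # want: high
        "test_acc": acc(*test),      # want: high
        "forget_acc": acc(*forget),  # want: near chance
    }
```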

However, these intuitive choices of metrics aren’t necessarily principled or extensible to settings like knowledge unlearning in LLMs. Expecting the model to perform poorly on an unlearned image ignores generalization , as the forget examples could very well be an interpolation/duplicate of certain retain examples. And we don’t always have oracle models that have never seen the forget examples; e.g., do we have LLMs that have never seen New York Times articles?

Evaluating unlearning on LLMs has been more of an art than a science. For example, to unlearn “Harry Potter” as an entity, people would visualize how the token probabilities decay for Harry Potter related text—and some other folks would come along and show that the model can indeed still answer Harry Potter trivia questions. The key issue has been the desperate lack of datasets and benchmarks for unlearning evaluation.

Since 2024, nevertheless, the benchmarking crisis has been easing. There are two recent projects worth highlighting:

  • TOFU : A benchmark focusing on unlearning individuals (specifically book authors). It involves asking GPT-4 to create fake author profiles, fine-tuning an LLM on them, and using the fine-tune as the unlearning target model and the original LLM as the oracle “retrained” model. It provides QA pairs on the generated fake authors to evaluate a model’s knowledge of these authors before/after applying unlearning.
  • WMDP : A benchmark focusing on unlearning dangerous knowledge, specifically on biosecurity, cybersecurity, and chemical security. It provides 4000+ multiple-choice questions to test a model’s hazardous knowledge before/after applying unlearning. As part of the report the authors also propose an activation steering based empirical unlearning method.

TOFU and WMDP depart from previous unlearning evaluations in that they are both “high-level” and focus on the model’s knowledge retention and understanding as opposed to example-level metrics like forget sequence perplexity. This is particularly relevant for LLMs as they are generally capable of giving the same answer in many different ways that example-level metrics can’t capture.

Looking forward, I think application-oriented unlearning benchmarks like TOFU and WMDP, as opposed to instance-based evaluation like that of the NeurIPS unlearning challenge , are more useful for evaluating foundation models, owing to the multi-tasking nature of these models and the disparate definitions of “unlearning success” for each of these tasks. Indeed, one might imagine separate benchmarks on unlearning personally identifiable information (PII), copyrighted content, speech toxicity, or even model backdoors . For example, for unlearning PII, we might care about exact token regurgitation, whereas for toxicity, the unlearning metric would be the score reported by a ToxiGen classifier.

4. Practice, pitfalls, and prospects of unlearning

Unlearning is a hard problem, especially in the context of foundation models. As we actively research to make unlearning work in practice, it helps to philosophize a bit on what unlearning really means and whether it is the right solution for our current problems.

4.1. The spectrum of unlearning hardness

Intuitively, unlearning infrequent textual occurrences in LLMs like car accidents in Palo Alto should be easier than unlearning frequent occurrences like “Biden is the US president”, which is in turn easier than unlearning fundamental facts like “the sun rises every day”.

This spectrum of unlearning hardness emerges because as a piece of knowledge becomes more fundamental, it will have more associations with other pieces of knowledge (e.g. as premises or corollaries) and an exponentially larger unlearning scope. In fact, a piece of knowledge can be so embedded in the model’s implicit knowledge graph that it cannot be unlearned without introducing contradictions and harming the model’s utility. 7

This intuition implies that certain unlearning requests are much harder or simply unsatisfiable (any attempts are bound to have flaws). Indeed, humans have experiences that form the basis of their subsequent actions and world models; it is subjective, blurry, and philosophical to what capacity humans can unlearn their formative past memories.

More broadly, the unlearning hardness problem applies to all kinds of models, and for reasons beyond embeddedness in a knowledge/entailment graph. Let’s consider two more seemingly contradictory intuitions for unlearning hardness:

  • An example seen later in the training should be easy to unlearn, since the model would have moved only slightly in weight space (e.g. due to a decayed learning rate) and one could either just revert gradients or revert to a previous checkpoint (if stored). In contrast, examples seen early get “built on” by later examples (in the curriculum learning sense), making them harder to unlearn.
  • An example seen later should be harder to unlearn , since examples seen earlier are gradually (or catastrophically) forgotten over the course of training; this may be especially true for LLMs.

Failure to reconcile these intuitions suggests that the interplay across memorization/forgetting, example importance (in the sense of data selection and coresets), learning hardness (in the sense of prediction flips), and unlearning hardness remains unclear.

Here are some interesting research questions :

  • Is there a qualitative/fundamental difference between unlearning “easy” data (e.g. a local news event) and “hard” data (e.g. cats have four legs)?
  • If there is a spectrum of unlearning hardness, does there exist a threshold to tell apart what is “easy” and “hard”, and thus what is unlearnable or shouldn’t be unlearned? Does there exist, or can we train, such an oracle classifier? Can humans even tell?
  • How does this relate to influence functions and data attribution ? If a certain piece of knowledge (as it manifests in a model’s output) can be attributed to a larger fraction of the training data, does it make it harder to unlearn?
  • Can we benchmark how easy it is to unlearn something?

4.2. Copyright protection

On the surface, unlearning seems to be a promising solution for copyright protection: if a model violates the copyright of some content, we could attempt to unlearn said content. 8 It is conceivable that to resolve copyright violations via unlearning, provable and exact unlearning is necessary (and possibly sufficient); on the other hand, approximate unlearning, without guarantees and with the possibility of being hacked, is certainly insufficient and likely unnecessary.

In practice, however, there is a lot more nuance due to the questionable effectiveness of current unlearning methods and the unclear legal landscape at the intersection of AI and copyright. Since I am no legal expert (and clearly none of this section constitutes legal advice), we will mostly focus on asking questions. The central question seems to be: is unlearning the right solution for copyright protection?

Recall that the fair use doctrine 9 permits limited use of copyrighted material contingent on four factors: (1) purpose and character of the use (“transformativeness”), (2) the nature of the copyrighted work, (3) amount and substantiality of the use, and (4) the effect on the material’s value. If the use of copyrighted content in a model qualifies as fair use, then unlearning such content from the model is unnecessary.

Suppose a model is trained on some copyrighted content and is risking copyright violation, as in New York Times v. OpenAI . Should OpenAI invest in (empirical) unlearning algorithms on ChatGPT? Or should they focus on the transformativeness axis of fair use and invest in deploying empirical guardrails , such as prompting, content moderation, and custom alignment to prevent the model from regurgitating training data? The latter seems to be what’s being implemented in practice.

More broadly, there could also be economic solutions to copyright violation as alternatives to unlearning. For example, model owners may provide an exact unlearning service (e.g. via periodic retraining) while also offering to indemnify model users for copyright infringement in the meantime, as seen in the case of OpenAI’s “Copyright Shield”. People are also starting to explore how one may price copyrighted data using Shapley values. In general, it is unclear right now how much of a role (if any) unlearning will play in resolving copyright related issues. Exact unlearning (extending to retrieval-based systems, see next section) does hold promise since deletion is clean and provable, but it seems that legally binding auditing procedures/mechanisms need to be in place first.

4.3. Retrieval-based AI systems

An obvious alternative to unlearning is to not learn at all. One way this could manifest for an LLM is that we take all content from the pre-training set that may receive unlearning requests (e.g., New York Times articles) and put it in an external data/vector store. Any questions relating to that content will then be RAG’ed during inference, and any unlearning requests can be trivially satisfied by removing the data from the database. Min et al. demonstrate that this approach can be competitive with (though not quite matching) the trained baseline in terms of final perplexity.
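A toy sketch of why deletion is trivial in this setting; real systems rank by embedding similarity rather than the naive token overlap used here:

```python
class RetrievalStore:
    """External store: 'unlearning' a document is literally a deletion,
    after which it can never be retrieved into the model's context."""

    def __init__(self):
        self.docs = {}  # doc_id -> text

    def add(self, doc_id, text):
        self.docs[doc_id] = text

    def unlearn(self, doc_id):
        self.docs.pop(doc_id, None)  # trivially satisfies the request

    def retrieve(self, query, k=3):
        overlap = lambda t: len(set(query.lower().split()) & set(t.lower().split()))
        return sorted(self.docs.values(), key=overlap, reverse=True)[:k]
```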

Retrieval-based solutions are promising because of the increasing capabilities of the base models to reason in-context. However, there are a few considerations before taking retrieval systems as the no-brainer solution to unlearning:

  • Removing protected content from pre-training corpus can be a hard de-duplication problem. Much like removing data contamination is hard , how can we be sure that paraphrases, quotations/citations, or other adaptations of the protected content are removed?
  • What if the data to be unlearned can’t be retrieved? Today we fine-tune many things into a model that aren’t documents or knowledge items; for example, it is unclear (yet) if things like human preferences and desired behaviors (e.g. the ability to write concisely) can be “retrieved” from a database.
  • Dumping stuff in-context can open new attack surfaces. Many RAG methods for LLMs work by putting related content in-context and asking the model to reason over it. Having the protected data in-context means it is now more susceptible to data extraction (simple prompting attacks may work just fine).
  • Utility gap between retrieval and training. While there is evidence that retrieval-based solutions can be competitive, there is no general consensus that retrieval alone can replace fine-tune workloads; indeed, they can be complementary . More broadly, what if the space of unlearnable data is too large such that if all of it goes to an external store, the base model wouldn’t be as useful?

4.4. AI safety

As models become more capable and are granted agency, one concrete application domain for unlearning that is gaining traction is AI safety.

Roughly speaking, safety concerns stem from a model’s knowledge (e.g., recipe of napalm ), behaviors (e.g., exhibiting bias ), and capabilities (e.g., hacking websites). Examining current AI systems and extrapolating forward, one may imagine the following examples to apply unlearning and improve AI safety:

  • removing hazardous knowledge , as seen in the WMDP benchmark;
  • removing model poisons and backdoors , where models respond to adversarially planted input triggers;
  • removing manipulative behaviors , such as the ability to perform unethical persuasions or deception;
  • removing bias and toxicity ; or even
  • removing power-seeking tendencies .

For safety-oriented applications, it is worth noting that unlearning should be treated as a post-training risk mitigation and defense mechanism , alongside existing tools like alignment fine-tuning and content filters. And as with any tool, we should view unlearning through its trade-offs in comparison to other tools in the toolbox (e.g., unlearning is more adaptive but more expensive than content filters), as opposed to brushing it off because of the potential lack of guarantees and efficacy.

Acknowledgements : The author would like to thank Aryaman Arora, Jiaao Chen, Irena Gao, John Hewitt, Shengyuan Hu, Peter Kairouz, Sanmi Koyejo, Xiang Lisa Li, Percy Liang, Eric Mitchell, Rylan Schaeffer, Yijia Shao, Chenglei Si, Pratiksha Thaker, Xindi Wu for helpful discussions and feedback before and during the drafting of this post. Any hot/bad takes are those of the author.

If you find this post helpful, it can be cited as:

Liu, Ken Ziyu. (Apr 2024). Machine Unlearning in 2024. Ken Ziyu Liu - Stanford Computer Science. https://ai.stanford.edu/~kzliu/blog/unlearning .


Technically, SISA may not give exact unlearning in the sense of identical model distributions between the retrained model and the unlearned model, since after a sequence of unlearning requests, the data shards may end up in a state that we wouldn’t otherwise get into in the first place (e.g., some shards have way more data than others after unlearning). For practical purposes, nevertheless, this is subtle enough that the nice properties about exact unlearning, as discussed later in the section, would still hold.  ↩

It is also worth noting that the unlearning metric used in the NeurIPS unlearning challenge was disputed: why should we stick to a DP-like distributional closeness metric to a single secretly-kept re-trained model, when retraining itself can give a different model due to randomness?  ↩

More broadly, “unlearning” falls under the umbrella of “model editing” in the sense that a deletion is also an edit. Similarly, one could argue that the concept of “continual learning” falls under the umbrella too, where an association (say an input/label pair, or a piece of factual association) is updated by deleting an old association and creating a new, clearly specified association. One could imagine using continual learning to help achieve unlearning and vice versa.  ↩

There is also evidence that in-context demonstrations mostly serve to elicit a particular behavior and that the labels don’t even matter that much. It’s unclear yet how we could reconcile this finding with “in-context unlearning”.  ↩

Humans do forget things though, which is different. The ML analogy might be “catastrophic forgetting”; humans similarly forget things under information overload.  ↩

In particular, recall that for exact unlearning , understanding forgetting quality isn’t strictly necessary because the algorithm would remove the forget data from the picture by construction (through retraining). Thus it may be acceptable even if the unlearned model does well on the forget set (as it could be a result of generalization from the retain set). We will focus the discussions of unlearning evaluation on approximate unlearning.  ↩

Note that this “embeddedness” of a piece of data is related to but distinct from whether the data is in or out of distribution, which should also affect how an unlearning algorithm behaves (e.g. unlearning a perfect inlier should be a no-op for an ideal unlearning algorithm).  ↩

Of course, we must first verify that such content has been trained on by the model in the first place. We can be almost certain that contents like Wikipedia articles are trained on, but we are generally less sure about a random blogpost somewhere on the internet. This is basically the membership inference problem.  ↩

Fair use is a doctrine applicable specifically in the United States. The reader should refer to related doctrines in corresponding jurisdictions, such as fair dealings in Commonwealth countries.  ↩


New machine learning algorithm promises advances in computing

Digital twin models may enhance future autonomous systems.

Systems controlled by next-generation computing algorithms could give rise to better and more efficient machine learning products, a new study suggests. 

Using machine learning tools to create a digital twin, or a virtual copy, of an electronic circuit that exhibits chaotic behavior, researchers found that they were successful at predicting how it would behave and using that information to control it.

Many everyday devices, like thermostats and cruise control, utilize linear controllers – which use simple rules to direct a system to a desired value. Thermostats, for example, employ such rules to determine how much to heat or cool a space based on the difference between the current and desired temperatures.
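As an illustration (mine, not the study's), a proportional controller is a one-line rule of exactly this kind:

```python
def thermostat_step(current_temp, target_temp, gain=0.5):
    """Proportional control: effort is a constant times the error."""
    error = target_temp - current_temp
    return gain * error  # positive -> heat, negative -> cool

print(thermostat_step(18.0, 21.0))  # 1.5 units of heating
```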

Yet because of how straightforward these algorithms are, they struggle to control systems that display complex behavior, like chaos.

As a result, advanced devices like self-driving cars and aircraft often rely on machine learning-based controllers, which use intricate networks to learn the optimal control algorithm needed to best operate. However, these algorithms have significant drawbacks, the most demanding of which is that they can be extremely challenging and computationally expensive to implement. 

Now, having access to an efficient digital twin is likely to have a sweeping impact on how scientists develop future autonomous technologies, said Robert Kent, lead author of the study and a graduate student in physics at The Ohio State University. 

“The problem with most machine learning-based controllers is that they use a lot of energy or power and they take a long time to evaluate,” said Kent. “Developing traditional controllers for them has also been difficult because chaotic systems are extremely sensitive to small changes.”

These issues, he said, are critical in situations where milliseconds can make a difference between life and death, such as when self-driving vehicles must decide to brake to prevent an accident.

The study was published recently in Nature Communications.

Compact enough to fit on an inexpensive computer chip capable of balancing on your fingertip and able to run without an internet connection, the team’s digital twin was built to optimize a controller’s efficiency and performance, which researchers found resulted in a reduction of power consumption. It achieves this quite easily, mainly because it was trained using a type of machine learning approach called reservoir computing. 

“The great thing about the machine learning architecture we used is that it’s very good at learning the behavior of systems that evolve in time,” Kent said. “It’s inspired by how connections spark in the human brain.”
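For the curious, here is a minimal echo-state-network-style sketch of reservoir computing, with illustrative hyperparameters rather than the study's actual model; the recurrent “reservoir” is fixed and random, and only a linear readout is trained, which is what keeps training cheap:

```python
import numpy as np

def train_reservoir(u, y, n_res=200, rho=0.9, seed=0):
    """Drive a fixed random recurrent reservoir with input u, then fit only
    a linear readout mapping reservoir states to targets y."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res,))
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)              # reservoir update
        states.append(x.copy())
    S = np.array(states)
    W_out = np.linalg.lstsq(S, y, rcond=None)[0]     # the only trained part
    return W_out, S

# One-step-ahead prediction of a noisy sine wave:
t = np.linspace(0, 20, 500)
u = np.sin(t) + 0.01 * np.random.default_rng(1).normal(size=t.size)
W_out, S = train_reservoir(u[:-1], u[1:])
print("train MSE:", float(np.mean((S @ W_out - u[1:]) ** 2)))
```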

Although similarly sized computer chips have been used in devices like smart fridges, according to the study, this novel computing ability makes the new model especially well-equipped to handle dynamic systems such as self-driving vehicles as well as heart monitors, which must be able to quickly adapt to a patient’s heartbeat.   

“Big machine learning models have to consume lots of power to crunch data and come out with the right parameters, whereas our model and training is so extremely simple that you could have systems learning on the fly,” he said. 

To test this theory, researchers directed their model to complete complex control tasks and compared its results to those from previous control techniques. The study revealed that their approach achieved a higher accuracy at the tasks than its linear counterpart and is significantly less computationally complex than a previous machine learning-based controller. 

“The increase in accuracy was pretty significant in some cases,” said Kent. Though the outcome showed that their algorithm does require more energy than a linear controller to operate, this tradeoff means that when it is powered up, the team’s model lasts longer and is considerably more efficient than current machine learning-based controllers on the market. 

“People will find good use out of it just based on how efficient it is,” Kent said. “You can implement it on pretty much any platform and it’s very simple to understand.” The algorithm was recently made available to scientists. 

Outside of inspiring potential advances in engineering, there’s also an equally important economic and environmental incentive for creating more power-friendly algorithms, said Kent. 

As society becomes more dependent on computers and AI for nearly all aspects of daily life, demand for data centers is soaring, leading many experts to worry over digital systems’ enormous power appetite and what future industries will need to do to keep up with it. 

And because building these data centers as well as large-scale computing experiments can generate a large carbon footprint , scientists are looking for ways to curb carbon emissions from this technology. 

To advance their results, future work will likely be steered toward training the model to explore other applications like quantum information processing, Kent said. In the meantime, he expects that these new elements will reach far into the scientific community. 

“Not enough people know about these types of algorithms in the industry and engineering, and one of the big goals of this project is to get more people to learn about them,” said Kent. “This work is a great first step toward reaching that potential.”

This study was supported by the U.S. Air Force’s Office of Scientific Research. Other Ohio State co-authors include Wendson A.S. Barbosa and Daniel J. Gauthier. 


Story Source:

Materials provided by Ohio State University. Original written by Tatyana Woodall. Note: Content may be edited for style and length.

Journal Reference:

  • Robert M. Kent, Wendson A. S. Barbosa, Daniel J. Gauthier. Controlling chaos using edge computing hardware. Nature Communications, 2024; 15(1). DOI: 10.1038/s41467-024-48133-3
