A systematic literature review on hardware implementation of artificial intelligence algorithms

Qassim Nasir

2020, The Journal of Supercomputing

Related Papers

2020 IEEE High Performance Extreme Computing Conference (HPEC)

Albert Reuther

research paper on artificial intelligence algorithms

2017 IEEE Custom Integrated Circuits Conference (CICC)

Computer systems and network

Yuriy Khoma

Frontiers in Neuroscience

Oliver Rhodes

IEEE Access

Aiman H. El-Maleh

Shawki Areibi

The rapid growth of data size and accessibility in recent years has instigated a shift of philosophy in algorithm design for artificial intelligence. Instead of engineering algorithms by hand, the ability to learn composable systems automatically from massive amounts of data has led to ground-breaking performance in important domains such as computer vision, speech recognition, and natural language processing. The most popular class of techniques used in these domains is called deep learning, and is seeing significant attention from industry. However, these models require incredible amounts of data and compute power to train, and are limited by the need for better hardware acceleration to accommodate scaling beyond current data and model sizes. While the current solution has been to use clusters of graphics processing units (GPU) as general-purpose processors (GPGPU), the use of field programmable gate arrays (FPGA) provides an interesting alternative. Current trends in design tools ...

WSEAS TRANSACTIONS ON CIRCUITS AND SYSTEMS

Guennadi A Kouzaev

A variable predicate logic processor (VPLP) is proposed for artificial intelligence (AI), robotics, computer-aided medicine, electronic security, and other applications. The development is realized as an accelerating unit in AI computing machines. Unlike known designs, the datapath of this processor consists of universal gates that change their logical styles (subsets of predicate logic) on the fly according to the data type and the implemented instructions. In this paper, the processor’s reconfigurable gates and main units are proposed, designed, modeled, and verified using a Field-Programmable Gate Array (FPGA) board and a corresponding computer-aided design (CAD) tool. The implemented processor confirmed its on-the-fly reconfigurability while executing test code. This processor is of interest for accelerating AI computing, molecular and quantum calculations in science, cryptography, computer-aided medicine, robotics, and other fields.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Randy Huang

Divya lakshmi Duggisetty

Recent research on neural networks has shown significant advantages in machine learning over traditional algorithms based on handcrafted features and models. Neural networks are now widely adopted in areas such as image, speech, and video recognition. But the high computation and storage complexity of neural network inference poses great difficulty for its application. CPU platforms are hard-pressed to offer enough computation capacity. GPU platforms are the first choice for neural network processing because of their high computation capacity and easy-to-use development frameworks. Meanwhile, FPGA-based neural network inference accelerators are becoming a research topic. With specifically designed hardware, FPGAs are the next possible solution to surpass GPUs in speed and energy efficiency. Various FPGA-based accelerator designs have been proposed with software and hardware optimization techniques to achieve high speed and energy efficiency. In this paper, we give an overview of previous work on FPGA-based neural network inference accelerators and summarize the main techniques used. An investigation from software to hardware, and from circuit level to system level, is carried out to provide a complete analysis of FPGA-based neural network inference accelerator design and to serve as a guide for future work.

There has been a great change in the computing environment since the introduction of deep learning systems in everyday applications. The requirements of these systems are so vastly different from those of conventional systems that a complete revision of processor design strategies is necessary. Processors capable of streamed SIMD, MIMD, matrix, and systolic-array operations do offer some solutions. As many new neural structures will be introduced over the next years, new processor architectures need to evolve. In spite of the variability of Artificial Neural Network (ANN) structures, some features will be common among them. We have tried to implement the hardware components required for most ANNs. This paper highlights some of the key issues related to hardware implementation of neural networks and suggests some possible solutions. However, the arena remains very open for innovation. Low precision arithmetic and approximation techniques suitable for acceleration of computational load of neur...


How artificial intelligence is transforming the world

Darrell M. West, Senior Fellow, Center for Technology Innovation, Douglas Dillon Chair in Governmental Studies, and John R. Allen

April 24, 2018

Artificial intelligence (AI) is a wide-ranging tool that enables people to rethink how we integrate information, analyze data, and use the resulting insights to improve decision making—and already it is transforming every walk of life. In this report, Darrell West and John Allen discuss AI’s application across a variety of sectors, address issues in its development, and offer recommendations for getting the most out of AI while still protecting important human values.

Table of Contents

  • I. Qualities of artificial intelligence
  • II. Applications in diverse sectors
  • III. Policy, regulatory, and ethical issues
  • IV. Recommendations
  • V. Conclusion


Most people are not very familiar with the concept of artificial intelligence (AI). As an illustration, when 1,500 senior business leaders in the United States in 2017 were asked about AI, only 17 percent said they were familiar with it. 1 A number of them were not sure what it was or how it would affect their particular companies. They understood there was considerable potential for altering business processes, but were not clear how AI could be deployed within their own organizations.

Despite its widespread lack of familiarity, AI is a technology that is transforming every walk of life. It is a wide-ranging tool that enables people to rethink how we integrate information, analyze data, and use the resulting insights to improve decisionmaking. Our hope through this comprehensive overview is to explain AI to an audience of policymakers, opinion leaders, and interested observers, and demonstrate how AI already is altering the world and raising important questions for society, the economy, and governance.

In this paper, we discuss novel applications in finance, national security, health care, criminal justice, transportation, and smart cities, and address issues such as data access problems, algorithmic bias, AI ethics and transparency, and legal liability for AI decisions. We contrast the regulatory approaches of the U.S. and European Union, and close by making a number of recommendations for getting the most out of AI while still protecting important human values. 2

In order to maximize AI benefits, we recommend nine steps for going forward:

  • Encourage greater data access for researchers without compromising users’ personal privacy,
  • invest more government funding in unclassified AI research,
  • promote new models of digital education and AI workforce development so employees have the skills needed in the 21st-century economy,
  • create a federal AI advisory committee to make policy recommendations,
  • engage with state and local officials so they enact effective policies,
  • regulate broad AI principles rather than specific algorithms,
  • take bias complaints seriously so AI does not replicate historic injustice, unfairness, or discrimination in data or algorithms,
  • maintain mechanisms for human oversight and control, and
  • penalize malicious AI behavior and promote cybersecurity.

Qualities of artificial intelligence

Although there is no uniformly agreed upon definition, AI generally is thought to refer to “machines that respond to stimulation consistent with traditional responses from humans, given the human capacity for contemplation, judgment and intention.” 3  According to researchers Shubhendu and Vijay, these software systems “make decisions which normally require [a] human level of expertise” and help people anticipate problems or deal with issues as they come up. 4 As such, they operate in an intentional, intelligent, and adaptive manner.

Intentionality

Artificial intelligence algorithms are designed to make decisions, often using real-time data. They are unlike passive machines that are capable only of mechanical or predetermined responses. Using sensors, digital data, or remote inputs, they combine information from a variety of different sources, analyze the material instantly, and act on the insights derived from those data. With massive improvements in storage systems, processing speeds, and analytic techniques, they are capable of tremendous sophistication in analysis and decisionmaking.

Artificial intelligence is already altering the world and raising important questions for society, the economy, and governance.

Intelligence

AI generally is undertaken in conjunction with machine learning and data analytics. 5 Machine learning takes data and looks for underlying trends. If it spots something that is relevant for a practical problem, software designers can take that knowledge and use it to analyze specific issues. All that is required are data that are sufficiently robust that algorithms can discern useful patterns. Data can come in the form of digital information, satellite imagery, visual information, text, or unstructured data.
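To make the trend-spotting step concrete, here is a minimal sketch (not from the report) of what "looking for underlying trends" can mean in practice: fit the simplest possible model to historical data and only act on it if the pattern is robust. The data, threshold, and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical daily demand figures; a real system would use operational data.
rng = np.random.default_rng(0)
days = np.arange(30)
demand = 100 + 2.5 * days + rng.normal(0, 5, size=30)

# "Machine learning takes data and looks for underlying trends":
# here, the simplest possible model, a least-squares line.
slope, intercept = np.polyfit(days, demand, deg=1)

# Only act on the pattern if the data are "sufficiently robust",
# i.e., the fitted trend explains most of the variance.
predicted = slope * days + intercept
r_squared = 1 - np.sum((demand - predicted) ** 2) / np.sum((demand - demand.mean()) ** 2)

if r_squared > 0.8:  # illustrative threshold, not a standard
    print(f"Usable trend: demand grows about {slope:.1f} units per day (R^2 = {r_squared:.2f})")
else:
    print("No robust pattern; collect more or better data before acting.")
```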

Adaptability

AI systems have the ability to learn and adapt as they make decisions. In the transportation area, for example, semi-autonomous vehicles have tools that let drivers and vehicles know about upcoming congestion, potholes, highway construction, or other possible traffic impediments. Vehicles can take advantage of the experience of other vehicles on the road, without human involvement, and the entire corpus of their achieved “experience” is immediately and fully transferable to other similarly configured vehicles. Their advanced algorithms, sensors, and cameras incorporate experience in current operations, and use dashboards and visual displays to present information in real time so human drivers are able to make sense of ongoing traffic and vehicular conditions. And in the case of fully autonomous vehicles, advanced systems can completely control the car or truck, and make all the navigational decisions.

Related Content

Jack Karsten, Darrell M. West

October 26, 2015

Makada Henry-Nickie

November 16, 2017

Sunil Johal, Daniel Araya

February 28, 2017

Applications in diverse sectors

AI is not a futuristic vision, but rather something that is here today and being integrated with and deployed into a variety of sectors. This includes fields such as finance, national security, health care, criminal justice, transportation, and smart cities. There are numerous examples where AI already is making an impact on the world and augmenting human capabilities in significant ways. 6

One of the reasons for the growing role of AI is the tremendous opportunities for economic development that it presents. A project undertaken by PriceWaterhouseCoopers estimated that “artificial intelligence technologies could increase global GDP by $15.7 trillion, a full 14%, by 2030.” 7 That includes advances of $7 trillion in China, $3.7 trillion in North America, $1.8 trillion in Northern Europe, $1.2 trillion for Africa and Oceania, $0.9 trillion in the rest of Asia outside of China, $0.7 trillion in Southern Europe, and $0.5 trillion in Latin America. China is making rapid strides because it has set a national goal of investing $150 billion in AI and becoming the global leader in this area by 2030.

Meanwhile, a McKinsey Global Institute study of China found that “AI-led automation can give the Chinese economy a productivity injection that would add 0.8 to 1.4 percentage points to GDP growth annually, depending on the speed of adoption.” 8 Although its authors found that China currently lags the United States and the United Kingdom in AI deployment, the sheer size of its AI market gives that country tremendous opportunities for pilot testing and future development.

Investments in financial AI in the United States tripled between 2013 and 2014 to a total of $12.2 billion. 9 According to observers in that sector, “Decisions about loans are now being made by software that can take into account a variety of finely parsed data about a borrower, rather than just a credit score and a background check.” 10 In addition, there are so-called robo-advisers that “create personalized investment portfolios, obviating the need for stockbrokers and financial advisers.” 11 These advances are designed to take the emotion out of investing and undertake decisions based on analytical considerations, and make these choices in a matter of minutes.

A prominent example of this is taking place in stock exchanges, where high-frequency trading by machines has replaced much of human decisionmaking. People submit buy and sell orders, and computers match them in the blink of an eye without human intervention. Machines can spot trading inefficiencies or market differentials on a very small scale and execute trades that make money according to investor instructions. 12 Powered in some places by advanced computing, these tools have much greater capacities for storing information because of their emphasis not on a zero or a one, but on “quantum bits” that can store multiple values in each location. 13 That dramatically increases storage capacity and decreases processing times.
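The matching step itself is mechanical and easily automated; the sketch below (invented, not drawn from any real exchange) shows the basic price-time logic by which buy and sell orders cross without human intervention. Real matching engines are far more elaborate.

```python
from dataclasses import dataclass

@dataclass
class Order:
    side: str      # "buy" or "sell"
    price: float
    qty: int

def match(buys, sells):
    """Match the best bid against the best ask until prices no longer cross."""
    trades = []
    buys.sort(key=lambda o: -o.price)   # highest bid first
    sells.sort(key=lambda o: o.price)   # lowest ask first
    while buys and sells and buys[0].price >= sells[0].price:
        qty = min(buys[0].qty, sells[0].qty)
        trades.append((sells[0].price, qty))
        buys[0].qty -= qty
        sells[0].qty -= qty
        if buys[0].qty == 0:
            buys.pop(0)
        if sells[0].qty == 0:
            sells.pop(0)
    return trades

print(match([Order("buy", 10.02, 100), Order("buy", 10.00, 50)],
            [Order("sell", 10.01, 80), Order("sell", 10.03, 200)]))
# -> [(10.01, 80)]: 80 shares trade at the ask; the remaining orders do not cross.
```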

Fraud detection represents another way AI is helpful in financial systems. It sometimes is difficult to discern fraudulent activities in large organizations, but AI can identify abnormalities, outliers, or deviant cases requiring additional investigation. That helps managers find problems early in the cycle, before they reach dangerous levels. 14
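A minimal, illustrative version of that idea is a simple outlier rule: flag activity that deviates sharply from an account's usual pattern. Production fraud systems use far richer models; the amounts and threshold below are invented.

```python
import statistics

history = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1]   # hypothetical past transaction amounts
new_amount = 950.0

mean = statistics.mean(history)
stdev = statistics.stdev(history)
z = (new_amount - mean) / stdev     # how many standard deviations from normal behavior

# "More than three standard deviations" is a common rule of thumb, not a regulation.
if abs(z) > 3:
    print(f"Flag for review: {new_amount} is {z:.1f} standard deviations from this account's norm")
```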

National security

AI plays a substantial role in national defense. Through its Project Maven, the American military is deploying AI “to sift through the massive troves of data and video captured by surveillance and then alert human analysts of patterns or when there is abnormal or suspicious activity.” 15 According to Deputy Secretary of Defense Patrick Shanahan, the goal of emerging technologies in this area is “to meet our warfighters’ needs and to increase [the] speed and agility [of] technology development and procurement.” 16

Artificial intelligence will accelerate the traditional process of warfare so rapidly that a new term has been coined: hyperwar.

The big data analytics associated with AI will profoundly affect intelligence analysis, as massive amounts of data are sifted in near real time—if not eventually in real time—thereby providing commanders and their staffs a level of intelligence analysis and productivity heretofore unseen. Command and control will similarly be affected as human commanders delegate certain routine, and in special circumstances, key decisions to AI platforms, reducing dramatically the time associated with the decision and subsequent action. In the end, warfare is a time competitive process, where the side able to decide the fastest and move most quickly to execution will generally prevail. Indeed, artificially intelligent intelligence systems, tied to AI-assisted command and control systems, can move decision support and decisionmaking to a speed vastly superior to the speeds of the traditional means of waging war. So fast will be this process, especially if coupled to automatic decisions to launch artificially intelligent autonomous weapons systems capable of lethal outcomes, that a new term has been coined specifically to embrace the speed at which war will be waged: hyperwar.

While the ethical and legal debate is raging over whether America will ever wage war with artificially intelligent autonomous lethal systems, the Chinese and Russians are not nearly so mired in this debate, and we should anticipate our need to defend against these systems operating at hyperwar speeds. The challenge in the West of where to position “humans in the loop” in a hyperwar scenario will ultimately dictate the West’s capacity to be competitive in this new form of conflict. 17

Just as AI will profoundly affect the speed of warfare, the proliferation of zero day or zero second cyber threats as well as polymorphic malware will challenge even the most sophisticated signature-based cyber protection. This forces significant improvement to existing cyber defenses. Increasingly, vulnerable systems are migrating, and will need to shift to a layered approach to cybersecurity with cloud-based, cognitive AI platforms. This approach moves the community toward a “thinking” defensive capability that can defend networks through constant training on known threats. This capability includes DNA-level analysis of heretofore unknown code, with the possibility of recognizing and stopping inbound malicious code by recognizing a string component of the file. This is how certain key U.S.-based systems stopped the debilitating “WannaCry” and “Petya” viruses.

Preparing for hyperwar and defending critical cyber networks must become a high priority because China, Russia, North Korea, and other countries are putting substantial resources into AI. In 2017, China’s State Council issued a plan for the country to “build a domestic industry worth almost $150 billion” by 2030. 18 As an example of the possibilities, the Chinese search firm Baidu has pioneered a facial recognition application that finds missing people. In addition, cities such as Shenzhen are providing up to $1 million to support AI labs. That country hopes AI will provide security, combat terrorism, and improve speech recognition programs. 19 The dual-use nature of many AI algorithms will mean AI research focused on one sector of society can be rapidly modified for use in the security sector as well. 20

Health care

AI tools are helping designers improve computational sophistication in health care. For example, Merantix is a German company that applies deep learning to medical issues. It has an application in medical imaging that “detects lymph nodes in the human body in Computer Tomography (CT) images.” 21 According to its developers, the key is labeling the nodes and identifying small lesions or growths that could be problematic. Humans can do this, but radiologists charge $100 per hour and may be able to carefully read only four images an hour. If there were 10,000 images, the cost of this process would be $250,000, which is prohibitively expensive if done by humans.

What deep learning can do in this situation is train computers on data sets to learn what a normal-looking versus an irregular-appearing lymph node is. After doing that through imaging exercises and honing the accuracy of the labeling, radiological imaging specialists can apply this knowledge to actual patients and determine the extent to which someone is at risk of cancerous lymph nodes. Since only a few are likely to test positive, it is a matter of identifying the unhealthy versus healthy node.
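The pipeline described (train on labeled examples, then score new cases) can be illustrated with a drastically simplified stand-in. A real system would train a deep convolutional network on labeled CT images; the sketch below uses two made-up numeric features and plain logistic regression only to show the train-then-predict loop. Everything here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in features (say, node size and contrast); real systems learn features
# directly from labeled CT images with deep networks.
normal = rng.normal([5.0, 0.3], 0.8, size=(200, 2))
abnormal = rng.normal([9.0, 0.7], 0.8, size=(200, 2))
X = np.vstack([normal, abnormal])
y = np.array([0] * 200 + [1] * 200)

# Logistic regression by gradient descent: learn what "normal-looking" versus
# "irregular-appearing" examples look like in the training data.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.01 * X.T @ (p - y) / len(y)
    b -= 0.01 * np.mean(p - y)

new_case = np.array([8.5, 0.65])    # hypothetical unseen node
risk = 1 / (1 + np.exp(-(new_case @ w + b)))
print(f"Estimated probability the node is irregular: {risk:.2f}")
```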

AI has been applied to congestive heart failure as well, an illness that afflicts 10 percent of senior citizens and costs $35 billion each year in the United States. AI tools are helpful because they “predict in advance potential challenges ahead and allocate resources to patient education, sensing, and proactive interventions that keep patients out of the hospital.” 22

Criminal justice

AI is being deployed in the criminal justice area. The city of Chicago has developed an AI-driven “Strategic Subject List” that analyzes people who have been arrested for their risk of becoming future perpetrators. It ranks 400,000 people on a scale of 0 to 500, using items such as age, criminal activity, victimization, drug arrest records, and gang affiliation. In looking at the data, analysts found that youth is a strong predictor of violence, being a shooting victim is associated with becoming a future perpetrator, gang affiliation has little predictive value, and drug arrests are not significantly associated with future criminal activity. 23
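The scoring logic behind the list is not public; the sketch below is purely structural, showing what a weighted additive model on a 0-to-500 scale over inputs like those named above might look like. Every weight and cap is invented and carries no claim about the real system.

```python
def risk_score(age: int, prior_arrests: int, times_shot: int, gang_affiliated: bool) -> float:
    """Hypothetical additive scoring model; weights are invented for illustration."""
    score = 0.0
    score += max(0, 30 - age) * 6            # youth treated as a strong predictor
    score += min(prior_arrests, 10) * 15     # capped contribution from arrest history
    score += times_shot * 60                 # victimization associated with future risk
    score += 10 if gang_affiliated else 0    # the analysts cited found little predictive value here
    return min(score, 500.0)                 # clamp to the 0-to-500 scale

print(risk_score(age=19, prior_arrests=2, times_shot=1, gang_affiliated=False))   # -> 156.0
```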

Judicial experts claim AI programs reduce human bias in law enforcement and lead to a fairer sentencing system. R Street Institute Associate Caleb Watney writes:

Empirically grounded questions of predictive risk analysis play to the strengths of machine learning, automated reasoning and other forms of AI. One machine-learning policy simulation concluded that such programs could be used to cut crime up to 24.8 percent with no change in jailing rates, or reduce jail populations by up to 42 percent with no increase in crime rates. 24

However, critics worry that AI algorithms represent “a secret system to punish citizens for crimes they haven’t yet committed. The risk scores have been used numerous times to guide large-scale roundups.” 25 The fear is that such tools target people of color unfairly and have not helped Chicago reduce the murder wave that has plagued it in recent years.

Despite these concerns, other countries are moving ahead with rapid deployment in this area. In China, for example, companies already have “considerable resources and access to voices, faces and other biometric data in vast quantities, which would help them develop their technologies.” 26 New technologies make it possible to match images and voices with other types of information, and to use AI on these combined data sets to improve law enforcement and national security. Through its “Sharp Eyes” program, Chinese law enforcement is matching video images, social media activity, online purchases, travel records, and personal identity into a “police cloud.” This integrated database enables authorities to keep track of criminals, potential law-breakers, and terrorists. 27 Put differently, China has become the world’s leading AI-powered surveillance state.

Transportation

Transportation represents an area where AI and machine learning are producing major innovations. Research by Cameron Kerry and Jack Karsten of the Brookings Institution has found that over $80 billion was invested in autonomous vehicle technology between August 2014 and June 2017. Those investments include applications both for autonomous driving and the core technologies vital to that sector. 28

Autonomous vehicles—cars, trucks, buses, and drone delivery systems—use advanced technological capabilities. Those features include automated vehicle guidance and braking, lane-changing systems, the use of cameras and sensors for collision avoidance, the use of AI to analyze information in real time, and the use of high-performance computing and deep learning systems to adapt to new circumstances through detailed maps. 29

Light detection and ranging systems (LIDARs) and AI are key to navigation and collision avoidance. LIDAR systems combine light and radar instruments. They are mounted on the top of vehicles that use imaging in a 360-degree environment from a radar and light beams to measure the speed and distance of surrounding objects. Along with sensors placed on the front, sides, and back of the vehicle, these instruments provide information that keeps fast-moving cars and trucks in their own lane, helps them avoid other vehicles, applies brakes and steering when needed, and does so instantly so as to avoid accidents.
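The range measurement itself comes down to time-of-flight arithmetic: a laser pulse travels to the object and back, so distance is the round-trip time multiplied by the speed of light and divided by two, and closing speed can be estimated from how that range changes between pulses. The numbers below are illustrative only.

```python
C = 299_792_458.0    # speed of light in m/s

def range_from_echo(round_trip_seconds: float) -> float:
    """Distance to the reflecting object from the pulse's round-trip time."""
    return C * round_trip_seconds / 2

r1 = range_from_echo(2.00e-7)     # first pulse: roughly 30 m away
r2 = range_from_echo(1.98e-7)     # a pulse 0.1 s later: slightly closer
closing_speed = (r1 - r2) / 0.1   # meters per second toward the sensor

print(f"range {r2:.1f} m, closing at {closing_speed:.1f} m/s")
```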

Advanced software enables cars to learn from the experiences of other vehicles on the road and adjust their guidance systems as weather, driving, or road conditions change. This means that software is the key—not the physical car or truck itself.

Since these cameras and sensors compile a huge amount of information and need to process it instantly to avoid the car in the next lane, autonomous vehicles require high-performance computing, advanced algorithms, and deep learning systems to adapt to new scenarios. This means that software is the key, not the physical car or truck itself. 30 Advanced software enables cars to learn from the experiences of other vehicles on the road and adjust their guidance systems as weather, driving, or road conditions change. 31

Ride-sharing companies are very interested in autonomous vehicles. They see advantages in terms of customer service and labor productivity. All of the major ride-sharing companies are exploring driverless cars. The surge of car-sharing and taxi services—such as Uber and Lyft in the United States, Daimler’s Mytaxi and Hailo service in Great Britain, and Didi Chuxing in China—demonstrate the opportunities of this transportation option. Uber recently signed an agreement to purchase 24,000 autonomous cars from Volvo for its ride-sharing service. 32

However, the ride-sharing firm suffered a setback in March 2018 when one of its autonomous vehicles in Arizona hit and killed a pedestrian. Uber and several auto manufacturers immediately suspended testing and launched investigations into what went wrong and how the fatality could have occurred. 33 Both industry and consumers want reassurance that the technology is safe and able to deliver on its stated promises. Unless there are persuasive answers, this accident could slow AI advancements in the transportation sector.

Smart cities

Metropolitan governments are using AI to improve urban service delivery. For example, according to Kevin Desouza, Rashmi Krishnamurthy, and Gregory Dawson:

The Cincinnati Fire Department is using data analytics to optimize medical emergency responses. The new analytics system recommends to the dispatcher an appropriate response to a medical emergency call—whether a patient can be treated on-site or needs to be taken to the hospital—by taking into account several factors, such as the type of call, location, weather, and similar calls. 34

Since it fields 80,000 requests each year, Cincinnati officials are deploying this technology to prioritize responses and determine the best ways to handle emergencies. They see AI as a way to deal with large volumes of data and figure out efficient ways of responding to public requests. Rather than address service issues in an ad hoc manner, authorities are trying to be proactive in how they provide urban services.

Cincinnati is not alone. A number of metropolitan areas are adopting smart city applications that use AI to improve service delivery, environmental planning, resource management, energy utilization, and crime prevention, among other things. For its smart cities index, the magazine Fast Company ranked American locales and found Seattle, Boston, San Francisco, Washington, D.C., and New York City as the top adopters. Seattle, for example, has embraced sustainability and is using AI to manage energy usage and resource management. Boston has launched a “City Hall To Go” that makes sure underserved communities receive needed public services. It also has deployed “cameras and inductive loops to manage traffic and acoustic sensors to identify gun shots.” San Francisco has certified 203 buildings as meeting LEED sustainability standards. 35

Through these and other means, metropolitan areas are leading the country in the deployment of AI solutions. Indeed, according to a National League of Cities report, 66 percent of American cities are investing in smart city technology. Among the top applications noted in the report are “smart meters for utilities, intelligent traffic signals, e-governance applications, Wi-Fi kiosks, and radio frequency identification sensors in pavement.” 36

Policy, regulatory, and ethical issues

These examples from a variety of sectors demonstrate how AI is transforming many walks of human existence. The increasing penetration of AI and autonomous devices into many aspects of life is altering basic operations and decisionmaking within organizations, and improving efficiency and response times.

At the same time, though, these developments raise important policy, regulatory, and ethical issues. For example, how should we promote data access? How do we guard against biased or unfair data used in algorithms? What types of ethical principles are introduced through software programming, and how transparent should designers be about their choices? What about questions of legal liability in cases where algorithms cause harm? 37

The increasing penetration of AI into many aspects of life is altering decisionmaking within organizations and improving efficiency. At the same time, though, these developments raise important policy, regulatory, and ethical issues.

Data access problems

The key to getting the most out of AI is having a “data-friendly ecosystem with unified standards and cross-platform sharing.” AI depends on data that can be analyzed in real time and brought to bear on concrete problems. Having data that are “accessible for exploration” in the research community is a prerequisite for successful AI development. 38

According to a McKinsey Global Institute study, nations that promote open data sources and data sharing are the ones most likely to see AI advances. In this regard, the United States has a substantial advantage over China. Global ratings on data openness show that the U.S. ranks eighth overall in the world, compared to 93rd for China. 39

But right now, the United States does not have a coherent national data strategy. There are few protocols for promoting research access or platforms that make it possible to gain new insights from proprietary data. It is not always clear who owns data or how much belongs in the public sphere. These uncertainties limit the innovation economy and act as a drag on academic research. In the following section, we outline ways to improve data access for researchers.

Biases in data and algorithms

In some instances, certain AI systems are thought to have enabled discriminatory or biased practices. 40 For example, Airbnb has been accused of having homeowners on its platform who discriminate against racial minorities. A research project undertaken by the Harvard Business School found that “Airbnb users with distinctly African American names were roughly 16 percent less likely to be accepted as guests than those with distinctly white names.” 41

Racial issues also come up with facial recognition software. Most such systems operate by comparing a person’s face to a range of faces in a large database. As pointed out by Joy Buolamwini of the Algorithmic Justice League, “If your facial recognition data contains mostly Caucasian faces, that’s what your program will learn to recognize.” 42 Unless the databases have access to diverse data, these programs perform poorly when attempting to recognize African-American or Asian-American features.
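The comparison step works on numerical "embeddings": each face is reduced to a vector, and a new face is matched to whichever stored vector it is most similar to. The sketch below shows only that matching step with made-up four-dimensional vectors; real systems produce the embeddings with deep networks trained on large face datasets, which is exactly where unrepresentative training data causes the failures described above.

```python
import numpy as np

# Made-up embeddings standing in for the output of a trained face-recognition network.
gallery = {
    "person_a": np.array([0.11, 0.83, 0.42, 0.05]),
    "person_b": np.array([0.70, 0.12, 0.55, 0.31]),
}
probe = np.array([0.68, 0.15, 0.52, 0.29])    # embedding of the face to identify

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

best = max(gallery, key=lambda name: cosine(gallery[name], probe))
print(best, round(cosine(gallery[best], probe), 3))   # closest match and its similarity
```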

Many historical data sets reflect traditional values, which may or may not represent the preferences wanted in a current system. As Buolamwini notes, such an approach risks repeating inequities of the past:

The rise of automation and the increased reliance on algorithms for high-stakes decisions such as whether someone get insurance or not, your likelihood to default on a loan or somebody’s risk of recidivism means this is something that needs to be addressed. Even admissions decisions are increasingly automated—what school our children go to and what opportunities they have. We don’t have to bring the structural inequalities of the past into the future we create. 43

AI ethics and transparency

Algorithms embed ethical considerations and value choices into program decisions. As such, these systems raise questions concerning the criteria used in automated decisionmaking. Some people want to have a better understanding of how algorithms function and what choices are being made. 44

In the United States, many urban schools use algorithms for enrollment decisions based on a variety of considerations, such as parent preferences, neighborhood qualities, income level, and demographic background. According to Brookings researcher Jon Valant, the New Orleans–based Bricolage Academy “gives priority to economically disadvantaged applicants for up to 33 percent of available seats. In practice, though, most cities have opted for categories that prioritize siblings of current students, children of school employees, and families that live in school’s broad geographic area.” 45 Enrollment choices can be expected to be very different when considerations of this sort come into play.

Depending on how AI systems are set up, they can facilitate the redlining of mortgage applications, help people discriminate against individuals they don’t like, or help screen or build rosters of individuals based on unfair criteria. The types of considerations that go into programming decisions matter a lot in terms of how the systems operate and how they affect customers. 46

For these reasons, the EU is implementing the General Data Protection Regulation (GDPR) in May 2018. The rules specify that people have “the right to opt out of personally tailored ads” and “can contest ‘legal or similarly significant’ decisions made by algorithms and appeal for human intervention” in the form of an explanation of how the algorithm generated a particular outcome. Each guideline is designed to ensure the protection of personal data and provide individuals with information on how the “black box” operates. 47

Legal liability

There are questions concerning the legal liability of AI systems. If there are harms or infractions (or fatalities in the case of driverless cars), the operators of the algorithm likely will fall under product liability rules. A body of case law has shown that the situation’s facts and circumstances determine liability and influence the kind of penalties that are imposed. Those can range from civil fines to imprisonment for major harms. 48 The Uber-related fatality in Arizona will be an important test case for legal liability. The state actively recruited Uber to test its autonomous vehicles and gave the company considerable latitude in terms of road testing. It remains to be seen if there will be lawsuits in this case and who is sued: the human backup driver, the state of Arizona, the Phoenix suburb where the accident took place, Uber, software developers, or the auto manufacturer. Given the multiple people and organizations involved in the road testing, there are many legal questions to be resolved.

In non-transportation areas, digital platforms often have limited liability for what happens on their sites. For example, in the case of Airbnb, the firm “requires that people agree to waive their right to sue, or to join in any class-action lawsuit or class-action arbitration, to use the service.” By demanding that its users sacrifice basic rights, the company limits consumer protections and therefore curtails the ability of people to fight discrimination arising from unfair algorithms. 49 But whether the principle of neutral networks holds up in many sectors is yet to be determined on a widespread basis.

Recommendations

In order to balance innovation with basic human values, we propose a number of recommendations for moving forward with AI. This includes improving data access, increasing government investment in AI, promoting AI workforce development, creating a federal advisory committee, engaging with state and local officials to ensure they enact effective policies, regulating broad objectives as opposed to specific algorithms, taking bias seriously as an AI issue, maintaining mechanisms for human control and oversight, and penalizing malicious behavior and promoting cybersecurity.

Improving data access

The United States should develop a data strategy that promotes innovation and consumer protection. Right now, there are no uniform standards in terms of data access, data sharing, or data protection. Almost all the data are proprietary in nature and not shared very broadly with the research community, and this limits innovation and system design. AI requires data to test and improve its learning capacity. 50 Without structured and unstructured data sets, it will be nearly impossible to gain the full benefits of artificial intelligence.

In general, the research community needs better access to government and business data, although with appropriate safeguards to make sure researchers do not misuse data in the way Cambridge Analytica did with Facebook information. There is a variety of ways researchers could gain data access. One is through voluntary agreements with companies holding proprietary data. Facebook, for example, recently announced a partnership with Stanford economist Raj Chetty to use its social media data to explore inequality. 51 As part of the arrangement, researchers were required to undergo background checks and could only access data from secured sites in order to protect user privacy and security.

In the U.S., there are no uniform standards in terms of data access, data sharing, or data protection. Almost all the data are proprietary in nature and not shared very broadly with the research community, and this limits innovation and system design.

Google long has made available search results in aggregated form for researchers and the general public. Through its “Trends” site, scholars can analyze topics such as interest in Trump, views about democracy, and perspectives on the overall economy. 52 That helps people track movements in public interest and identify topics that galvanize the general public.

Twitter makes much of its tweets available to researchers through application programming interfaces, commonly referred to as APIs. These tools help people outside the company build application software and make use of data from its social media platform. They can study patterns of social media communications and see how people are commenting on or reacting to current events.

In some sectors where there is a discernible public benefit, governments can facilitate collaboration by building infrastructure that shares data. For example, the National Cancer Institute has pioneered a data-sharing protocol where certified researchers can query health data it has using de-identified information drawn from clinical data, claims information, and drug therapies. That enables researchers to evaluate efficacy and effectiveness, and make recommendations regarding the best medical approaches, without compromising the privacy of individual patients.

There could be public-private data partnerships that combine government and business data sets to improve system performance. For example, cities could integrate information from ride-sharing services with their own material on social service locations, bus lines, mass transit, and highway congestion to improve transportation. That would help metropolitan areas deal with traffic tie-ups and assist in highway and mass transit planning.

Some combination of these approaches would improve data access for researchers, the government, and the business community, without impinging on personal privacy. As noted by Ian Buck, the vice president of NVIDIA, “Data is the fuel that drives the AI engine. The federal government has access to vast sources of information. Opening access to that data will help us get insights that will transform the U.S. economy.” 53 Through its Data.gov portal, the federal government already has put over 230,000 data sets into the public domain, and this has propelled innovation and aided improvements in AI and data analytic technologies. 54 The private sector also needs to facilitate research data access so that society can achieve the full benefits of artificial intelligence.

Increase government investment in AI

According to Greg Brockman, the co-founder of OpenAI, the U.S. federal government invests only $1.1 billion in non-classified AI technology. 55 That is far lower than the amount being spent by China or other leading nations in this area of research. That shortfall is noteworthy because the economic payoffs of AI are substantial. In order to boost economic development and social innovation, federal officials need to increase investment in artificial intelligence and data analytics. Higher investment is likely to pay for itself many times over in economic and social benefits. 56

Promote digital education and workforce development

As AI applications accelerate across many sectors, it is vital that we reimagine our educational institutions for a world where AI will be ubiquitous and students need a different kind of training than they currently receive. Right now, many students do not receive instruction in the kinds of skills that will be needed in an AI-dominated landscape. For example, there currently are shortages of data scientists, computer scientists, engineers, coders, and platform developers. These are skills that are in short supply; unless our educational system generates more people with these capabilities, it will limit AI development.

For these reasons, both state and federal governments have been investing in AI human capital. For example, in 2017, the National Science Foundation funded over 6,500 graduate students in computer-related fields and has launched several new initiatives designed to encourage data and computer science at all levels from pre-K to higher and continuing education. 57 The goal is to build a larger pipeline of AI and data analytic personnel so that the United States can reap the full advantages of the knowledge revolution.

But there also needs to be substantial changes in the process of learning itself. It is not just technical skills that are needed in an AI world but skills of critical reasoning, collaboration, design, visual display of information, and independent thinking, among others. AI will reconfigure how society and the economy operate, and there needs to be “big picture” thinking on what this will mean for ethics, governance, and societal impact. People will need the ability to think broadly about many questions and integrate knowledge from a number of different areas.

One example of new ways to prepare students for a digital future is IBM’s Teacher Advisor program, utilizing Watson’s free online tools to help teachers bring the latest knowledge into the classroom. They enable instructors to develop new lesson plans in STEM and non-STEM fields, find relevant instructional videos, and help students get the most out of the classroom. 58 As such, they are precursors of new educational environments that need to be created.

Create a federal AI advisory committee

Federal officials need to think about how they deal with artificial intelligence. As noted previously, there are many issues ranging from the need for improved data access to addressing issues of bias and discrimination. It is vital that these and other concerns be considered so we gain the full benefits of this emerging technology.

In order to move forward in this area, several members of Congress have introduced the “Future of Artificial Intelligence Act,” a bill designed to establish broad policy and legal principles for AI. It proposes the secretary of commerce create a federal advisory committee on the development and implementation of artificial intelligence. The legislation provides a mechanism for the federal government to get advice on ways to promote a “climate of investment and innovation to ensure the global competitiveness of the United States,” “optimize the development of artificial intelligence to address the potential growth, restructuring, or other changes in the United States workforce,” “support the unbiased development and application of artificial intelligence,” and “protect the privacy rights of individuals.” 59

The specific questions the committee is asked to address include the following: competitiveness, workforce impact, education, ethics training, data sharing, international cooperation, accountability, machine learning bias, rural impact, government efficiency, investment climate, job impact, bias, and consumer impact. The committee is directed to submit a report to Congress and the administration 540 days after enactment regarding any legislative or administrative action needed on AI.

This legislation is a step in the right direction, although the field is moving so rapidly that we would recommend shortening the reporting timeline from 540 days to 180 days. Waiting nearly two years for a committee report will certainly result in missed opportunities and a lack of action on important issues. Given rapid advances in the field, having a much quicker turnaround time on the committee analysis would be quite beneficial.

Engage with state and local officials

States and localities also are taking action on AI. For example, the New York City Council unanimously passed a bill that directed the mayor to form a taskforce that would “monitor the fairness and validity of algorithms used by municipal agencies.” 60 The city employs algorithms to “determine if a lower bail will be assigned to an indigent defendant, where firehouses are established, student placement for public schools, assessing teacher performance, identifying Medicaid fraud and determine where crime will happen next.” 61

According to the legislation’s developers, city officials want to know how these algorithms work and make sure there is sufficient AI transparency and accountability. In addition, there is concern regarding the fairness and biases of AI algorithms, so the taskforce has been directed to analyze these issues and make recommendations regarding future usage. It is scheduled to report back to the mayor on a range of AI policy, legal, and regulatory issues by late 2019.

Some observers already are worrying that the taskforce won’t go far enough in holding algorithms accountable. For example, Julia Powles of Cornell Tech and New York University argues that the bill originally required companies to make the AI source code available to the public for inspection, and that there be simulations of its decisionmaking using actual data. After criticism of those provisions, however, former Councilman James Vacca dropped the requirements in favor of a task force studying these issues. He and other city officials were concerned that publication of proprietary information on algorithms would slow innovation and make it difficult to find AI vendors who would work with the city. 62 It remains to be seen how this local task force will balance issues of innovation, privacy, and transparency.

Regulate broad objectives more than specific algorithms

The European Union has taken a restrictive stance on these issues of data collection and analysis. 63 It has rules limiting the ability of companies to collect data on road conditions and map street views. Because many of these countries worry that people’s personal information in unencrypted Wi-Fi networks is swept up in overall data collection, the EU has fined technology firms, demanded copies of data, and placed limits on the material collected. 64 This has made it more difficult for technology companies operating there to develop the high-definition maps required for autonomous vehicles.

The GDPR being implemented in Europe places severe restrictions on the use of artificial intelligence and machine learning. According to published guidelines, “Regulations prohibit any automated decision that ‘significantly affects’ EU citizens. This includes techniques that evaluates a person’s ‘performance at work, economic situation, health, personal preferences, interests, reliability, behavior, location, or movements.’” 65 In addition, these new rules give citizens the right to review how digital services made specific algorithmic choices that affect them.

By taking a restrictive stance on issues of data collection and analysis, the European Union is putting its manufacturers and software designers at a significant disadvantage to the rest of the world.

If interpreted stringently, these rules will make it difficult for European software designers (and American designers who work with European counterparts) to incorporate artificial intelligence and high-definition mapping in autonomous vehicles. Central to navigation in these cars and trucks is tracking location and movements. Without high-definition maps containing geo-coded data and the deep learning that makes use of this information, fully autonomous driving will stagnate in Europe. Through this and other data protection actions, the European Union is putting its manufacturers and software designers at a significant disadvantage to the rest of the world.

It makes more sense to think about the broad objectives desired in AI and enact policies that advance them, as opposed to governments trying to crack open the “black boxes” and see exactly how specific algorithms operate. Regulating individual algorithms will limit innovation and make it difficult for companies to make use of artificial intelligence.

Take biases seriously

Bias and discrimination are serious issues for AI. There already have been a number of cases of unfair treatment linked to historic data, and steps need to be undertaken to make sure that does not become prevalent in artificial intelligence. Existing statutes governing discrimination in the physical economy need to be extended to digital platforms. That will help protect consumers and build confidence in these systems as a whole.

For these advances to be widely adopted, more transparency is needed in how AI systems operate. Andrew Burt of Immuta argues, “The key problem confronting predictive analytics is really transparency. We’re in a world where data science operations are taking on increasingly important tasks, and the only thing holding them back is going to be how well the data scientists who train the models can explain what it is their models are doing.” 66

Maintaining mechanisms for human oversight and control

Some individuals have argued that there need to be avenues for humans to exercise oversight and control of AI systems. For example, Allen Institute for Artificial Intelligence CEO Oren Etzioni argues there should be rules for regulating these systems. First, he says, AI must be governed by all the laws that already have been developed for human behavior, including regulations concerning “cyberbullying, stock manipulation or terrorist threats,” as well as “entrap[ping] people into committing crimes.” Second, he believes that these systems should disclose they are automated systems and not human beings. Third, he states, “An A.I. system cannot retain or disclose confidential information without explicit approval from the source of that information.” 67 His rationale is that these tools store so much data that people have to be cognizant of the privacy risks posed by AI.

In the same vein, the IEEE Global Initiative has ethical guidelines for AI and autonomous systems. Its experts suggest that these models be programmed with consideration for widely accepted human norms and rules for behavior. AI algorithms need to take into account the importance of these norms, how norm conflict can be resolved, and ways these systems can be transparent about norm resolution. Software designs should be programmed for “nondeception” and “honesty,” according to ethics experts. When failures occur, there must be mitigation mechanisms to deal with the consequences. In particular, AI must be sensitive to problems such as bias, discrimination, and fairness.

A group of machine learning experts claim it is possible to automate ethical decisionmaking. Using the trolley problem as a moral dilemma, they ask the following question: If an autonomous car goes out of control, should it be programmed to kill its own passengers or the pedestrians who are crossing the street? They devised a “voting-based system” that asked 1.3 million people to assess alternative scenarios, summarized the overall choices, and applied the overall perspective of these individuals to a range of vehicular possibilities. That allowed them to automate ethical decisionmaking in AI algorithms, taking public preferences into account. 69 This procedure, of course, does not reduce the tragedy involved in any kind of fatality, such as seen in the Uber case, but it provides a mechanism to help AI developers incorporate ethical considerations in their planning.
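Structurally, the "voting-based system" boils down to collecting many judgments per scenario and letting the aggregate preference decide. The toy sketch below shows only that aggregation step; the scenario encoding and vote counts are invented.

```python
from collections import Counter

# Hypothetical tallies of crowd judgments for one dilemma scenario.
votes = {
    ("swerve", "protect_pedestrians"): 812_000,
    ("stay", "protect_passengers"): 488_000,
}

def aggregate_choice(scenario_votes: dict) -> tuple:
    """Return the option preferred by the largest number of respondents."""
    return Counter(scenario_votes).most_common(1)[0][0]

print(aggregate_choice(votes))   # -> ('swerve', 'protect_pedestrians')
```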

Penalize malicious behavior and promote cybersecurity

As with any emerging technology, it is important to discourage malicious treatment designed to trick software or use it for undesirable ends. 70 This is especially important given the dual-use aspects of AI, where the same tool can be used for beneficial or malicious purposes. The malevolent use of AI exposes individuals and organizations to unnecessary risks and undermines the virtues of the emerging technology. This includes behaviors such as hacking, manipulating algorithms, compromising privacy and confidentiality, or stealing identities. Efforts to hijack AI in order to solicit confidential information should be seriously penalized as a way to deter such actions. 71

In a rapidly changing world with many entities having advanced computing capabilities, there needs to be serious attention devoted to cybersecurity. Countries have to be careful to safeguard their own systems and keep other nations from damaging their security. 72 According to the U.S. Department of Homeland Security, a major American bank receives around 11 million calls a week at its service center. In order to protect its telephony from denial of service attacks, it uses a “machine learning-based policy engine [that] blocks more than 120,000 calls per month based on voice firewall policies including harassing callers, robocalls and potential fraudulent calls.” 73 This represents a way in which machine learning can help defend technology systems from malevolent attacks.
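The quoted system is proprietary, but the general shape of such a policy engine, which is explicit rules layered with a learned risk score, can be sketched as follows. Every rule, feature, and threshold here is invented for illustration.

```python
BLOCKLIST = {"+15550100", "+15550142"}    # explicit firewall policies

def learned_risk(call: dict) -> float:
    """Stand-in for a trained model scoring robocall or fraud likelihood from call features."""
    score = 0.0
    score += 0.5 if call["calls_last_hour"] > 20 else 0.0   # harassing-caller signal
    score += 0.4 if call["spoofed_caller_id"] else 0.0      # potential fraud signal
    return score

def should_block(call: dict) -> bool:
    if call["number"] in BLOCKLIST:       # hard policy takes precedence
        return True
    return learned_risk(call) > 0.7       # illustrative model threshold

print(should_block({"number": "+15550199", "calls_last_hour": 35, "spoofed_caller_id": True}))
```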

To summarize, the world is on the cusp of revolutionizing many sectors through artificial intelligence and data analytics. There already are significant deployments in finance, national security, health care, criminal justice, transportation, and smart cities that have altered decisionmaking, business models, risk mitigation, and system performance. These developments are generating substantial economic and social benefits.

The world is on the cusp of revolutionizing many sectors through artificial intelligence, but the way AI systems are developed needs to be better understood due to the major implications these technologies will have for society as a whole.

Yet the manner in which AI systems unfold has major implications for society as a whole. It matters how policy issues are addressed, ethical conflicts are reconciled, legal realities are resolved, and how much transparency is required in AI and data analytic solutions. 74 Human choices about software development affect the way in which decisions are made and the manner in which they are integrated into organizational routines. Exactly how these processes are executed needs to be better understood because they will have substantial impact on the general public soon, and for the foreseeable future. AI may well be a revolution in human affairs, and become the single most influential human innovation in history.

Note: We appreciate the research assistance of Grace Gilberg, Jack Karsten, Hillary Schaub, and Kristjan Tomasson on this project.

The Brookings Institution is a nonprofit organization devoted to independent research and policy solutions. Its mission is to conduct high-quality, independent research and, based on that research, to provide innovative, practical recommendations for policymakers and the public. The conclusions and recommendations of any Brookings publication are solely those of its author(s), and do not reflect the views of the Institution, its management, or its other scholars.

Support for this publication was generously provided by Amazon. Brookings recognizes that the value it provides is in its absolute commitment to quality, independence, and impact. Activities supported by its donors reflect this commitment. 

John R. Allen is a member of the Board of Advisors of Amida Technology and on the Board of Directors of Spark Cognition. Both companies work in fields discussed in this piece.

  • Thomas Davenport, Jeff Loucks, and David Schatsky, “Bullish on the Business Value of Cognitive” (Deloitte, 2017), p. 3 (www2.deloitte.com/us/en/pages/deloitte-analytics/articles/cognitive-technology-adoption-survey.html).
  • Luke Dormehl, Thinking Machines: The Quest for Artificial Intelligence—and Where It’s Taking Us Next (New York: Penguin–TarcherPerigee, 2017).
  • Shubhendu and Vijay, “Applicability of Artificial Intelligence in Different Fields of Life.”
  • Andrew McAfee and Erik Brynjolfsson, Machine Platform Crowd: Harnessing Our Digital Future (New York: Norton, 2017).
  • Portions of this paper draw on Darrell M. West, The Future of Work: Robots, AI, and Automation , Brookings Institution Press, 2018.
  • PriceWaterhouseCoopers, “Sizing the Prize: What’s the Real Value of AI for Your Business and How Can You Capitalise?” 2017.
  • Dominic Barton, Jonathan Woetzel, Jeongmin Seong, and Qinzheng Tian, “Artificial Intelligence: Implications for China” (New York: McKinsey Global Institute, April 2017), p. 1.
  • Nathaniel Popper, “Stocks and Bots,” New York Times Magazine , February 28, 2016.
  • Michael Lewis, Flash Boys: A Wall Street Revolt (New York: Norton, 2015).
  • Cade Metz, “In Quantum Computing Race, Yale Professors Battle Tech Giants,” New York Times , November 14, 2017, p. B3.
  • Executive Office of the President, “Artificial Intelligence, Automation, and the Economy,” December 2016, pp. 27-28.
  • Christian Davenport, “Future Wars May Depend as Much on Algorithms as on Ammunition, Report Says,” Washington Post , December 3, 2017.
  • John R. Allen and Amir Husain, “On Hyperwar,” Naval Institute Proceedings , July 17, 2017, pp. 30-36.
  • Paul Mozur, “China Sets Goal to Lead in Artificial Intelligence,” New York Times , July 21, 2017, p. B1.
  • Paul Mozur and John Markoff, “Is China Outsmarting American Artificial Intelligence?” New York Times , May 28, 2017.
  • Economist , “America v China: The Battle for Digital Supremacy,” March 15, 2018.
  • Rasmus Rothe, “Applying Deep Learning to Real-World Problems,” Medium , May 23, 2017.
  • Eric Horvitz, “Reflections on the Status and Future of Artificial Intelligence,” Testimony before the U.S. Senate Subcommittee on Space, Science, and Competitiveness, November 30, 2016, p. 5.
  • Jeff Asher and Rob Arthur, “Inside the Algorithm That Tries to Predict Gun Violence in Chicago,” New York Times Upshot , June 13, 2017.
  • Caleb Watney, “It’s Time for our Justice System to Embrace Artificial Intelligence,” TechTank (blog), Brookings Institution, July 20, 2017.
  • Asher and Arthur, “Inside the Algorithm That Tries to Predict Gun Violence in Chicago.”
  • Paul Mozur and Keith Bradsher, “China’s A.I. Advances Help Its Tech Industry, and State Security,” New York Times , December 3, 2017.
  • Simon Denyer, “China’s Watchful Eye,” Washington Post , January 7, 2018.
  • Cameron Kerry and Jack Karsten, “Gauging Investment in Self-Driving Cars,” Brookings Institution, October 16, 2017.
  • Portions of this section are drawn from Darrell M. West, “Driverless Cars in China, Europe, Japan, Korea, and the United States,” Brookings Institution, September 2016.
  • Yuming Ge, Xiaoman Liu, Libo Tang, and Darrell M. West, “Smart Transportation in China and the United States,” Center for Technology Innovation, Brookings Institution, December 2017.
  • Peter Holley, “Uber Signs Deal to Buy 24,000 Autonomous Vehicles from Volvo,” Washington Post , November 20, 2017.
  • Daisuke Wakabayashi, “Self-Driving Uber Car Kills Pedestrian in Arizona, Where Robots Roam,” New York Times , March 19, 2018.
  • Kevin Desouza, Rashmi Krishnamurthy, and Gregory Dawson, “Learning from Public Sector Experimentation with Artificial Intelligence,” TechTank (blog), Brookings Institution, June 23, 2017.
  • Boyd Cohen, “The 10 Smartest Cities in North America,” Fast Company , November 14, 2013.
  • Teena Maddox, “66% of US Cities Are Investing in Smart City Technology,” TechRepublic , November 6, 2017.
  • Osonde Osoba and William Welser IV, “The Risks of Artificial Intelligence to Security and the Future of Work” (Santa Monica, Calif.: RAND Corp., December 2017) (www.rand.org/pubs/perspectives/PE237.html).
  • Ibid., p. 7.
  • Dominic Barton, Jonathan Woetzel, Jeongmin Seong, and Qinzheng Tian, “Artificial Intelligence: Implications for China” (New York: McKinsey Global Institute, April 2017), p. 7.
  • Executive Office of the President, “Preparing for the Future of Artificial Intelligence,” October 2016, pp. 30-31.
  • Elaine Glusac, “As Airbnb Grows, So Do Claims of Discrimination,” New York Times , June 21, 2016.
  • “Joy Buolamwini,” Bloomberg Businessweek , July 3, 2017, p. 80.
  • Mark Purdy and Paul Daugherty, “Why Artificial Intelligence is the Future of Growth,” Accenture, 2016.
  • Jon Valant, “Integrating Charter Schools and Choice-Based Education Systems,” Brown Center Chalkboard blog, Brookings Institution, June 23, 2017.
  • Tucker, “‘A White Mask Worked Better.’”
  • Cliff Kuang, “Can A.I. Be Taught to Explain Itself?” New York Times Magazine , November 21, 2017.
  • Yale Law School Information Society Project, “Governing Machine Learning,” September 2017.
  • Katie Benner, “Airbnb Vows to Fight Racism, But Its Users Can’t Sue to Prompt Fairness,” New York Times , June 19, 2016.
  • Executive Office of the President, “Artificial Intelligence, Automation, and the Economy” and “Preparing for the Future of Artificial Intelligence.”
  • Nancy Scolar, “Facebook’s Next Project: American Inequality,” Politico , February 19, 2018.
  • Darrell M. West, “What Internet Search Data Reveals about Donald Trump’s First Year in Office,” Brookings Institution policy report, January 17, 2018.
  • Ian Buck, “Testimony before the House Committee on Oversight and Government Reform Subcommittee on Information Technology,” February 14, 2018.
  • Keith Nakasone, “Testimony before the House Committee on Oversight and Government Reform Subcommittee on Information Technology,” March 7, 2018.
  • Greg Brockman, “The Dawn of Artificial Intelligence,” Testimony before U.S. Senate Subcommittee on Space, Science, and Competitiveness, November 30, 2016.
  • Amir Khosrowshahi, “Testimony before the House Committee on Oversight and Government Reform Subcommittee on Information Technology,” February 14, 2018.
  • James Kurose, “Testimony before the House Committee on Oversight and Government Reform Subcommittee on Information Technology,” March 7, 2018.
  • Stephen Noonoo, “Teachers Can Now Use IBM’s Watson to Search for Free Lesson Plans,” EdSurge , September 13, 2017.
  • Congress.gov, “H.R. 4625 FUTURE of Artificial Intelligence Act of 2017,” December 12, 2017.
  • Elizabeth Zima, “Could New York City’s AI Transparency Bill Be a Model for the Country?” Government Technology , January 4, 2018.
  • Julia Powles, “New York City’s Bold, Flawed Attempt to Make Algorithms Accountable,” New Yorker , December 20, 2017.
  • Sheera Frenkel, “Tech Giants Brace for Europe’s New Data Privacy Rules,” New York Times , January 28, 2018.
  • Claire Miller and Kevin O’Brien, “Germany’s Complicated Relationship with Google Street View,” New York Times , April 23, 2013.
  • Cade Metz, “Artificial Intelligence is Setting Up the Internet for a Huge Clash with Europe,” Wired , July 11, 2016.
  • Eric Siegel, “Predictive Analytics Interview Series: Andrew Burt,” Predictive Analytics Times , June 14, 2017.
  • Oren Etzioni, “How to Regulate Artificial Intelligence,” New York Times , September 1, 2017.
  • “Ethical Considerations in Artificial Intelligence and Autonomous Systems,” unpublished paper. IEEE Global Initiative, 2018.
  • Ritesh Noothigattu, Snehalkumar Gaikwad, Edmond Awad, Sohan Dsouza, Iyad Rahwan, Pradeep Ravikumar, and Ariel Procaccia, “A Voting-Based System for Ethical Decision Making,” Computers and Society , September 20, 2017 (www.media.mit.edu/publications/a-voting-based-system-for-ethical-decision-making/).
  • Miles Brundage, et al., “The Malicious Use of Artificial Intelligence,” University of Oxford unpublished paper, February 2018.
  • John Markoff, “As Artificial Intelligence Evolves, So Does Its Criminal Potential,” New York Times, October 24, 2016, p. B3.
  • Economist , “The Challenger: Technopolitics,” March 17, 2018.
  • Douglas Maughan, “Testimony before the House Committee on Oversight and Government Reform Subcommittee on Information Technology,” March 7, 2018.
  • Levi Tillemann and Colin McCormick, “Roadmapping a U.S.-German Agenda for Artificial Intelligence Policy,” New American Foundation, March 2017.


Open access | Published: 01 December 2023

A survey on performance evaluation of artificial intelligence algorithms for improving IoT security systems

Hind Meziane & Noura Ouerdi

Scientific Reports, volume 13, Article number: 21255 (2023)


Subjects: Computer science, Information technology

Security is an important field in Internet of Things (IoT) systems, and both IoT and security are topical research domains: a Scopus search on the combination returns 35,077 documents. Artificial intelligence (AI) has proven its efficiency in many domains, including security, digital marketing, healthcare, big data, industry, education, robotics, and entertainment, so the contribution of AI to the security of IoT systems has become a major breakthrough. This work adopts AI as a base solution for IoT security systems and considers two subsets of AI algorithms: machine learning (ML) and deep learning (DL) methods. Nevertheless, it is difficult to determine which AI method and which IoT dataset are most suitable for classifying and/or detecting intrusions and attacks in the IoT domain. The large number of existing publications on this question explains the need for a review of the current state of research on IoT security using AI methods. This study therefore compares the results reported for the AI algorithms mentioned in the related works. The goal of this paper is to compare the performance of existing AI algorithms in order to choose the best algorithm, and to establish whether the chosen algorithm can be used for classifying and/or detecting intrusions and attacks so as to improve security in the IoT domain. The methods are compared in terms of accuracy rate. Evaluating the current state of IoT security, AI, and IoT datasets is also the main aim in framing our future work. As a result, this paper proposes a new and general taxonomy of AI techniques for IoT security (classification and detection techniques). The results obtained from this assessment survey, which covers research conducted between 2018 and 2023, were satisfactory. This paper provides a good reference for researchers and readers in the IoT domain.


Introduction

The Internet of Things (IoT) aims at connecting devices or things to the Internet to collect and exchange data from the environment without human intervention. The term was first used by Kevin Ashton in 1999 in connection with his work on RFID (Radio-Frequency Identification) 1 . In recent years, IoT has come to play a pivotal role in several fields, such as healthcare, agriculture, smart homes, smart cities, education, smart grids, and smart business.

In recent years, AI has become an emergent technology in many domains, such as security and industry. Security is one of the main challenges of the IoT era, and the IoT is a genuinely challenging domain: a Scopus search using the criterion ((iot OR "Internet of Things") AND security) returns 35,077 documents. Recently, the adoption of artificial intelligence (AI) algorithms for IoT security systems has increased. Using the search criterion ((iot OR "Internet of Things") AND security AND (ai OR "artificial intelligence")) returns 2,557 documents from the Scopus database, indicating strong interest in IoT security and AI among academic researchers, as well as the importance of the IoT security research topic, which encourages the use of AI algorithms for improving IoT security systems.

Improving an IoT security system means choosing the AI technique and IoT dataset well so as to obtain good accuracy when classifying and detecting potential IoT attacks. There are many AI-based methods for classifying and detecting IoT attacks and intrusions, among them intelligent ML approaches and DL approaches. Despite all the tools that offer the possibility of protecting IoT systems, these systems remain highly vulnerable to attacks because of their many constraints and the enormous concerns related to this field. For this reason, the security of IoT systems must be a priority.

IoT security is one of the most important concerns. Security Information and Event Management (SIEM) is a tool that monitors, detects, and raises alerts about security events or incidents in an IT environment. SIEM systems combine security event management (SEM) with security information management (SIM); they aim to improve security awareness within an IT environment by combining SIM and SEM 2 . By gathering and analyzing data and sources related to both real-time and historical security event activity, SIEM solutions improve security incident management, threat detection, and compliance 2 . They also support user activity monitoring, security monitoring, and compliance, and provide services such as vulnerability scanning, intrusion detection, and availability checking. Nevertheless, this kind of tool has not yet been adapted to the IoT. On the other hand, AI techniques and methods can be used for classifying and detecting attacks and intrusions in IoT systems. It is still unclear, however, which AI approach(es) and dataset(s) are the most efficient for classifying and detecting attacks in order to improve IoT security. Furthermore, in order to build a performant intrusion detector that relies on the chosen AI model(s) for detecting, predicting, and monitoring anomalies in an IoT system, we first need to choose the best AI technique for the classification or/and detection problem and an appropriate IoT dataset; only then can we apply the chosen technique to train on the data and make predictions.

In addition to these AI techniques, there is a newer approach that complements ML (which suffers from false positives, false negatives, and other errors), called XAI (Explainable Artificial Intelligence), which addresses the "black box" problem. For example, when working with neural networks, programmers cannot easily understand what happens inside the model; the network lacks visibility. XAI therefore designates a set of AI techniques whose decisions can be understood by users, keeping the human factor as the decision-maker. It answers the question "How and why does the ML model make a prediction or decision in this way?". To maintain success, the human factor must not be neglected: humans need to work together with AI.
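To make the black-box point concrete, the minimal sketch below is our own illustration (not taken from any of the surveyed works): it uses the shap library to attribute the predictions of a tree-based attack classifier to individual input features. The synthetic data, the feature count, and the Random Forest model are assumptions made purely for demonstration.

```python
# Hedged sketch: explaining a tree-based attack classifier with SHAP.
# Dataset, feature count and model choice are illustrative assumptions.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 4))                     # 4 hypothetical traffic features
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)    # synthetic "attack" label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to the input features,
# giving per-feature contributions instead of an opaque score.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(np.array(shap_values).shape)           # contributions per class, sample, and feature
```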

Objective of the work

The objective of this survey is to establish the state of the art: an analysis of related works (the current state of research) that would otherwise take considerable time before reaching the most recent findings on these concepts; work that would take an author six months to gather can be read here in a few minutes or hours. The goal of this survey is to indicate clearly what is missing, what is needed, and what must be done to improve IoT security, as well as the directions of AI and IoT security research. It describes what has been done before on the problem; what is new is detailed in the results section. This survey analyses the results and possible outcomes of this technology before it is integrated. It therefore compares the performance of multiple AI methods in attack classification and anomaly (outlier) detection, and performs a comprehensive state-of-the-art study of AI for security in the IoT. The main goal is to identify the best AI method for classification and/or detection of attacks in the IoT environment. A new and general taxonomy of AI-based classification techniques for IoT security is also provided.

The question asked by this research is: "Which is the best AI technique or algorithm for improving IoT security?" In other words, which AI approach achieves the best score in terms of accuracy? This study topic was chosen because it is directly relevant to improving IoT security systems. The next question that can be asked through this work is: "Can the chosen technique be used to classify and/or detect intrusions and attacks in IoT security systems?" In other words, which of the AI techniques and algorithms could be applied to classifying and/or detecting attacks and intrusions in order to improve IoT security systems?

The problem we face is: "How do we choose an efficient AI model that can be used specifically for IoT security?" We need to understand the application task(s) we are working on. The third research question is: "Which datasets are most suitable for IoT systems?" The originality, or added value, of this contribution is that no other work in the literature has treated in detail the questions posed by this research. It is imperative to choose an AI algorithm suited to classification and/or detection in IoT security systems, because choosing the wrong technique can lead to a loss of effort, accuracy, and effectiveness; for that reason a comparison on accuracy is made. Moreover, choosing the wrong dataset produces incorrect results, so this paper also gives a comparative analysis of the most widely available datasets in order to identify recent, IoT-oriented ones. Further, to increase accuracy we need a good classifier and a good dataset, since a good choice of AI method and IoT dataset leads to good results; we cannot simply assume which AI method is best for attack classification and detection, or on which dataset.

In this paper, the contribution of AI to the security of IoT systems is discussed. The aim is to carry out a comprehensive survey of AI techniques for IoT security systems; in other words, we compare the results reported for AI techniques in terms of accuracy rate. The objective of this performance comparison is to identify the most effective classification and/or detection method, because it is crucial to choose the right method and the right dataset for IoT security systems. The main idea of this research is therefore to collect and evaluate the results obtained by researchers who have used AI to classify and/or detect potential IoT attacks, intrusions, or anomalies, in order to find the best AI method while taking into account the most suitable IoT dataset(s).

To the best of our knowledge, this paper is the only one that provides such an in-depth analysis of the AI and IoT security fields; it is not merely a descriptive overview. The study gives readers more information about the best AI technique to use for classifying and detecting IoT attacks, and about the data that come from the IoT environment, so as to achieve the best prediction results. It also presents the current state of research using both AI techniques and other approaches. Furthermore, no significant prior study has discussed the best classifier or AI model(s) and the appropriate dataset(s) for the IoT environment, and most reported works have not highlighted the security vulnerabilities and attacks together with their respective requirements and solutions. All these reasons motivate the contributions of this research paper. This work is a good starting point: it gathers the novelties in the IoT field and makes it possible to choose the right path.

Contribution

The objectives and originality/value of this paper are summarized as follows:

The first research question is: "Which is the best AI technique or algorithm for improving IoT security?" To answer it, this paper performs a comprehensive study of the performance of AI algorithms with respect to their accuracy;

We studied previous works on AI algorithms, specifically the algorithms used in IoT security, in order to evaluate their performance by comparing the accuracy of the reported results;

This research paper gives a literature review related to AI, ML, DL, IDS, and IoT security; it presents the state of the art of IoT attack classification and detection techniques based on AI;

The main contribution is to choose an approach to secure IoT systems using ML and DL after making a performance comparison between the different AI algorithms used in IoT security;

This survey gives exploratory research on AI and IoT security systems. It also provides a comparative study of recent IoT-oriented datasets;

The second question that can be asked is: "Can the chosen technique be used to classify and/or detect intrusions and attacks in IoT security systems?" In other words, we specify the task (classification and/or detection) to which the chosen algorithm can be applied in IoT systems; the paper also presents a performance study of AI-algorithm-based IDSs (Intrusion Detection Systems);

It provides a comprehensive survey with a new and general taxonomy of supervised AI techniques. Along this axis, it proposes a new taxonomy of supervised classification methods used for IoT attack classification in order to give a general and clear vision of these methods.

The value of this research compared with existing studies is the ability to identify the novelty of each paper. The research questions are: which is the best AI technique for improving IoT security systems, and which datasets are best for IoT security systems? To answer them, this paper provides a performance evaluation study and compares the many AI models that have been reported in the literature, with the aim of obtaining meaningful results and determining which AI technique is best for classifying and/or detecting IoT attacks and intrusions.

The remainder of this article is arranged as follows. The second section deals with AI and IoT security. The third section presents the current state of the art. The fourth section presents the research methodology, which covers a comparison, in terms of accuracy, between the various AI algorithms used for IoT security systems. The fifth section provides a discussion of the results. Finally, the last section concludes the article.

AI and IoT security

This section surveys the current state of our research subject along several axes to help readers understand the contribution of AI to IoT security. To give a clear vision and a comprehensive overview of AI in IoT security, it covers several important keywords related to AI and IoT security, including machine learning (ML), deep learning (DL), intrusion detection systems (IDS), classification, AI accuracy, and prediction (best prediction accuracy). The following subsections focus on AI techniques in IoT security. This section can help the reader form an idea of the security attacks, vulnerabilities, requirements, and solutions for each IoT layer, and gives an overview of AI algorithms and their applications in the IoT security domain.

Internet of Things (IoT) system

IoT is a vast domain that encompasses physical objects, network communication, technologies, hardware (devices, computers), protocols, electronics, platforms, and applications, and that connects anything in the physical environment (physical objects, animals, places, plants, machines, and people) to the Internet in order to exchange data without human interaction 3 . IoT covers many fields, from human activities to industrial applications. The main goals of IoT 3 are to:

Provide the best services for human beings, enabling many new smart applications at the medical, industrial, economic, educational, and even individual daily-life levels;

Make human life more comfortable and convenient;

Save cost and time;

Automate the interaction between the environment and humans.

Architecture and functioning of IoT system

The architecture of an IoT system is multi-layered. Different proposals for IoT layers exist, each of which uses diverse technologies and brings a number of possible security issues. This subsection presents in detail the IoT architecture that we recently proposed in 4 to analyze security issues, and then turns to the analysis of IoT attacks based on a four-layer architecture.

There is a lack of standardization in IoT architecture 1 , 3 , 4 , and different proposals for the architecture of IoT systems exist. Some are composed of five layers 1 , 3 , while others are composed of four layers 4 . For instance, the Cisco IoT reference model has seven layers: Collaboration & Processes, Application, Data Abstraction, Data Accumulation, Edge Computing, Connectivity, and Physical Devices & Controllers. A three-layer architecture (physical/perception layer, network layer, and application layer) is not sufficient and does not represent the IoT concept well, because it lacks a cloud or middleware layer; such a layer is needed because a large amount of continuous and sensitive data is gathered from the physical layer. We therefore recommend the four-layer architecture, composed of the physical or perception layer, the network layer, the cloud/middleware layer, and the application layer. In this research we adopt the four-layer architecture because it has many advantages:

It is a thorough architecture;

It reflects the IoT architecture;

It is very general 1 , 3 and represents the IoT concept well 4 ;

Security at the middleware layer (storage/cloud/data) is not the same as security at the application layer (authentication/identification) 4 ; the more we separate the problems, the easier it is to find solutions;

The middleware layer is a necessary and integral part of the IoT 4 : a cloud layer is needed because so much data is generated by the many connected objects.

However, there is no standardized IoT architecture, nor does a secure reference IoT architecture exist. Figure  1 shows the four-layer architecture of an IoT system and its corresponding technologies.

Figure 1. Proposed layers in IoT system 4 .

Physical layer

The main challenge of the perception/physical layer is the constrained devices of IoT systems 4 . This layer deals with the sensors and devices used to receive and transmit data over different communication protocols such as Bluetooth, Zigbee, and RFID. RFID technology is mostly used for automated information exchange between tags and readers using radio waves; it relies on Automatic Identification and Data Capture (AIDC) technology. Attackers can destroy the communication between an RFID reader and a tag, for example through RF jamming: RFID tags can be compromised by a kind of DoS attack in which communication through RF signals is disrupted with an excess of noise signals. An attacker could also physically tamper with the devices of IoT users in order to obtain their sensitive data. The exchanged sensitive data should therefore be kept safe across heterogeneous networks and constrained devices. The major security attacks encountered at the physical layer are as follows:

Eavesdropping : Attackers can easily eavesdrop on or sniff physical-layer devices;

False Data Injection Attack : The attacker captures connected objects in order to inject erroneous data into the IoT system;

Unauthorized Access to the Tags : Tags can be accessed by someone without authorization; the attacker can not only read the data but also modify or even delete it. Because a large number of RFID systems lack a proper authentication mechanism, RFID technology is vulnerable to many attacks such as replay, spoofing, tracking, virus, eavesdropping, unauthorized access, man-in-the-middle, killing tag, and counterfeiting.

Network layer

Several communication networks and protocols, such as LoRa and WiFi, are used for connectivity. Network attacks are numerous and usually affect work coordination as well as information sharing between devices. Among network-layer attacks, sybil attacks 1 , 3 aim at creating illusions in the network. At the network layer, the attacker targets the communication technologies of the IoT 4 , exploiting vulnerabilities of network protocols to destroy network communication. An attacker can send fake routing information to contaminate the entire network. The network layer is highly vulnerable to man-in-the-middle (MITM) attacks. An attacker may also try to disrupt the network by launching a DoS attack, and the impact of a DDoS (Distributed Denial of Service) attack 1 is to disable the network entirely. Moreover, security failures can contribute to the disruption of the whole network, and the attacker can exploit the absence of anomaly and intrusion detection on networks. For that reason, an IDS should be deployed at this layer to monitor the network for malicious activities, prevent abnormal behavior of network participants, and counter DoS attacks that could block the functionality of part of the network. Furthermore, a man-in-the-middle (MITM) attack can be caused by injecting a malicious node into the network. An improvement of network intrusion detection using artificial intelligence (AI) is therefore necessary: a new intrusion detection system (IDS) using AI techniques should be defined, implemented, and validated on a real case. Security on IoT networks is thus an important factor to consider. The main attacks against the network layer are the following 1 , 3 , 4 :

Phishing Site Attack : The IoT network layer is very vulnerable to phishing-site attacks, in which multiple IoT devices are targeted with minimal effort from the attacker, who attempts to obtain sensitive information such as users' credit card details and passwords;

Selective forwarding attack : The main goal of this attack is for the attacker to forward only chosen packets in order to disturb routing paths;

Sinkhole attack : A malicious node announces a beneficial or falsified route in order to attract many nodes to redirect their packets through it;

Sybil attack : A malicious object uses different identities in the same network;

Wormhole attack : This attack is launched by creating a private channel between two attackers in the network and forwarding selected packets through it;

Blackhole attack : A blackhole node silently drops all data packets routed through it by maliciously advertising itself as the shortest path to the destination during the path-discovery mechanism;

Hello flooding attack : Objects that have recently joined the network send a broadcast packet known as a hello message. An attacker can present himself as a neighbor to many objects by broadcasting hello messages with a high-powered antenna in order to deceive them;

Denial of Service (DoS) attack : The most common attack; it affects network service availability and exhausts network resources to make them unavailable to users. It targets a particular computational resource or a network.

Middleware layer

The cloud is intended mainly for data analysis; at this layer, the challenges of the first two layers (the perception layer and the network layer) have been left behind. This layer provides a distributed infrastructure to process and analyze IoT data 4 . It also concerns decision making after eliminating anomalous points 3 or anomalous groups of data that might lead to false decisions, which helps reduce the true negative and false positive rates. The major security attacks encountered at the middleware layer are as follows:

Man-in-the-Middle (MITM) Attack : Message Queuing Telemetry Transport (MQTT) is one of the most popular protocols used in IoT systems to transfer messages between devices. It is an efficient and lightweight protocol based on a publish-subscribe communication model between subscribers and clients through the MQTT broker. An attacker who takes control of the broker can act as a man in the middle and gain complete control of all communications without the clients' knowledge;

Signature Wrapping Attack : XML signatures are used in web services. The attacker modifies or falsifies eavesdropped messages by breaking the signature algorithm and exploiting vulnerabilities in the Simple Object Access Protocol (SOAP). The SOAP interface is offered by EC2 (Elastic Compute Cloud) to control deployed machines;

SQL Injection Attack : The attacker embeds SQL statements in the input data to perform delete, read, and write operations. Attackers can thereby threaten the whole database system and obtain users' private data.

Application layer

IoT computing provides the information that is visualized at this layer 3 , 4 . This layer provides graphs that support decisions and show clearly where anomalies occur: are there security anomalies, and are they in the physical layer? It also supports monitoring of the equipment, i.e., whether a given device, such as a temperature or humidity sensor, is working or not. This layer processes data coming from the cloud/middleware layer and provides quality service to users. The major security attacks encountered at the application layer are as follows:

Buffer overflow : One of the most frequently used attacks against applications and software. A buffer overflow attack violates the bounds of a data buffer or code region by exploiting program vulnerabilities;

Malicious Code Injection Attacks : Attackers opt for the easiest method they can use to break into a network. They use Cross-Site Scripting (XSS) to inject malicious scripts into a trusted website; an XSS attack can paralyze the IoT system.

IoT attacks classification

Security requirements and attacks classification based on IoT layer

Security issues in IoT systems are the main challenges: restricted resources (resource limitations in IoT devices) 1 , weak security design 4 , access control, system configuration errors 1 , compatibility problems 4 , heterogeneity 4 (due to the different components of the IoT environment, with different characteristics and communication technologies), scalability 3 , the Big Data problem 4 , poor updates, the generation of continuous and enormous amounts of sensitive data over time (every second, minute, hour, day, or week) 4 , interoperability 3 , and the absence of standardization and of a secure IoT architecture 1 . To secure IoT environments, lightweight cryptography, access control, encryption, key management, and intrusion detection systems (IDS) are some of the available security solutions. Further, confidentiality, integrity, availability, privacy, authentication, authorization, non-repudiation, authenticity, identity, and compatibility are the main IoT security requirements 4 . In this research we adopt the four-layer architecture 4 (physical, network, cloud or middleware, and application layers) to classify attacks; these requirements are discussed in more detail in 4 . We propose a layer-based classification of vulnerabilities and threats in the IoT; a proposal for a secure IoT architecture (including the most secure protocols) is therefore needed.

Table 1 presents security requirements, solutions, and IoT attacks based on this layer classification. It explores the vulnerabilities, threats, and attacks in the IoT environment and the security solutions that can be implemented at each IoT layer. It aims at classifying the potential IoT threats and vulnerabilities from an architectural point of view, in order to give a clear vision of how to address the security requirements of each layer and avoid the current threats to it; in other words, the security solutions and requirements are classified for each layer separately. The study of vulnerabilities across the layers of IoT systems was discussed in 4 , while Fig.  2 shows the classification of attacks on the different communication technologies and protocols; the communication protocols of IoT systems were discussed in 1 , 3 , 4 . From Table 1 , we conclude that the CIA triad (Confidentiality, Integrity, Availability) represents the most critical services to consider when implementing a solution to secure an IoT system. We therefore need a clear idea of the security solutions to implement at each specific IoT layer in order to improve the security of the IoT architecture: the layers must be secured in order to secure the IoT as a whole. Several attacks, such as eavesdropping, physical damage, and man-in-the-middle (MITM), are the most common at the physical layer; to overcome these issues, attack detection could be deployed.

Figure 2. Taxonomy of IoT communication technologies and protocols attacks.

The classification of security attacks within the IoT is designed to help IoT developers become aware of the risks of security threats and flaws, so that better security solutions can be incorporated. The most common security solutions at the physical layer of the IoT architecture include lightweight cryptography and key management (Table 1 ). In Table 1 , we give the proposed solutions and requirements, extracted from recent research 1 , 3 , 4 , for each layer of the IoT architecture.

Attacks classification based on IoT communication technologies and protocols

As discussed in Refs. 1 , 3 , different communication technologies can be used to exchange and transfer data. IoT systems are heterogeneous; they use different technologies including, but not limited to, IPv6, Zigbee, 6LoWPAN (IPv6 over Low-power Wireless Personal Area Networks), Bluetooth, NFC (Near Field Communication), Z-Wave, and RFID (Radio-Frequency Identification). However, these technologies are susceptible to several attacks.

The different communication technologies in IoT systems allow objects to interconnect with each other in order to exchange data (capture, process, store, and transfer data between the elements of the first layer of the IoT architecture and the other layers) 1 . Communication technologies can be categorized as follows 1 , 3 :

“Short Range Communication Technologies” including: RFID, NFC, WSN, 6LowPan, Z-Wave, Zigbee, Bluetooth, Wi-Fi, BLE;

“Long Range Communications Technologies” or “LPWAN” including: Sigfox, LoRa/LoRaWAN, NB-IoT;

Cellular communication: 2G, 3G, 4G, 5G.

These technologies can thus be divided into three groups: short range, long range (LPWAN, Low-Power Wide-Area Network), and cellular communication. Each of these technologies brings a number of security attacks to an IoT system. Figure  2 shows various possible attacks on these technologies according to 5 .

As mentioned before, MQTT is the most widely used telemetry protocol in the IoT 1 , 3 , 4 . If this protocol is not secured, it cannot be used as-is: an analysis of the protocol (of how data is sent and received) is necessary, and if the protocol is not secure it must be encapsulated in a security layer such as SSL/TLS 1 .
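As an illustration of that recommendation, the following sketch (our assumption, not drawn from the cited works) wraps an MQTT connection in TLS using the paho-mqtt client; the broker address, port, credentials, and certificate path are placeholders.

```python
# Hedged sketch: encapsulating MQTT traffic in TLS with paho-mqtt.
# Broker host, port, credentials and CA certificate path are placeholder values.
import paho.mqtt.client as mqtt

# paho-mqtt 1.x constructor; version 2.x additionally requires a CallbackAPIVersion argument.
client = mqtt.Client(client_id="iot-sensor-42")
client.tls_set(ca_certs="/etc/ssl/certs/broker-ca.crt")   # verify the broker's certificate
client.username_pw_set("device-user", "device-password")  # placeholder credentials
client.connect("broker.example.com", 8883)                # 8883 is the conventional MQTT-over-TLS port
client.publish("sensors/temperature", payload="21.5", qos=1)
client.disconnect()
```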

Artificial intelligence (AI)

AI is about making the right decision at the right time, and its domain of application has become virtually unlimited: any existing field can benefit from it. AI is a leading technology that studies ways to build intelligent programs and machines that can creatively solve problems, and it can be used to improve accuracy and prediction results. AI domains include ML, NLP (Natural Language Processing), robotics, expert systems, vision, and speech. AI can also be used for IoT security and intrusion detection; in our study, we are interested in the use of AI (ML and DL) for IoT security. This calls for proposing an IDS based on the chosen algorithm(s), then developing and improving one or more reliable algorithms, and finally integrating the IDS and adapting it to IoT systems. Moreover, AI algorithms can be exploited to create an anomaly detection system in the IoT. Beyond intrusion detection, the AI methods studied in this paper have been successfully applied in other areas, such as:

Healthcare: AI has become a key factor in the healthcare field, especially in surgical assistance, early diagnosis, and prevention.

Industry: many AI use cases can be cited, such as robotics and autonomous driving in the automotive sector.

Education: intelligent teaching and tutoring as well as science simulation are notable examples of AI applications in education.

Digital marketing: AI can be used to analyze the collected data with techniques such as ML and classification in order to ensure high data quality.

NLP (Natural Language Processing), speech recognition, and image processing are other research fields.

AI has become a topic of the hour and keeps expanding thanks to its continuous development and its performance in classifying and detecting IoT attacks; it is increasingly relevant for improving IoT security systems. For example, many AI techniques can be used in an intrusion detection system (IDS): AI algorithms can be exploited to build intelligent security mechanisms and can be used for anomaly detection, intrusion detection, and so on. Furthermore, AI can be used as a plugin for open-source IDSs to improve network intrusion detection. Concerning its types, AI takes different forms: strong AI, narrow AI, and hybrid AI.

Strong AI: also called full AI or Artificial General Intelligence (AGI), a type of AI that would replace human intelligence. This type of AI has not yet been achieved and is not targeted at a specific problem;

Narrow AI: also called weak AI, designed for a specific problem. This type of AI lacks the flexibility of human intelligence;

Hybrid AI: AI-based solutions that combine multiple weak AIs in an attempt to create a strong AI.

AI is the science of making machines do things that require human intelligence; it is a system that attempts to mimic human thinking. AI raises basic issues about information processing, human intelligence, the mind/body problem, memory, symbolic reasoning, the origins of language, and more 6 . AI plays a significant role and is a good tool to adopt for securing IoT systems, and it is already being used to improve IoT security: several studies have used AI techniques to detect attacks and develop efficient IDSs, and recent findings suggest that AI algorithms can be exploited to create anomaly detection systems in the IoT. AI is thus a key solution based on various ML techniques, and DL is a subset of ML based on ANNs (Artificial Neural Networks). The difference between ML and DL is as follows: to complete a task, ML requires some direction from a programmer, while DL does not; the software must guide ML, whereas DL can learn on its own. Figure  3 below shows the difference between AI, ML, and DL. The following two subsections discuss ML and DL techniques for IoT security.

Figure 3. AI/ML/DL difference.

Machine learning (ML)

ML is a subset of AI in which algorithms can learn without explicit programming. ML techniques have therefore been widely used to handle Big Data, and ML plays a fundamental role in the IoT in managing the immense volume of data produced by IoT objects. ML approaches such as k-Nearest Neighbors (KNN), Decision Trees (DT), Naive Bayes (NB), Support Vector Machines (SVM), Random Forests (RF), Logistic Regression (LR), and Support Vector Regression (SVR) fall into supervised, unsupervised, and reinforcement learning. The number of layers in a neural network may vary depending on how difficult the problem to be solved is, and there are several types of neural networks, including the Multi-Layer Perceptron (MLP), Artificial Neural Network (ANN), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN). Neural networks can be used to improve the classification of attacks in the IoT domain, and their different types have been exploited in several research works for the classification and detection of potential attacks in IoT systems. Figure  4 shows a taxonomy of machine learning algorithms used for the security of IoT systems. ML algorithms can be divided into three major categories (a minimal illustrative sketch follows the list):

Supervised algorithms: the training set contains labeled data, and the goal is for the trained algorithm to make predictions on unlabeled data. There are two types of supervised problems, classification and regression. Supervised learning algorithms include DT, K-NN, SVM, Naïve Bayes, NNs, etc.

Unsupervised algorithms: the data are unlabeled, and the algorithm must be able to find similarities within the data on its own. K-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) are examples of unsupervised algorithms.

Reinforcement algorithms: allow the machine to learn by interacting with its environment. Q-learning is an example of reinforcement learning. Reinforcement learning techniques have been used to secure IoT devices by detecting various IoT attacks.
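For illustration, the minimal sketch below, which is ours and uses synthetic stand-in data, shows the supervised case: a Random Forest classifier is trained on labeled records and then asked to predict the attack class of unseen records. In practice, X and y would come from a labeled IoT dataset rather than random numbers, and the class names are assumptions.

```python
# Hedged sketch: supervised multi-class attack classification with scikit-learn.
# Synthetic X/y stand in for features and attack labels from a labeled IoT dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.random((2000, 10))          # 10 hypothetical flow features
y = rng.integers(0, 4, size=2000)   # 4 assumed classes, e.g. normal, DoS, MITM, spoofing

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)           # learn from labeled examples
y_pred = clf.predict(X_test)        # predict the class of unseen records
print("accuracy:", accuracy_score(y_test, y_pred))
```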

Figure 4. Taxonomy of machine learning algorithms used for the security of IoT systems.

Deep learning (DL)

DL is a subset of ML that uses ANNs (Artificial Neural Networks) for computation. DL is composed of deeply layered neural networks; the term "deep" refers to the number of hidden layers, which is why DL models are often referred to as DNNs. DL models can be categorized as supervised, unsupervised, or semi-supervised. Feed-Forward Neural Networks (FFNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Generative Adversarial Networks (GAN) are some of the best-known DL algorithms. A CNN is composed of input, convolution, pooling, and fully connected layers. The LSTM works in three stages: forget gate, update (input) gate, and output gate. The particular AI algorithms used in security are also discussed below. Figure  5 shows the AI approaches for IoT security; the two subsets of AI, ML and DL, are discussed and detailed in the fifth section. This survey deals with AI methods in IoT security systems: for example, HIDS (Host Intrusion Detection Systems), NIDS (Network Intrusion Detection Systems), and anomaly detection are common IoT security application fields where DL has been applied prominently, and DL has taken a firm place in the fields of cyber security and IDS. This paper highlights the importance of these AI algorithms in providing security against attacks and intrusions in the IoT context, and for that purpose it covers a state of the art on IDSs (Intrusion Detection Systems), their types, and related aspects.
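As a concrete illustration of a minimal DL detector, the sketch below (ours, with assumed layer sizes and synthetic data) builds a small feed-forward network in Keras whose single sigmoid output neuron distinguishes attack from normal traffic; it is a toy example, not the architecture used in any surveyed paper.

```python
# Hedged sketch: a small feed-forward binary intrusion detector in Keras.
# Feature count, layer sizes and the synthetic data are illustrative assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features = 20                                # hypothetical number of preprocessed features

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),       # hidden layers (the "deep" part)
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # one output neuron: attack vs. normal
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(1000, n_features).astype("float32")   # stand-in for an IoT dataset
y = np.random.randint(0, 2, size=(1000,))
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print("accuracy on the (synthetic) data:", model.evaluate(X, y, verbose=0)[1])
```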

Figure 5. AI approaches for IoT security. KNN k-Nearest Neighbors, SVM Support Vector Machines, DT Decision tree, NB Naive Bayes, MLP multilayer perceptron, FFNN Feed forward neural network, CNN Convolutional neural network, RNN Recurrent neural network, LSTM long short-term memory, RBM Restricted Boltzmann Machine.

Classification and detection methods

Classification with AI techniques

Classification is the process of categorizing a given set of data into classes 3 . It is a supervised learning technique in which the data are labeled: the classes are predetermined and examples are known, and the system learns to assign classes according to a classification model. The classes can be numerous or binary (as in anomaly detection). Classification assigns a category or class to data whose class is unknown; classification problems in this context consist of assigning attacks to classes. There are various classification algorithms, including KNN, SVM, NB (Naive Bayes), and NNs. Classification algorithms can be applied to detect several types of IoT attacks and thereby enhance the performance of an IoT IDS; classification plays a significant role in building a performant IoT IDS that differentiates between the different types of IoT attacks, and the performance of these classification techniques must therefore be tested. Moreover, classification, as a supervised machine learning technique, helps secure the system and detect real attacks: it predicts whether data are properly classified or not. To do this, a training set is used to train a classifier (classification algorithm), which is then applied to a test set containing similar data in order to classify it and make predictions; based on this classification, intrusions can be detected, i.e., classification is used in IDSs with multiple or binary classes. Choosing the best classification algorithm (AI algorithm) is therefore based on a performance evaluation in terms of accuracy rate. The primary challenge in classifying the correct categories is finding the correct ratio between the training and test sets.
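The train/test ratio point can be made concrete with scikit-learn's train_test_split, as in the hedged sketch below; the candidate split ratios, the k-NN classifier, and the synthetic data are illustrative assumptions only.

```python
# Hedged sketch: comparing train/test split ratios for an attack classifier.
# Data is synthetic; in practice it would be a labeled IoT dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.random((1500, 8))
y = rng.integers(0, 2, size=1500)

for test_size in (0.2, 0.3, 0.4):              # candidate train/test ratios
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=1)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"test_size={test_size:.1f}  accuracy={acc:.3f}")
```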

Intrusion detection system (IDS)

An IDS is a mechanism intended to detect abnormal or suspicious activities on an analyzed target (a network or a host), thus making preventive action against intrusion risks possible. In other words, it is software, hardware, or a combination of both that monitors a single host or a network for malicious activities and supervises network security against policy violations and cyber attacks. Securing an IoT architecture by installing IDSs at its layers is a recent approach to tackling security threats and vulnerabilities; to enhance IoT security, the integration of IDSs into the IoT system layers is required. An IDS can also be used in a healthcare use case to monitor human health and the human body by implanting sensors in the blood or heart (WBAN, Wireless Body Area Network). An improvement of the IDS is therefore required for adaptation to IoT systems; moreover, IDSs are widely used to detect abnormalities and intrusions on networks. Sensors can, for example, be embedded in clothes or near the heart to monitor a person's health condition, and Elon Musk hopes, through his company Neuralink, to implant chips in the human brain (announced for 2022) for medical use. To validate the operation of such a chip, the system must include an intrusion detection system, an anomaly detection system, or an analysis system. However, complex adaptive AI systems carry the risk of promoting the self-sustaining evolution of malicious systems that could imitate a cancerous development in the human body.

Detection can be seen as a classification with two output neurons (normal and attack). To detect intrusions, AI techniques must be able to distinguish abnormal from normal network behavior, and many AI techniques can be used for intrusion detection in IoT systems; the main objective is to protect users from attacks. Detecting attacks can be done using classification: given an input, we first detect whether it is normal or an attack, and we then identify the class of the detected attack. An intrusion detection system (IDS) is thus a preventive mechanism against intrusion risks, intended to detect abnormal activities on a network or a host. False positives and false negatives are two fundamental concepts: the first is an alert raised by an IDS that does not correspond to a real attack (normal traffic is considered an attack), and the second is a real intrusion that goes undetected. Outlier or anomaly detection helps minimize the true negative and false positive rates, which reduces false AI predictions and improves results; it also plays a significant role in securing IoT systems or any security system. IDSs are extensively used to keep track of harmful activity on a network or on a single computer: their aim is to identify vulnerabilities and notify the system administrator, or to give immediate alerts of potential threats or attacks. Machine learning and deep learning techniques can be used for detecting attacks in the IoT, and a comparison between the different IDS types is needed. Figure  6 shows the intrusion detection system (IDS) types for IoT security. Based on their detection resources, IDSs can be arranged into three categories:

Network based (NIDS, Network Intrusion Detection System): a NIDS listens to all network traffic, analyzes it, and generates alerts if any packets seem dangerous; it aims at detecting intrusions in real time. NIDSs are used to protect a company's IT assets.

Host based (HIDS, Host Intrusion Detection System): a HIDS resides on a single machine and analyzes, not the network traffic, but the activity happening on that machine; it analyzes in real time the flows relating to the machine as well as its logs.

Hybrid intrusion detection systems: these combine both approaches and can gather much more information than a NIDS alone by incorporating HIDS data; they are typically used in decentralized environments.

Figure 6. Intrusion detection system types for IoT security.

IDS detection methods can likewise be arranged into three categories:

Signature-based methods: contains a database of known attacks signatures which can be identified by this approach. However, this method cannot detect a new attack;

Anomaly-based intrusion detection system: A system error can lead to abnormal behavior. The anomaly-based detection method aims to define the behavior of data and create a model as a knowledge to compare real records with abnormal behavior. This technique can detect new attacks and threats; it does not need database updates. We conclude that anomaly-based IDS is the best method for detecting new attacks unlike signature-based IDS and rule-based IDS despite the fact that it gives high true negatives and false positives rates. For example, anomaly detection can be applied for verifying the accuracy/integrity of the transferred data in the network layer which improves the security of the physical layer. Moreover, data integrity should be ensured. Algorithms for anomaly detection can be statistical methods, AI-based methods, etc.

Hybrid-based intrusion detection system: it combines signature-based and anomaly-based intrusion detection to detect both known and new intrusions.
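To make the anomaly-based idea concrete, the following is a minimal statistical sketch, not taken from any of the surveyed papers: it flags records whose feature values deviate from the mean of normal traffic by more than a chosen number of standard deviations. The feature layout and threshold are illustrative assumptions.

```python
import numpy as np

def fit_normal_profile(normal_traffic: np.ndarray):
    """Estimate a per-feature mean/std profile from normal traffic only."""
    return normal_traffic.mean(axis=0), normal_traffic.std(axis=0) + 1e-9

def is_anomalous(record: np.ndarray, mean: np.ndarray, std: np.ndarray,
                 z_threshold: float = 3.0) -> bool:
    """Flag a record if any feature lies more than z_threshold standard
    deviations away from the normal profile."""
    z_scores = np.abs((record - mean) / std)
    return bool(np.any(z_scores > z_threshold))

# Illustrative usage with synthetic flow features (packet size, duration, rate).
rng = np.random.default_rng(0)
normal = rng.normal(loc=[500, 1.0, 50], scale=[50, 0.2, 5], size=(1000, 3))
mean, std = fit_normal_profile(normal)
print(is_anomalous(np.array([510, 1.1, 52]), mean, std))    # expected: False
print(is_anomalous(np.array([5000, 9.0, 900]), mean, std))  # expected: True
```

AI-based anomaly detectors follow the same pattern, but replace the hand-set threshold with a model learned from data.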

AI accuracy

Accuracy is an evaluation metric used to test models. It represents the overall effectiveness of the classification model, i.e., the ability to appropriately distinguish between intrusions and normal behavior. This work aims at comparing the accuracy rates of different AI algorithms. The evaluation step estimates the performance of a model on a test dataset (new input data). Accuracy depends on the data preprocessing as well as on the quality and quantity of the data. The performance of the AI techniques has been evaluated on the accuracy metric because all the selected papers use accuracy to evaluate their AI models. In terms of evaluation metrics, we notice that accuracy was the main factor used, ahead of precision, recall, F1 score, loss, and AUC (Area Under the Curve). Moreover, evaluating accuracy requires an IoT-related dataset that reflects realistic, real-world IoT applications. Further, before applying any AI method, the data must be cleaned and prepared in order to accelerate the learning process and achieve good accuracy. To determine the most suitable AI method for IoT attack classification and detection, we assess the performance of multiple AI methods in IoT security.
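As an illustration of how these metrics (and the false positive and false negative rates discussed above) can be computed from a model's predictions, the following is a minimal sketch using scikit-learn; the label vectors are invented for demonstration.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# 1 = attack, 0 = normal; hypothetical ground truth and model predictions.
y_true = [0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("False positive rate:", fp / (fp + tn))  # alerts raised on normal traffic
print("False negative rate:", fn / (fn + tp))  # attacks that went undetected
```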

State of the art

This section reviews research on AI techniques used for IoT security in order to give a clearer picture of the current state of the art in AI-based IoT attack classification and detection. To overcome the security problems of IoT systems, and in particular the classification and detection of IoT attacks and intrusions, many AI techniques have been proposed by several researchers. This literature review covers AI-based security for IoT systems up to the most recent papers published in this area, until 2023. It presents the state of the art of IoT attack classification and detection techniques, as well as of intrusion detection in IoT systems using AI. The objective of this contribution is to outline the methods used for classification and/or detection of attacks in the IoT environment using AI, related in particular to ML/DL or to other techniques such as the CTM method and steganography.

In 2021, Meziane et al. 1, 3 proposed a new classification based on the Classification Tree Method (CTM) to classify IoT attacks. It makes the evaluation of IoT systems more systematic and allows attack test cases to be selected through the CTE (Classification Tree Editor) tool. They address the classification problem and exploit the usability of the CTE editor to generate appropriate test cases. Their research presents two approaches for improving the IoT system evaluation process:

A systematic method to generate test cases;

Selection of test cases based on appropriate attack classification.

This method has several advantages: all possible test cases are identified and relevant test cases are selected in a systematic manner, which helps eliminate or reduce errors and eases test management. Nevertheless, the CTM method suffers from some drawbacks, which are discussed in the fifth section. Additionally, the CTM method has not been classified by any author in previous works; for this reason, I also suggest classifying the CTM method in the fifth section. The proposed classification of IoT attacks will be helpful for detection.

In 2022 4, Meziane et al. proposed an IoT architecture that includes four layers: the physical layer, the network layer, the middleware layer, and the application layer. The study covered the architecture of IoT and the different technologies used, including communication protocols such as LoRa, LoRaWAN, and 5G, as well as the characteristics and specifications of each layer in an IoT system. At each layer, the attacks and vulnerabilities are described and illustrated in detail. Moreover, the defined IoT architecture and its security requirements are also discussed in detail. The research performs a comparative study of the IoT architecture and of well-defined attacks. The results show that efforts should focus on securing the physical and network parts, because these are generally the parts that carry the biggest challenges: the resource challenge in the first layer (physical or perception layer), and the data heterogeneity challenge in the second layer (network layer).

In 2019 7, O. Ibitoye, O. Shafiq, and A. Matrawy used two DL models, an FFNN (Feed-Forward Neural Network) and an SNN (Self-Normalizing Neural Network), to classify intrusions in IoT networks. The authors also compared the detection performance of the SNN and the FFNN using the BoT-IoT dataset. As a result, the FFNN outperformed the SNN, achieving better precision, accuracy, and recall. However, the authors evaluated their models on only a single dataset.

In 2018 8, Y. Zhou, M. Han, L. Liu, J.S. He, and Y. Wang proposed a model for cyberattack detection in the IoT environment. To predict intrusions, they trained deep FFNNs using the back-propagation algorithm. For the NSL-KDD dataset, the accuracy was above 95%, but on UNSW-NB15 all models achieved an accuracy below 95%. However, the datasets used may not represent IoT network traffic; in particular, NSL-KDD does not provide satisfactory results because of two main issues: (1) it lacks recent normal traffic patterns, and (2) it does not cover recent attack patterns.

In 2018 9, T. Aldwairi, D. Perera, and M. A. Novotny evaluated the suitability of RBMs (Restricted Boltzmann Machines) for distinguishing between abnormal and normal NetFlow traffic. They evaluated their approach on the well-known ISCX (Information Security Centre of Excellence) dataset. Their results showed that the proposed method can be trained successfully to classify anomalous and normal NetFlow traffic. However, the model was not compared with other similar models.

In 2017 10, K. Vimalkumar and N. Radhika used various ML techniques to create a big data framework in which intrusions are detected using classification methods (DNN, SVM, RF, DT, NB). Based on these classifiers, intrusions were detected on the synchrophasor dataset. Even though the DNN model achieved the highest accuracy (79.86%) among the techniques used in their work, this accuracy remains below 80%.

In 2020 11, Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A., and Anwar, A. evaluated the performance of several popular ML methods and a DL model in both binary and multi-class classification for intrusion detection purposes on a proposed telemetry dataset for Industrial IoT (IIoT) and Industry 4.0/IoT. Due to the lack of benchmark IIoT and IoT datasets for assessing IDS-enabled IoT systems, they proposed and described this dataset, called TON_IoT. For evaluating the performance of seven supervised machine learning methods, various evaluation metrics (recall, accuracy, F-score, and precision) were used. CART achieved better results than the other techniques, with a score of 77% for all metrics except the F-score (75%). The main finding of this evaluation was that CART and RF achieved the highest scores in all metrics. Compared to the other methods, the results also showed that KNN and LSTM had the second-best performance.

In 2019 12, Y. Zhang, P. Li, and X. Wang presented an intrusion detection model for IoT based on a GA (genetic algorithm) and a DBN (deep belief network). The NSL-KDD dataset was used to evaluate the algorithms and the model. For detection, this method reaches more than 99%. However, the dataset does not target IoT systems, which means that it may not represent IoT network traffic.

In 2020 13, Ferrag et al. analyzed seven DL models: RNN (recurrent neural networks), DNN (deep neural networks), RBM (restricted Boltzmann machines), DBN (deep belief networks), CNN (convolutional neural networks), DBM (deep Boltzmann machines), and DAE (deep autoencoders). The authors used these DL models for intrusion detection and studied their performance in binary and multiclass classification using two new real traffic datasets, the Bot-IoT dataset and the CSE-CIC-IDS2018 dataset. They used three important performance indicators: detection rate, accuracy, and false alarm rate.

In 2019 14, Ferdowsi et al. proposed a distributed GAN (generative adversarial network)-based IDS to detect intrusions in the IoT. They show the superiority of their model, which achieves up to 25% higher precision, 20% higher accuracy, and a 60% lower false positive rate compared to a standalone GAN-based IDS.

In 2019 15, Liang et al. studied the application of a NN (neural network) in IDSs (intrusion detection systems), using the NSL-KDD dataset to test the proposed system. They applied a DNN technique for intrusion detection; the DNN showed good performance (an accuracy rate of 97%) for detecting intrusions in an IoT environment.

In 2019 16, Ge et al. presented a deep learning-based intrusion detection approach for IoT networks. The authors used an FNN for binary and multiclass classification, but with only a few classes, reaching 99% in all evaluation measures (accuracy, precision, recall, and F1 score) for DDoS/DoS attacks, while normal traffic classification reached an accuracy of 98%.

In 2019 17, according to Yuan et al., precision can reach 96% in binary classification between anomalous and normal traffic. An AC-GAN (Auxiliary Classifier Generative Adversarial Network) is also adopted. The results show that their technique can improve the precision of network traffic classification, and the classifier performs well in distinguishing anomalous from normal traffic, with a recall of 98% in anomaly traffic detection. However, no comparative analysis against similar works is provided for the evaluation of the model.

In 2019 18, Nagisetty and Gupta used four DL models, namely MLP (Multi-Layer Perceptron), CNN (Convolutional Neural Network), DNN (Deep Neural Network), and AE (Autoencoder), with the aim of detecting malicious activities in IoT networks. Their performance evaluation was done using two datasets, NSL-KDD99 and UNSW-NB15, and the result analysis is discussed in terms of accuracy.

Sainis et al. 19 used five ML classification algorithms, namely NB, KNN, SVM, DT (C4.5), and RF, for the attack classification problem on three datasets called NSL-KDD, KDD Cup'99, and GureKDDcup. However, NSL-KDD and KDD Cup'99 do not provide satisfactory results because of two main issues: (1) they lack recent normal traffic patterns, and (2) they do not cover recent attack patterns. Moreover, KDD Cup'99 is an outdated dataset.

In 2018 20, Nazim Uddin Sheikh et al. discussed a signature-based IDS for IoT environments, tested on the NSL-KDD dataset. The proposed IDS consists of four components: a signature generator, a pattern generator, an intrusion detection engine, and an output engine. In their experiments, they evaluate the false negative and false positive occurrences.

In 2018 21, Diro et al. adapted an LSTM to detect cyber-attacks. Their experiments were conducted on two datasets, AWID (Aegean WiFi Intrusion Dataset) and ISCX; both datasets showed a similar trend for all measures (accuracy, recall, and precision). Furthermore, the precision-recall curve of the LSTM lies above that of LR (logistic regression), which means that the system classifies normal and attack instances into their respective classes almost ideally. The high recall and precision values stem from low false negatives and false positives, respectively, and low false positive and false negative rates indicate high relevance in attack detection systems.

In 2020 22, Kasongo et al. proposed an FFDNN (Feed-Forward Deep Neural Network) for binary and multiclass attack classification; the efficiency of the model was tested using the AWID and UNSW-NB15 datasets. The proposed method outperformed k-Nearest Neighbour (kNN), Random Forest (RF), Naïve Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). Their experiments showed that the proposed method obtained accuracies of 99.77% and 99.66% for the multiclass and binary classification, respectively; their approach thus achieves greater detection accuracy than the other techniques.

In 2019 23, Hwang et al. used an LSTM (Long Short-Term Memory) network model with the aim of classifying whether an incoming packet is part of malicious or normal traffic. They used the USTC-TFC2016, Mirai-RGU, ISCX2012, and Mirai-CCU datasets, and reported that their performance in classifying flows as malicious or benign is competitive with prior work. The authors are mainly interested in classifying whether an incoming packet is malicious or not, rather than in the detailed attack type. The results show that the LSTM method reached 100% accuracy.

In 2019 24, Ferrag et al. employed RNNs (recurrent neural networks) for detecting network attacks. Three different sources, the Power System dataset, the CICIDS2017 dataset, and the Bot-IoT dataset, were used to study the performance of the proposed IDS. The best accuracy, 99.811%, was obtained on the CICIDS2017 dataset.

In 2019 25, Koroniotis et al. focused on AI-based detection algorithms for IoT networks. They proposed a new dataset called Bot-IoT, carried out binary classification, and classified the prediction output of their models as normal traffic or attack. They compared their dataset to other publicly available datasets; the proposed dataset was claimed to be the only one in the comparison that includes IoT traces. To evaluate the quality of the new dataset, SVM, LSTM, and RNN classifiers were trained. The obtained results show that Bot-IoT can be used to train accurate models, with RNN and LSTM outperforming the SVM implementation. However, they did not classify the output into the different categories of attacks.

However, an article that tests on data such as KDD Cup'99 network traffic and merely reports a new accuracy and precision is insufficient, because such data do not reflect the reality of IoT. New, IoT-specific datasets have recently appeared, and they are needed. To obtain the best performance results, we extracted the information from the papers 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, as illustrated in Table 2. We then summarized, analyzed, and compared the reported results, focusing on the dataset(s) used, the AI algorithm(s), and the accuracy. The main results of this research are discussed in Section V.

Research methodology

This section presents a survey on AI for IoT security, in which we study previous works from 2017 to 2023 by comparing the results obtained. To our knowledge, this survey is the only one that collects and compares the results of related works that experimentally test the performance of one or multiple ML/DL methods in IoT attack classification/detection and presents the current state of research. This section is based on two processes, a search process and a selection process; the following section is based on an in-depth analysis process and a synthesis process. In this study, we used the criteria (iot OR "Internet of Things") AND security AND (ai OR "artificial intelligence") AND (IDS OR "intrusion detection system" OR intrusion AND detection) to narrow our search. We then selected and covered only the well-defined, recent, and relevant papers published between September 2017 and February 2023, ignoring papers that are not related to the purpose of this research as well as irrelevant studies. We further limited our study by excluding workshop proposals, overviews, purely theoretical studies, abstract-only publications, and research not related to the proposed research questions, which left 28 papers for this survey. This work is therefore based on well-defined papers, i.e., papers that state their purpose (IDS or another solution), describe the algorithms used, and give final results. The results were retrieved from the Scopus database. The purpose of the methodology followed in this survey is to compare the accuracy of the proposed AI techniques, find the best one, and obtain some inspiring results through an in-depth analysis of selected works published over the last seven years. The objective of this contribution is first to identify the best score (the highest accuracy) among the AI methods, and then to determine whether the chosen AI algorithm is used for classification and/or detection. To do this, an analysis of published papers on AI-based IoT security systems is presented first. This performance comparison allows us to deduce the algorithm that gives the best results and thus to determine which one is the most effective. Finally, we present possible challenges of AI with IoT and future research directions.

The methodology followed in this study is based on a broad comparison of previous research with respect to the method used, the dataset used, and the parameters used. A comparison of accuracy rates is then made in order to deduce the best AI technique among them. This paper is a survey of related works that aims at saving readers time. As Table 2 shows, there are many studies devoted to IoT security systems, including classification and detection, which demonstrates the importance and relevance of the topic. Table 2 shows the main results extracted from these recent papers.

In Table 2 below, accuracy is used for performance evaluation because, in the literature, most researchers evaluate and test their algorithms using the accuracy metric. Based on the previous section, we stay up to date with the new datasets used for IoT security, the techniques used, and the obtained accuracy rates. The performance study of the AI algorithms, including ML and DL, considers several essential parameters: year of publication, objective of the contribution, dataset(s) used, algorithm(s) used, results obtained in terms of accuracy, and the treated problem(s) or task(s) in which the algorithm(s) were applied, namely (1) C: classification; (2) D: detection; and (3) C-D: classification and detection of intrusions, anomalies (deviants, outliers), or attacks. Datasets are the collections of data used for training and testing the AI methods. Accuracy is used as the reference value for each algorithm, and the bold value marks the highest accuracy reported in the literature.

Results and discussion

This paper presents a performance study and a comparative study of AI algorithms for improving IoT security systems. The goal of this section is to explain the obtained results clearly and in depth; we also gather all these results so that they can be exploited in our future research. This section comprises four subsections and provides a comparison of datasets and AI algorithms for improving IoT security systems. The first subsection gives insights from the analysis and comparison of publicly available datasets that have been applied in IoT security. The next subsection details the analysis of AI methods for classifying and detecting IoT security attacks. The third subsection proposes a new taxonomy of AI algorithms for IoT security. Finally, the challenges of AI with IoT are discussed; the last two subsections thus revolve around the challenges and applications of AI, including machine and deep learning (ML and DL), in the context of IoT security.

Authors often use more than one algorithm and compare them on one or more datasets; Table 2 gathers all these results. It is a comparison table of the different algorithms and of the datasets used for the tests by each author. This survey presents the different methods and algorithms used, as well as a comprehensive comparative study of all the algorithms used for securing IoT systems.

This paper provides a useful reference for researchers and readers in the IoT domain who wish to develop new AI-based solutions for improving IoT security systems. To conduct this survey, we followed the methodology proposed in this work, whose purpose is to compare the accuracy of the proposed AI techniques in order to obtain some inspiring results. Based on it, the first observation is that different works may use different datasets and models: each researcher has their own way of working and aims at improving the effectiveness of their results. In this literature review, some works exploited a single dataset to evaluate their model(s), while others used several datasets. The integration of AI into IoT systems, along the perspectives discussed here, will enable the improvement of IoT security systems.

Analysis and comparison of publicly available datasets

This subsection provides a comparison of the datasets found in the literature by critically analyzing the strengths and weaknesses of each one. Each dataset has its own features; the aim is to classify the datasets and choose the best one for our purposes. Choosing recent and publicly accessible IoT datasets is challenging, yet efficient datasets are needed in order to implement the chosen AI technique in an IoT environment.

The major challenge in developing attack detection is the lack of appropriate IoT datasets. Therefore, to develop attack detection in the IoT domain, we need appropriate IoT datasets. Datasets must be suitable for IoT systems, i.e., IoT-specific, large, new, rich, high quality, up to date, publicly available, and sufficient in order to improve learning, obtain the best results, and increase security in the context of IoT networks. Data preprocessing, for example in the Python environment, is an important phase for identifying features and cleaning data. The learning base must be rich enough: to improve learning, reach a high recognition rate, and obtain complete results, a large and sufficient dataset is needed. Indeed, these datasets must really contain attacks on IoT systems.
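As an illustration of this preprocessing phase, the following is a minimal Python sketch assuming a hypothetical CSV export of labeled IoT traffic with a 'label' column; the file name and column names are illustrative and not taken from any of the surveyed datasets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical labeled IoT traffic export; 'label' holds 'normal' or an attack name.
df = pd.read_csv("iot_traffic.csv")

# Basic cleaning: drop duplicates and rows with missing values.
df = df.drop_duplicates().dropna()

# Encode categorical feature columns (e.g., protocol names) as integers.
for col in df.select_dtypes(include="object").columns:
    if col != "label":
        df[col] = LabelEncoder().fit_transform(df[col])

# Binary target: 1 = attack, 0 = normal.
y = (df["label"] != "normal").astype(int)
X = df.drop(columns=["label"])

# Split, then scale using statistics from the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```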

Datasets such as KDDCUP 99 19, 33, ISCXIDS2012 21, NSL-KDD 8, 12, 15, 18, 19, 31, and CSE-CIC-IDS2018 13 are not IoT-oriented, whereas datasets such as Bot-IoT 7, 13, 24, 25, 30, 31, 35, TON_IoT 11, 28, 32, MQTT-IOT-IDS2020 36, MQTTSet 37, and IOT-23 38 are typical IoT datasets. Figure 7 displays the typical IoT datasets found in the related works. The UNSW-NB15 8, 17, 18, 22, 33 dataset is also shown in Fig. 7 because it contains some IoT traffic. The goal is to choose one dataset among those found. Based on Table 2, the works 8, 12, 15, 18, 19, 31, 33 were based on old resources, which are outdated; the sources must be updated. Recently, TON_IoT and Bot-IoT have been used extensively 7, 11, 13, 24, 25, 28, 30, 31, 32, 35; in particular, the Bot-IoT dataset is widely used because it was the first to be created (2018) (see Fig. 7). Based on Tables 2 and 3, we can conclude that:

Between 2017 and 2019, there was an orientation toward KDDCUP 99 and NSL-KDD, which are not compatible with the IoT environment; several authors use them 8, 12, 15, 18, 19, 31, 33;

Between late 2020 and 2021: generation of new datasets that are 100% compatible with the IoT environment.

Figure 7. Typical datasets for IoT security obtained from the different related works.

According to 25, existing datasets present various challenges, such as unreliably labeled data, a lack of attack diversity, traffic redundancy, limited botnet scenarios, and missing ground truth. The Bot-IoT is a new dataset that addresses the aforementioned challenges. The BoT-IoT dataset was created in 2018, specifically for IoT systems, and comprises both attack and legitimate traffic. The included attacks are DoS (HTTP, TCP, UDP), service scan, DDoS (HTTP, TCP, UDP), keylogging, and data exfiltration. The dataset was collected from a realistic representation of an IoT network and includes botnet and normal traffic, with over 72 million records of network traffic collected from a simulated IoT environment 11. However, this dataset does not include sensor readings of IoT devices. The TON_IoT dataset contains heterogeneous data sources gathered from the telemetry data of IoT/IIoT services, operating system logs, and network traffic of an IoT network, all collected from a realistic representation of a medium-scale network designed at the Cyber Range and IoT Labs at UNSW Canberra. It includes nine types of cyber-attacks, namely DDoS, DoS, scanning, password cracking, cross-site scripting (XSS), data injection, ransomware, backdoor, and MITM (Man-in-The-Middle) 11. TON_IoT has various advantages and new properties that are currently lacking in the existing (state-of-the-art) datasets 11:

TON_IoT has various attack and normal events for different IoT and IIoT services;

TON_IoT includes heterogeneous data sources;

TON_IoT was collected from a testbed with a realistic representation of an IoT architecture with communicating Edge, Fog and Cloud layers.

The MQTT-IoT-IDS2020 dataset (MQTT internet of things intrusion detection dataset) was built from a simulated MQTT network comprising twelve sensors, a simulated camera, a broker, and an attacker. This dataset contains five labels: four attack labels, as shown in Table 3, and one for normal traffic. A more recent IoT dataset specific to the MQTT protocol and adapted to IoT networks was created in 2021, called MQTTSet; it includes an MQTT broker and eight IoT sensors in a smart home (humidity, door opening/closure, temperature, CO-gas, smoke, fan status, light intensity, and motion) sampled at different temporal intervals. IOT-23 38 is concerned with DNS traffic in the IoT context.

The novelty here lies in introducing recent IoT datasets. Table 3 presents a comparative study of the most recent IoT datasets (created between 2018 and 2021) that contain IoT traces. The Bot-IoT dataset does not contain a wide variety of attack types, while KDDCUP 99 and NSL-KDD are not intended for IoT systems and are outdated. The UNSW-NB15 dataset is excluded from Table 3 because it is not new (it was created in 2015 by N. Moustafa and J. Slay 41), it is not an IoT dataset but a general-purpose one, and it does not include specific characteristics of IoT/IIoT applications. The AWID dataset includes only network features extracted from the MAC (Media Access Control) layer of an 802.11 wireless network. KDDCUP 99, NSL-KDD, UNSW-NB15, AWID, Bot-IoT, and ISCX do not include IoT telemetry data.

Table 3 lists only the most relevant publicly available cybersecurity IoT datasets, not general network security datasets. It compares the datasets analyzed in this study according to the following criteria: (1) year, (2) attack categories, (3) availability, (4) authors, (5) application, (6) developer, (7) web link, (8) labeled data, (9) size, and (10) drawbacks. Based on Table 3, we found only a few datasets dedicated to IoT, namely MQTT-IoT-IDS2020, MQTTSet, IOT-23, Bot-IoT, and TON_IoT. These IoT datasets are publicly available, free, and downloadable via a persistent web link, rather than being available only on reasonable request from their corresponding author (i.e., non-public data). If data are not public, they are available only from the corresponding author on reasonable request; no special publisher permissions are required to analyze the datasets listed in Table 3.

From this comparison, we can conclude that TON_IoT is a newly generated and publicly available dataset for IoT and IIoT networks for evaluating IDSs, and that it has many advantages compared to other existing datasets. The TON_IoT dataset contains IoT/IIoT service telemetry. The comparison shows that TON_IoT offers clear advantages over older datasets: it is a recent source, and sources must be kept up to date rather than outdated.

In conclusion, the majority of recently published IoT datasets were created to test IoT network-based IDSs. However, they do not contain the actual data (i.e., measurement/telemetry data) generated from sensor readings; instead, to detect attacks on IoT networks, they primarily comprise flow/packet-level information, or a combination of both 11. The TON_IoT dataset includes data from seven IoT devices: a thermostat, a fridge sensor, a motion light sensor, a garage door, a weather sensor, a GPS (Global Positioning System) sensor, and Modbus. Based on these various IoT device data, this dataset offers a realistic environment for building an IDS for IoT devices.

For training and evaluation, such IDSs require an up-to-date and representative IoT dataset. The evaluation of AI methods, including intrusion detection methods, plays a specific role: the evaluation of the efficiency of IoT security methods and their accuracy depends on the IoT datasets used, which must reflect real-world IoT applications. The use of IoT-related datasets that reflect real-world IoT applications therefore plays a vital role. However, benchmark IoT and IIoT datasets for evaluating IDS-enabled IoT systems are lacking 11, and there is also a lack of real-world datasets for IoT and IIoT applications 11.

Analysis on AI methods for classifying and detecting IoT security attacks

Concerning AI algorithms, the chosen model is the LSTM model, a DL method, because it achieved the highest performance in terms of accuracy and obtained the highest accuracy in several works, for example 23, 25. After analyzing these works, the results show that LSTM is the best method for classifying incoming packets into abnormal and normal states 23. The evaluation results of 23 showed that the proposed framework, on four datasets, namely (1) an IoT dataset collected from their Mirai botnet (Mirai-CCU), (2) USTC-TFC2016, (3) ISCX2012, and (4) an IoT dataset from Robert Gordon University (Mirai-RGU), can achieve nearly 100% accuracy as well as precision in detecting malicious packets. In other words, the best AI algorithm prediction accuracy was 99.99%, approaching 100%, on the four datasets; the LSTM algorithm can perform the classification with nearly 100% accuracy 23. With this model, they obtained an accuracy of 100%, a precision of 100%, a recall of 100%, an F-score of 100%, and a false positive rate of 0%. In other words, the obtained results are promising not only in terms of accuracy but also across multiple metrics, including precision, recall, F-score, and false positive rate. The second observation is that the highest (perfect) accuracy was obtained by the LSTM model.

The authors of 11 demonstrated that the LSTM model can be formulated as a supervised classification problem to be used for attack detection. To sum up, based on 13, 22 as well as several other studies, I observed that DL-based classification approaches achieve better classification accuracy than classical ML approaches for both binary and multiclass classification. The results show that LSTM outperforms other models; in particular, LSTM outperformed the other artificial intelligence algorithms, especially among the deep learning (DL) algorithms. In contrast, the lowest accuracy score was 54%, achieved by the NB (Naïve Bayes) method according to 11. The third observation is that DL-based classifiers/models are more accurate than classical ML methods, i.e., classical ML methods lack accuracy, scalability, and robustness. In addition, the memory capability 21 and detection speed of DL are better than those of classical ML. The main drawback of traditional machine learning and of many deep learning techniques is the lack of memory 21 to recall previous events; this limitation is solved by RNNs.

The next question is: can we use the LSTM (Long Short-Term Memory) technique to classify and/or detect attacks and intrusions in IoT security systems? According to 13, DL techniques, especially neural network (NN) types, can be used for attack classification and anomaly detection tasks in IoT. In addition, in 25 the LSTM model was evaluated on binary classification on two datasets and can also be used for anomaly detection; the authors of 21 used LSTM for attack detection on two datasets for multiclass and binary classification; Hwang et al. 23 used it for classifying malicious traffic and also suggested an IDS for detecting malicious network traffic; and in 11 it is used for attack detection. The fourth observation is that AI is a good tool for improving IoT security systems, especially the LSTM-based DL approach, which can be used for both tasks (classification and detection). For instance, in 25 the LSTM model was used for anomaly detection, whereas in 13 it was used for anomaly detection and attack classification. Many surveys have tried to evaluate the performance of AI algorithms; the best and highest accuracy reached was obtained with an LSTM technique, while for all the rest (including ML and other DL techniques and algorithms) the accuracy does not reach 100%.

Regarding the performance of AI techniques, the methodology followed in Table 2 shows that all accuracy rates exceed 74.78%. The accuracy of the LSTM model is about 99.99%. Comparing this with the other results in the literature, we can say that the LSTM method has been successfully applied to detecting anomalies and intrusions as well as to attack classification or malicious traffic classification, which also means that the LSTM model classifies incoming packets into abnormal and normal states.

As a result, LSTM is ranked as the best algorithm among the AI techniques, given that it recorded the highest accuracy. This choice of model is justified by a comprehensive analysis and a performance evaluation of the existing AI methods used for IoT security systems. This research has been carried out in order to improve IoT security systems; for this reason, I specify the task (classification and/or detection) in which the LSTM algorithm can be applied in IoT systems. In other words, choosing a wrong AI technique could lead to a loss of accuracy and effectiveness, and choosing a wrong dataset will produce incorrect and erroneous results. The preprocessing of the chosen dataset (including identifying features and cleaning data) is also important for improving prediction accuracy in IoT security. Additionally, as a fifth observation, I conclude that DL techniques are more suitable than ML for attack classification and anomaly detection tasks in the IoT environment; that is, DL-based algorithms achieve better classification and detection accuracy than traditional ML models for IoT attack classification and detection. To compare the performance of AI algorithms, many studies have been conducted using different approaches, datasets, and methodologies for different use cases.

After obtaining these promising results, we can conclude that the DL-based LSTM technique can be an ideal solution not only for attack classification but also for attack detection. The results and findings clearly show that the LSTM algorithm is the best way to address both classification and detection problems in the IoT domain. The LSTM method achieved the best performance in the papers 11, 13, 21, 23, 25. Through these previous experiments, we concluded that the LSTM algorithm delivers better results than other AI classifiers in terms of accuracy for classifying and detecting attacks and intrusions in IoT systems. According to these studies, the LSTM method achieves the best accuracy rate, so the classification and detection of IoT attacks and intrusions by LSTM leads to good results. In the comparison between the different ML and DL techniques, we compared the AI algorithms to deduce the one that gives the most suitable results, and we obtained the best results with LSTM. In other words, Table 2 shows that the LSTM algorithm is more accurate than the other AI algorithms used in the literature: LSTM accuracies range from 98 to 100% on the BoT-IoT, AWID, Mirai-CCU, Mirai-RGU, ISCX2012, USTC-TFC2016, and CSE-CIC-IDS2018 datasets.

In a second step, we detail the proposed new taxonomy of AI algorithms for IoT security. It analyzes these AI algorithms through a summary comparison, in order to provide a comparative study of the best-known classification techniques for IoT attacks. This study aims at illustrating the application of some of the best-known ML/DL approaches to the IoT attack classification problem, covering and detailing the advantages and disadvantages of each one. This survey covers and combines several technologies and methods for securing IoT systems.

Proposed new taxonomy of AI algorithms for IoT security

In this subsection, I propose a general taxonomy that summarizes the supervised classification algorithms used in IoT security. The objective is to present the best-known methods for classifying attacks in IoT security; the suggested taxonomy of supervised classification techniques can be applied in IoT security. Thus, this paper provides an overview of AI techniques and their applications in IoT security and, based on that, proposes a novel taxonomy of AI-based IoT security intended to improve the success of AI techniques in IoT security systems. The aim is to explore ML and DL for IoT attack classification, present a new taxonomy of them, and compare these classifiers.

A detailed existing taxonomy, given in 42, summarizes a general outline of ML algorithms: it contains 63 algorithms categorized into 11 classes, namely DL techniques, NN, Bayesian, DT, Regression, Clustering, Regularization, Rule System, Instance-based, Ensemble, and Dimensionality Reduction. Despite its advantages, this taxonomy does not offer a clear overall view; further, it lacks some important AI algorithms, and the CTM method also needs to be categorized.

The idea is to condense the eleven classes existing in 42 into only six classes, each divided into subclasses. Moreover, that taxonomy still lacks some of the most popular ML algorithms, such as SVM, Reduced Error Pruning Tree (REPTree), Multilayer Perceptron (MLP), Feedforward NN (FFNN), and Stochastic Gradient Descent. These algorithms must therefore be added to the ML subclasses, whereas Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Generative Adversarial Networks (GANs), and Feed-forward Deep Networks (FDNs) can be categorized as DL techniques.

The taxonomy shown in the figure of 42 lacks some important and well-known algorithms such as SVM, which we categorize as Instance-based. Besides, Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) are classified under DL techniques, while the Multilayer Perceptron (MLP) and Stochastic Gradient Descent are categorized under ANN. Moreover, GAN and FDN can be classified, according to this survey, as DL techniques; the study in 43, however, categorized FFNN under NN techniques. Finally, REPTree is another DT learner used for classification, aimed at building simpler and faster tree models using information gain for splitting.

Additionally, the Classification Tree Method 1, 3 (CTM) has not been classified by any author in the literature; for this reason, I suggest classifying the CTM method in the DT class (see Fig. 8). A brief taxonomy of AI subsets is proposed in Fig. 8, in which six major categories of classification methods are presented: Bayesian algorithms, Decision Tree, Ensemble algorithms, Instance-based algorithms, Neural Network, and DL techniques. Each class of classification techniques is divided into several algorithms.

Figure 8. Taxonomy of AI techniques used for the classification task in IoT security.

In this work, we exclude the Self-Organizing Map (SOM) from the proposed taxonomy because it is an unsupervised ANN learning method. DL techniques, however, are not limited to CNN, DBM (Deep Boltzmann Machine), DBN (Deep Belief Network), and SAE (Stacked Autoencoder); there are other techniques such as RNN, DNN, AE (Autoencoder), RBM (Restricted Boltzmann Machine), and GAN. In this context, this class can be divided into two subclasses: supervised DL techniques, which contain DNN, RNN, CNN, and DBN, and unsupervised DL techniques, which include GAN, RBM, and AE.

In addition, ML algorithms can be grouped into three major categories, namely supervised learning, unsupervised learning, and reinforcement learning, each with its own subclasses. This subsection limits its scope to supervised learning, since the classification techniques belong to it. Supervised learning contains two subclasses, classification and regression algorithms, while unsupervised learning contains clustering and dimensionality reduction.

In this context, we are interested in supervised learning and, more precisely, in supervised classification algorithms that can be used to classify and detect attacks. Most supervised learning algorithms attempt to find a model (a mathematical function) that explains the relationship between input data and output classes. This study aims at illustrating the application of some of the best-known ML approaches to the IoT attack classification problem, covering and detailing the advantages and disadvantages of each one. In the following, we present these existing classification methods/classifiers with their working principles; each classifier has its own strengths and limitations. Among them, we cite the most popular classifiers: k-NN, SVM, DT, BN, RF, etc.

Several recent surveys in this area 44, 45, 46, 47, 48, 49, 50, 51, 52, 53 have addressed various aspects of AI and its integration with IoT security; however, most of these studies are theoretical. There is not yet promising work on integrating AI models into real IDSs for IoT in order to eliminate both false negatives and false positives, or on testing and evaluating such models with real attacks and real network traffic deployed in a real IoT environment.

The study 49 also covered some aspects of DL and big data for IoT security. Some research addresses IoT attack classification and/or detection using different AI algorithms, or uses other methods such as statistical models to detect anomalies in IoT systems. Most of the current work in the field of AI can be called "narrow AI" 48, i.e., AI-based solutions that address and solve a specific problem. The opportunities and possibilities of both AI and IoT can thus advance 48 when they are combined. The IoT needs to rely on AI because it is impossible for a human being to find information in the data generated by the IoT 48; without AI, the data generated by the IoT remain useless. Furthermore, if a new pattern is detected in the data, the machine is capable of learning it on its own, which would be impossible for any non-AI IoT system 48.

To select a relevant AI classification technique, we also compare the AI classification techniques with the other classification method proposed by 1, 3. There is much research on classifying IoT attacks; each researcher has their own way of working and uses different techniques for classification and/or detection. Some works address classification and/or detection using intelligent ML or DL algorithms, while others use another classification method such as the CTM (Classification Tree Method). These parts of AI are discussed in the following. This survey focuses on presenting the strengths and limitations of each approach that can be used for IoT attack classification, and provides an overview of the most popular ML and DL algorithms to give an idea of the supervised learning approaches used in classifying IoT attacks. The following subsections cover the different types of classification learning, including ML (KNN, SVM, DT, NB, etc.) and DL (CNN, RNN, etc.). To do this, we describe the existing methods in detail, briefly stating their advantages and disadvantages along with their IoT security applications. A taxonomy of AI techniques used for the classification task is shown in Fig. 8. This proposed taxonomy can help researchers select AI techniques for classifying and detecting attacks and intrusions to improve IoT security systems.

This section studies different AI techniques. Its goal is to provide a general comparative study, given the lack of updated work comparing the different ML and DL techniques. In this work, three kinds of classification approaches are considered: the CTM method, ML techniques, and DL techniques. The main objective of this systematic research is to present the AI algorithms used in classification tasks; this survey takes a deeper look at how AI algorithms are used to classify attacks in the IoT domain. In addition, the following tables summarize the advantages and disadvantages detailed here. Consequently, the goal is to synthesize and choose the algorithm that I found better than the others and that can be exploited in IoT security in general, and in IoT attack classification and detection in particular. AI algorithms are therefore important in securing IoT systems. In the following, we make a comparative study of the different AI classification techniques used for IoT security purposes, critically discussing the AI algorithms and their applicability in IoT security systems.

The CTM method 1, 3 classifies the various attacks on IoT security and embedded systems in order to automatically generate test cases and select the relevant ones. It is a formal method for graphically representing test cases and provides a systematic procedure for creating problem-specific test cases. Its basic idea is to ignore concrete test data and partition the test input into several classes/subsets according to the aspects the tester considers relevant.

The KNN (K-Nearest Neighbour) method is simple, easy to apply, and cheap; it is used for malware detection, intrusion detection, anomaly detection in the IoT, DoS/DDoS attack detection 47, 52, and authentication of an IoT element 50. The principle of the k-NN algorithm is to choose the class from the classes of the nearest neighbours, that is, to make decisions by looking for one or more similar cases in the learning set. The key is determining the similarity between data instances: k-NN captures the idea of similarity (also called distance, proximity, or closeness). The method requires choosing a distance (for example the Euclidean distance) and the number of neighbours to take into account.
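As a minimal, hypothetical illustration of these two choices (the distance metric and the number of neighbours), the sketch below trains a k-NN classifier on synthetic binary-labeled traffic features using scikit-learn; the data and parameter values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for preprocessed traffic features (0 = normal, 1 = attack).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k = 5 neighbours, Euclidean distance (Minkowski metric with p = 2).
knn = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_train, y_train)
print("k-NN test accuracy:", knn.score(X_test, y_test))
```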

SVM (Support Vector Machine) offers a high level of accuracy, which makes it suitable for IoT security applications such as intrusion detection, smart grid attack detection, malware detection, DoS/DDoS attack detection 47, 52, and authentication of an IoT element 50. However, a disadvantage of this technique is that it can be difficult to find an optimal kernel function 52.

DTs (Decision Trees) are used as classifiers in security applications such as intrusion detection 50, DDoS detection, device authentication, DoS/DDoS attack detection 47, 52, and authentication of an IoT element 50.

NB (Naïve Bayes) is used in IoT for anomaly detection and for detecting intrusions in the network layer; it can also be used for device authentication 47, 50, 52. NB is a probabilistic model based on Bayes' theorem, and the Bayesian network is a probabilistic graphical model for knowledge acquisition, enrichment, and exploitation.

EL (Ensemble Learning) is used for intrusion detection, malware detection, and anomaly detection 47. It is well suited to most problems because it combines several learning algorithms; however, EL has a higher time complexity than a single classifier 52.

RF (Random Forest) combines multiple DTs into a single algorithm in order to obtain an accurate and robust estimation model. It is used for anomaly detection, DDoS attack detection, device authentication, DoS/DDoS attack detection, intrusion detection, malware detection, and the identification of unauthorized IoT devices in network surface attacks 47, 52.
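To illustrate how several of the classical ML classifiers above (k-NN, NB, DT, RF, SVM) can be compared on the accuracy metric, in the spirit of the comparison carried out in this survey, the following is a minimal sketch on synthetic data; it does not reproduce any result reported in Table 2.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a labeled IoT traffic dataset (0 = normal, 1 = attack).
X, y = make_classification(n_samples=3000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

classifiers = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "SVM": SVC(kernel="rbf"),
}

# Train each classifier and report the test accuracy, the metric used in Table 2.
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name:15s} accuracy = {acc:.3f}")
```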

NN (Neural Network) techniques reduce network response time and therefore increase IoT system performance 47, and can detect DoS attacks in IoT networks. However, a disadvantage of NNs is their "black box" nature: only the overall decision appears at the network output, without any insight into how that decision was reached, so the behaviour of an NN cannot simply be assumed. NNs are now applied to all kinds of applications in many fields. An NN can modify itself based on the results of its actions, which enables problem solving and learning without conventional programming. An NN is defined by: (1) the number of layers; (2) the number of neurons per layer; (3) the activation functions; and (4) the set of weights attached to the neurons.

ANN (Artificial Neural Networks) is one of the most widely used families of ML algorithms. ANNs can be used for securing IoT networks 50, anomaly/intrusion detection 44, and DDoS attack detection. An ANN is composed of three kinds of layers (input, hidden, and output), each consisting of one or more neurons. When trained on updated datasets 52, an ANN performs better in DDoS attack detection.

CNN (Convolutional Neural Networks) and RNN (Recurrent Neural Networks) 52 can be used for device authentication, intrusion detection, malware detection, and DoS/DDoS attack detection. A CNN is a supervised-learning variation of regular NNs. CNNs, also known as convolutional nets (ConvNets), are a category of NN composed of an input layer, followed by one or more convolutional layers, pooling layers, and one or more fully connected layers, and finally an output layer; the general CNN architecture thus consists of convolutional, pooling, and fully connected layers. Convolutional networks 49 can be used for malware detection, the privacy of an IoT element 50, and the security of mobile networks 50.

RNN (Recurrent Neural Networks) can be used for automatic natural language processing, intrusion/anomaly detection, and speech recognition. An RNN can be trained in a supervised or unsupervised manner. It is a multilayer NN in which the previous set of hidden-unit activations is fed back into the network along with the inputs, so RNNs have a memory that captures information about what has been computed so far; this is why they suit the discipline of automatic NLP (Natural Language Processing). An RNN consists of an input layer, one or more hidden layers, and an output layer, each composed of one or more recurrent neurons. The main drawback of traditional machine learning is the lack of memory 21 to recall previous events; the RNN solves this limitation by maintaining loops from current to previous states 21, enabling information persistence.

LSTM (Long Short-Term Memory): LSTMs are supervised-learning models and a special kind of RNN capable of learning long-term dependencies. An LSTM usually consists of gates and memory cells 11. Using different gate units, an LSTM can solve the gradient vanishing or explosion problem 11 caused by memory loss over long-term sequences. For the purpose of detecting attacks, the LSTM model can be formulated as a classification problem in a supervised manner 11.
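As a minimal sketch of how such a supervised LSTM classifier might look for binary attack detection on sequences of packet/flow features, the following uses TensorFlow/Keras on synthetic data; the sequence length, feature count, and architecture are illustrative assumptions, not the configurations used in the surveyed papers.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in: 1000 sequences of 20 time steps with 8 features each,
# labeled 0 = normal, 1 = attack.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20, 8)).astype("float32")
y = rng.integers(0, 2, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20, 8)),
    tf.keras.layers.LSTM(64),                       # gated memory cells over the sequence
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")  # P(attack)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2, verbose=0)

# Predict whether new sequences are attacks (threshold at 0.5).
print((model.predict(X[:5]) > 0.5).astype(int).ravel())
```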

MLP (Multi-layer Perceptron): an MLP consists of an input layer, one or more hidden layers, and an output layer, each consisting of one or more nodes. The input layer receives the signal while the output layer produces a prediction for the input; the hidden layer(s) are the MLP's true computational engine. It is trained in a supervised manner with the BP (back-propagation) algorithm; in other words, the MLP uses the supervised back-propagation technique for training. In an MLP with back-propagation, the neurons of a layer are linked to all the neurons of the adjacent layers. To solve a problem, an MLP implementation must identify the best weights for each of the inter-neuronal connections through back-propagation. An MLP can be used to classify network traffic in the IoT backbone network.
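The sketch below shows, under the same illustrative assumptions as the earlier examples, how an MLP trained by gradient-based back-propagation could be applied to such a traffic classification task using scikit-learn's MLPClassifier.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for backbone traffic features (0 = normal, 1 = attack).
X, y = make_classification(n_samples=2000, n_features=15, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# Two hidden layers; weights are fitted by back-propagation of the loss gradient.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    max_iter=300, random_state=2)
mlp.fit(X_train, y_train)
print("MLP test accuracy:", mlp.score(X_test, y_test))
```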

RBM (Restricted Boltzmann Machine) is a kind of ANN that can be used for intrusion detection 49; it is able to represent and solve difficult problems 49. An RBM involves two kinds of processes, learning and testing. In the learning phase, a large number of examples of desired inputs and outputs are used to build the RBM structure, in which a general rule for mapping inputs to outputs is learned. In the testing phase, the RBM produces outputs for new inputs while following the general rule obtained during the learning phase.

DBN (Deep Belief Network) is a type of DNN (Deep Neural Network) with several hidden layers consisting of stacked RBM layers. Connections are directed between layers, but not between the units within each layer. The DBN contains 13 a layer of hidden units and a layer of visible units: the hidden units learn to represent features, while the visible units represent the data 13. A DBN is a generative graphical model with multiple hidden causal variables, and it can be used to build an FFDNN (feed-forward deep neural network) for the IoT 13.

DAE (Deep Autoencoder) can be used for IoT botnet attack detection 49 and cyber security intrusion detection 13. An autoencoder is a type of ANN composed 13 of an encoder and a decoder; a DAE is an NN with more than one hidden layer 54. The DAE extracts the internal relationships in the data by learning the optimal network parameters 54 that produce an output as similar to the input as possible.
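As a hedged illustration of how a (deep) autoencoder can be used for anomaly detection, the sketch below trains it on normal traffic only and flags records whose reconstruction error exceeds a threshold; the architecture, threshold rule, and synthetic data are assumptions for demonstration.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(1)
normal = rng.normal(size=(2000, 16)).astype("float32")          # training data: normal traffic only
suspect = rng.normal(loc=4.0, size=(5, 16)).astype("float32")   # strongly deviating records

# Encoder compresses 16 features down to 4; decoder reconstructs the input.
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),    # bottleneck
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(16, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal, normal, epochs=10, batch_size=64, verbose=0)

def reconstruction_error(x):
    """Mean squared reconstruction error per record."""
    return np.mean((x - autoencoder.predict(x, verbose=0)) ** 2, axis=1)

# Threshold: 99th percentile of the error observed on normal training data.
threshold = np.percentile(reconstruction_error(normal), 99)
print("Anomalous:", reconstruction_error(suspect) > threshold)
```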

GAN (Generative Adversarial Network): GANs are an unsupervised learning approach that pits two NNs, a "Generator" and a "Discriminator", against each other. In other words, a GAN combines two NNs, one of which generates objects while the second evaluates them; the first network is known as G, the generator, and the second as D, the discriminator. GANs can be employed for intrusion detection 55 and for detecting security anomalies 55.

As shown, this section needed to be detailed; it provides a summary and a choice of the tool that I find better than the others and that can be exploited in IoT security. To this end, the advantages and disadvantages are gathered in several tables for clarity and organization; these tables summarize what has been detailed above. In the comparison between the CTM method and the AI algorithms, detailed previously and shown in Tables 5 and 6, we mainly focus on the tasks, the case studies, the strengths, and the weaknesses (see Tables 4, 5 and 6).

As shown in Table 6, NIDSs are cross-platform and more scalable than HIDSs; to offer a higher level of security, the two solutions can be used together. For the AI algorithms, as shown in Table 5, the results teach us that each AI algorithm has advantages and disadvantages/limitations that can affect its performance. From Table 5, we conclude that LSTM has many advantages and few drawbacks compared to the other AI algorithms. As a synthesis, we can say that there is no automatic method for choosing or proposing an NN model, i.e., there is no systematic method for defining the best network topology and the number of neurons to place in the hidden layer(s). Otherwise, if the "black box" restriction is not an issue, NNs are the better choice.

There are many AI models that can be used for IoT security in classification and/or detection contexts, but each model has its own advantages, drawbacks and applicable scenarios. An appropriate model/classifier should therefore be selected according to Table 5, which describes each model, its strengths and weaknesses, and its applications in IoT security systems; these advantages and disadvantages drive the choice of algorithm. AI algorithms, and NNs in particular, are used widely and in many variants. We conclude that classifying and detecting IoT attacks using LSTM is an efficient method, i.e., choosing LSTM for IoT attack classification and detection is well justified. In this paper, we discuss the different types of AI and the related works of each type aimed at the classification and detection of IoT attacks and intrusions, and this study demonstrates the effectiveness of LSTM in that context; it is therefore reasonable to adopt the LSTM technique. We also compare the proposed AI classification techniques with another classification technique, CTM, in order to identify a relevant method for classifying and/or detecting attacks in IoT systems.

The findings indicate that many studies have combined deep learning with IoT security or machine learning with IoT security, but few have combined machine learning and deep learning together for IoT security; the investigations that do exist nonetheless demonstrate the feasibility and efficiency of doing so. This paper is based on two studies: (1) a performance study of artificial intelligence for IoT security; and (2) a comparative study between machine learning and deep learning for IoT security. The performance study shows that LSTM is the best method compared with other traditional classification techniques: the LSTM algorithm achieves the highest accuracy on the BoT-IoT, AWID, Mirai-CCU, Mirai-RGU, ISCX2012, USTC-TFC2016 and CSE-CIC-IDS2018 datasets, with 98%, 98.22%, 99.46%, 100%, 99.99%, 99.99% and 98.394%, respectively, compared with the other AI models (between 74.78% and 99%). The comparative study, in turn, shows that LSTM has many benefits and few drawbacks compared with other AI classification techniques. Examining these findings, the LSTM model is clearly the best classifier according to its highest accuracy (100%): when comparing state-of-the-art performance, LSTM achieves the best results in all metrics, especially accuracy, across the seven datasets. In addition, LSTM handles learning of long-term dependencies and works well with sensor data 11 . Furthermore, LSTM can be formulated as a supervised classification problem for use in attack detection, and it solves the problem of vanishing or exploding gradients. Unlike traditional ML, LSTM can recognize repeated attack patterns in a long sequence of packets independently of the window size 21 . LSTM is also resilient against adversaries, since adversaries cannot fit feature learning algorithms in order to advance their breaching techniques; indeed, the precision-recall curve of LSTM is higher, which indicates that it correctly classifies normal and attack instances into their respective classes 21 . The long memory of the LSTM model over a long data sequence thus allows it to perform better than classical ML techniques, which means that the performance of the LSTM model demonstrates the efficiency of deep learning for attack classification and intrusion detection. The LSTM method was, moreover, designed to solve the problems of RNNs. All of these reasons should encourage researchers to explore its potential for classifying and detecting cyber security threats in IoT systems. These findings pave the way for more robust security in IoT environments; the findings on the contribution of artificial intelligence to the security of IoT systems indicate Deep Learning's potential to succeed in the IoT cybersecurity environment, which confirms the superiority of DL algorithms over traditional ML approaches.
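As a concrete, purely illustrative example of the kind of model the surveyed works evaluate, the sketch below builds a small LSTM classifier over fixed-length sequences of packet/flow features; the data shapes and labels are hypothetical, and dataset loading and preprocessing are omitted.

```python
# Minimal sketch: an LSTM classifier over windows of packet/flow feature vectors.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(4)
seq_len, n_features = 50, 10                 # 50 packets per window, 10 features each (assumed)
X = rng.random((800, seq_len, n_features)).astype("float32")
y = rng.integers(0, 2, size=800)             # 0 = normal, 1 = attack (dummy labels)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, n_features)),
    tf.keras.layers.LSTM(64),                # long-term dependencies over the packet window
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))       # [loss, accuracy] on the dummy data
```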

There is a lack of appropriate, large IoT datasets that contain updated and new attack behaviors. Researchers should therefore work on building further IoT datasets so that the best possible results can be achieved when different deep learning techniques are used for attack classification and detection in IoT environments. This paper encourages the successful application and adoption of DL-based LSTM for improving IoT security systems. The use of techniques beyond ML, such as XAI, to enhance the security of IoT systems is another interesting research direction.

In this section, the comparative study gives the reader an overview of security attacks, vulnerabilities, and AI algorithms for IoT security systems. We conducted a comparative study of artificial intelligence approaches for attack classification and intrusion detection, analyzing seventeen machine and deep learning approaches: K-Nearest Neighbour, Support Vector Machine, Decision Tree, Naïve Bayes, Ensemble Learning, Random Forest, Neural Network, Artificial Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Long Short-Term Memory, Multi-layer Perceptron, Restricted Boltzmann Machine, Deep Belief Network, Deep Autoencoder, Generative Adversarial Network, and Adversarial Autoencoder.

In addition, some studies proposed hybrid NIDSs (Network Intrusion Detection Systems) for IoT systems that combine anomaly detection and signature-based detection: the anomaly detection method detects unknown or novel attacks, and the signature-based detection method detects known attacks. RPiDS, proposed by 60 , is a novel intrusion detection (IDS) architecture for the IoT environment based on existing tools. The authors chose Snort as the IDS for their system and the Raspberry Pi as the base hardware. Snort is the most widely known open-source IDS; it is a rule/signature-based, multi-platform, lightweight NIDS, but it is single-threaded. They experimented with different configurations of Snort, namely the detection engine and the number of rules loaded, to measure Snort's RAM usage and CPU rate on the Raspberry Pi. Their experiments demonstrate that the Raspberry Pi is capable of hosting Snort, making the RPiDS approach a feasible solution, and the results show that the proposed architecture can effectively serve as an IDS in IoT. However, the study did not investigate alternatives to Snort, such as the open-source IDS Suricata: Suricata's most appreciated feature is multi-threading, whereas Snort's most criticized feature is its single-threading.

Recently, a hybrid intrusion detection approach was proposed by 47 that combines anomaly detection and misuse detection. Hybrid detection systems usually train the two models (i.e., anomaly and misuse detection models) independently and then aggregate their results; the authors 47 instead proposed a new hybrid detection method which hierarchically integrates an anomaly detection model and a misuse detection model. Their experimental results show that the proposed method performs better than conventional methods in detecting both known and unknown attacks. In the integration process, the misuse model captures known attacks, and the anomaly model supplements it to detect unknown attacks. First, training data consisting of known attack traffic and normal traffic are used to train a C4.5 DT (Decision Tree) algorithm to build the misuse detection model; then 1-class SVMs (Support Vector Machines) are trained on the unknown-attack sub-datasets of the C4.5 DT branches to create multiple anomaly detection models.
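The sketch below illustrates the hierarchical idea only and is not the authors' code: a decision tree (scikit-learn's CART-based tree, used here as a stand-in for C4.5) is trained on known attacks vs. normal traffic, and a single one-class SVM trained on normal traffic supplements it for unknown attacks; all data are dummy placeholders.

```python
# Minimal sketch of a hierarchical misuse + anomaly detection pipeline.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(5)
X_known = rng.random((600, 15))             # hypothetical features, known traffic
y_known = rng.integers(0, 2, size=600)      # 0 = normal, 1 = known attack
X_normal = X_known[y_known == 0]

misuse = DecisionTreeClassifier(random_state=5).fit(X_known, y_known)   # stand-in for C4.5 DT
anomaly = OneClassSVM(nu=0.05, kernel="rbf").fit(X_normal)              # 1-class SVM on normal data

def classify(x):
    """Flag a sample as a known attack via the misuse model, else fall back to the anomaly model."""
    if misuse.predict(x.reshape(1, -1))[0] == 1:
        return "known attack"
    return "unknown attack" if anomaly.predict(x.reshape(1, -1))[0] == -1 else "normal"

print(classify(rng.random(15)))
```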

Moreover, existing efforts on network traffic anomaly detection include statistics-based methods and machine learning based methods. Malicious attacks and network failures are major network security problems, so detecting anomalies in network traffic is an effective way to ensure network security, and simple statistical models are not good enough for network traffic prediction 61 . The authors of 61 therefore proposed a hybrid method based on wavelet analysis and the RVM (Relevance Vector Machine), combining statistical and ML techniques to solve the network traffic prediction and anomaly detection problem. The proposed model (1) decomposes the network traffic data into low-frequency and high-frequency components using wavelet decomposition, and then (2) employs the non-linear RVM model and the Auto Regressive Moving Average (ARMA) model for prediction, i.e., the RVM is applied to the high-frequency components and the ARMA model to the low-frequency components. In the experiment, they used a dataset ( http://newsfeed.ntcu.net/-news/2006 ) from the network traffic library, which collected 300 network traffic records per hour from August 1st to November 10th, 2011; 250 records were used for training and the remaining 50 for testing. They used the Relative Root Mean Square Error (RRMSE) and the Root Mean Square Error (RMSE) to measure the prediction results. The results demonstrate the feasibility of combining machine learning and statistical methods, and their experiments confirm the efficiency of the model.
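The following is a minimal sketch of the hybrid idea rather than the paper's implementation: traffic is split into low- and high-frequency parts with a wavelet transform; the low-frequency trend is predicted with an ARMA model and the high-frequency residual with a kernel regressor (SVR is used here as a stand-in for the RVM, which scikit-learn does not ship). The traffic series is synthetic.

```python
# Minimal sketch: wavelet decomposition + ARMA (low-freq) + kernel regression (high-freq).
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR

rng = np.random.default_rng(6)
t = np.arange(250)
traffic = 100 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, len(t))  # dummy hourly records

# 1) Wavelet decomposition into approximation (low-frequency) and detail (high-frequency)
approx, detail = pywt.wavedec(traffic, "db4", level=1)

# 2) ARMA on the low-frequency component (ARIMA with d=0 is an ARMA model)
arma_fit = ARIMA(approx, order=(2, 0, 1)).fit()
approx_pred = arma_fit.forecast(steps=5)

# 3) Kernel regression on the high-frequency component (simple lag-1 autoregression)
svr = SVR(kernel="rbf").fit(detail[:-1].reshape(-1, 1), detail[1:])
detail_pred = svr.predict(detail[-1:].reshape(1, -1))

print("low-frequency forecast:", approx_pred[:3])
print("high-frequency one-step prediction:", detail_pred)
```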

In this context, integrating intrusion detection systems (IDS) into an IoT system gives better insight for securing and monitoring the IoT system. A secure IoT architecture with recommended technologies is therefore needed 4 , 62 , which includes implementing an IDS at the network layer. Moreover, the communication technologies and protocols used in IoT systems must also be secured 4 . As mentioned before, MQTT is the most widely used telemetry protocol in IoT 1 , 3 , 4 , 63 and one of the most popular protocols for implementing IoT networks 64 ; however, MQTT does not provide the required security level by default 64 . Evaluating the performance of the MQTT protocol 3 at the different QoS (Quality of Service) levels, with and without security enabled, is therefore required 64 , 65 . Montori et al. 65 proposed an extension of the standard MQTT protocol called LA-MQTT (Location-Aware MQTT) for spatially aware communications in IoT scenarios; LA-MQTT supports location-aware IoT communications.
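As a small, hedged illustration of the two knobs discussed above (TLS-secured transport and the QoS level), the sketch below uses the paho-mqtt client; it assumes the paho-mqtt 1.x API, and the broker address, CA certificate, credentials and topics are all hypothetical placeholders.

```python
# Minimal sketch (paho-mqtt 1.x style API assumed; broker and credentials are placeholders).
import paho.mqtt.client as mqtt

client = mqtt.Client(client_id="iot-sensor-01")
client.username_pw_set("device_user", "device_password")   # placeholder credentials
client.tls_set(ca_certs="ca.crt")                          # enable TLS for the transport
client.connect("broker.example.com", 8883)                 # 8883 = conventional MQTT-over-TLS port

# QoS 0: at most once, QoS 1: at least once, QoS 2: exactly once
client.publish("home/livingroom/temperature", payload="22.5", qos=1)
client.subscribe("home/+/alerts", qos=2)
client.loop_forever()
```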

Regarding the relevance of this contribution: several authors work on "the detection of intrusions in an IoT system using AI" or on "the state of the art of AI for intrusion detection in the IoT". The strength of this paper compared with those works is that the comparison problem they address is different, so this is new work with new and relevant results. To our knowledge, no existing work addresses the same research questions: "what is the best classifier?", "which is the best AI algorithm for improving IoT security?", "can the chosen algorithm be used to classify and/or detect intrusions and attacks in IoT security systems?", "which datasets are most suitable for IoT systems?". This is the strength and the added value of this paper.

Challenges of AI with IoT

AI and IoT each come with challenges. In 66 , the author presents the relationship between AI and IoT. When AI and IoT are merged, the main challenges are the following:

Security: the important data that is collected must be secured;

Compatibility and complexity: connecting many devices that use different technologies can cause numerous difficulties;

Artificial stupidity: an AI program may be unable to perform even basic tasks perfectly; AI algorithms must be well designed and applied so that data are interpreted accurately and rational decisions are taken;

Lack of confidence: businesses and customers have little confidence in the integrity of the data created and in the protection of IoT devices;

Cloud attacks: large amounts of data are stored in the cloud, which increases the data security risk;

Technology: keeping pace with the competition across all of these technologies is a major challenge.

Security and privacy are two important factors to consider. Home automation and smart hotels are some applications of AI systems in the IoT 66 . This paper reports related work on IoT security in order to evaluate the performance of AI in terms of the accuracy indicator, the methods employed, the most used and available IoT datasets, and the results achieved. AI is now a solution in many areas and has become a key factor in IoT security, especially for classifying and detecting attacks; as a result, AI is showing impressive results in the field of IoT security systems and fits the task of classifying and detecting attacks in IoT environments well. This study presents the AI approaches used, especially in security, for handling and improving IoT security systems.

This survey was oriented towards AI techniques for classification and intrusion detection (works published between 2017 and 2023, the accuracy reported for the algorithms, etc.). The main goal is to compare the efficiency of the AI algorithms; this performance comparison allows us to identify the AI algorithm that gives the best prediction results in IoT security systems. The best results were obtained with the LSTM method: for classification, the DL-based LSTM method proved to be the best compared with the other traditional techniques, and only LSTM results outperformed the other AI classifiers. Indeed, using the LSTM algorithm for intrusion detection achieves better results than other AI algorithms. Moreover, new and large IoT datasets should be generated with more IoT attacks, novel attack types and updated attack behaviors in order to build new, updated AI-IDS models aimed at eliminating false negatives and false positives.

Regarding datasets, the most recent IoT dataset, created in 2021, is MQTTset, while IoT-23, TON_IoT and MQTT-IOT-IDS2020 were created in 2020. Regarding AI techniques and models, the chosen algorithm demonstrates higher accuracy (100%, which is perfect) than the other artificial intelligence algorithms in the literature; it also allowed detection with a lower loss rate (0.58%) and better performance in terms of prediction time. The LSTM model obtains the best results (reaching 100%) and can be applied to attack classification and anomaly detection in IoT.

As a result, LSTM is the best algorithm among the AI techniques for improving IoT security, as evidenced by the high accuracy it recorded. The purpose of this study is thus to obtain solid results that will be used in future work and to situate our contribution among the other works carried out in the same context in the literature, in order to better direct future work towards good results.

This study helps to identify a suitable classification and detection algorithm (LSTM) and an IoT dataset (TON_IoT) that will be implemented in future work, so that the chosen algorithm can be validated and shown to achieve accurate, relevant and attractive results using an effective tool such as Python; that is, the chosen model will be applied to an IoT dataset. In upcoming work, we will test an intrusion detector based on this algorithm on the dataset and evaluate the model by estimating its performance on the test set. We have thoroughly reviewed the important aspects of AI and IoT security, specifically the current state and potential future directions.

Data availability

The IoT datasets analyzed during the current study, listed in Table 3, are publicly available rather than available only on reasonable request from their corresponding authors. These IoT datasets are free and downloadable via the persistent web links given in Table 3.

Hind, M., Noura, O., Amine, K. M., & Sanae, M. (2020). Internet of things: Classification of attacks using CTM method. In Proceedings of the 3rd International Conference on Networking, Information Systems & Security , 1–5. https://doi.org/10.1145/3386723.3387876

https://www.mcafee.com/content/enterprise/fr-ca/security-awareness/operations/what-is-siem.html (accessed: July 07, 2022).

Meziane, H., Ouerdi, N., Kasmi, M. A., & Mazouz, S. (2021). Classifying security attacks in IoT using CTM method. In Emerging Trends in ICT for Sustainable Development , 307–315. Springer. https://doi.org/10.1007/978-3-030-53440-0_32

Meziane, H. & Ouerdi, N. A Study of Modelling IoT Security Systems with Unified Modelling Language (UML). Int. J. Adv. Comput. Sci. Appl.   13 (11), https://doi.org/10.14569/IJACSA.2022.0131130 (2022). 

Akram, H., Konstantas, D. & Mahyoub, M. A comprehensive IoT attacks survey based on a building-blocked reference model. Ijacsa https://doi.org/10.14569/IJACSA.2018.090349 (2018).

Crevier, D. AI: The Tumultuous History of the Search for Artificial Intelligence (Basic Books, 1993).

Ibitoye, O., Shafiq, O. & Matrawy, A. Analyzing adversarial attacks against deep learning for intrusion detection in IoT networks. In GLOBECOM , 1–6 (2019).

Zhou, Y., Han, M., Liu, L., He, J. S. & Wang, Y. Deep learning approach for cyberattack detection. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) , 262–267 (IEEE, 2018).

Aldwairi, T., Perera, D. & Novotny, M. A. An evaluation of the performance of restricted Boltzmann machines as a model for anomaly network intrusion detection. Comput. Netw. 144 , 111–119 (2018).

Vimalkumar, K. & Radhika, N. A big data framework for intrusion detection in smart grids using apache spark. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) , 198–204 (IEEE, 2017).

Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A. & Anwar, A. TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8 , 165130–165150 (2020).

Zhang, Y., Li, P. & Wang, X. Intrusion detection for IoT based on improved genetic algorithm and deep belief network. IEEE Access 7 , 31711–31722. https://doi.org/10.1109/ACCESS.2019.2903723 (2019).

Ferrag, M. A., Maglaras, L., Moschoyiannis, S. & Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. https://doi.org/10.1016/j.jisa.2019.102419 (2020).

Ferdowsi, A. & Saad, W. Generative adversarial networks for distributed intrusion detection in the internet of things (2019). https://doi.org/10.1109/GLOBECOM38437.2019.9014102 .

Liang, C., Shanmugam, B., Azam, S., Jonkman, M., De Boer, F. & Narayansamy, G. Intrusion detection system for internet of things based on a machine learning approach (2019). https://doi.org/10.1109/ViTECoN.2019.8899448 .

Ge, M., Fu, X., Syed, N., Baig, Z., Teo, G. & Robles-Kelly, A. Deep learning-based intrusion detection for IoT networks. In Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC , 2019, vol. 2019-Decem, 256–265. https://doi.org/10.1109/PRDC47002.2019.00056 .

Yuan, D., Ota, K., Dong, M., Zhu, X., Wu, T., Zhang, L., & Ma, J. Intrusion detection for smart home security based on data augmentation with edge computing. In ICC 2020—2020 IEEE International Conference on Communications (ICC) (2020). https://doi.org/10.1109/icc40277.2020.9148632 .

Nagisetty, A. & Gupta, G. P. Framework for detection of malicious activities in IoT networks using keras deep learning library. In Proceedings of the 3rd International Conference on Computing Methodologies and Communication, ICCMC 2019 , 633–637 (2019).

Sainis, N., Srivastava, D. & Singh, R. Feature classification and outlier detection to increased accuracy in intrusion detection system. Int. J. Appl. Eng. Res. 13 (10), 7249–7255 (2018).

Sheikh, N. U., Rahman, H., Vikram, S. & AlQahtani, H. A lightweight signature-based IDS for IoT environment.  arXiv:1811.04582 (2018) .

Diro, A. & Chilamkurti, N. Leveraging LSTM networks for attack detection in fog-to-things communications. IEEE Commun. Mag. 56 (9), 124–130 (2018).

Kasongo, S. M. & Sun, Y. A deep learning method with wrapper based feature extraction for wireless intrusion detection system. Comput. Secur. 92 , 101752 (2020).

Hwang, R.-H., Peng, M.-C., Nguyen, V.-L. & Chang, Y.-L. An LSTM-based deep learning approach for classifying malicious traffic at the packet level. Appl. Sci. 9 (16), 3414 (2019).

Ferrag, M. A. & Maglaras, L. DeepCoin: A novel deep learning and blockchain-based energy exchange framework for smart grids. IEEE Trans. Eng. Manage. 67 (4), 1285–1297. https://doi.org/10.1109/TEM.2019.2922936 (2019).

Koroniotis, N., Moustafa, N., Sitnikova, E. & Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 100 , 779–796. https://doi.org/10.1016/j.future.2019.05.041 (2019).

Altaf, A., Abbas, H., Iqbal, F. & Derhab, A. Trust models of Internet of smart things: A survey, open issues, and future directions. J. Netw. Comput. Appl. 137 , 93–111 (2019).

Bacha, S. et al. Anomaly-based intrusion detection system in IoT using kernel extreme learning machine. J. Ambient Intell. Humaniz. Comput. https://doi.org/10.1007/s12652-022-03887-w (2022).

Le, T., Oktian, Y. & Kim, H. XGBoost for imbalanced multiclass classification-based industrial internet of things intrusion detection systems. Sustainability 14 (14), 87–105 (2022).

Ullah, I. & Mahmoud, Q. H. Design and development of a deep learning-based model for anomaly detection in IoT Networks. IEEE Access 9 , 103906–103926 (2021).

Saba, T., Rehman, A., Sadad, T., Kolivand, H. & Bahaj, S. A. Anomaly-based intrusion detection system for IoT networks through deep learning model. Comput. Electric. Eng. 99 , 107810 (2022).

Guezzaz, A. et al. A lightweight hybrid intrusion detection framework using machine learning for edge-based IIoT security. Int. Arab. J. Inf. Technol. https://doi.org/10.34028/iajit/19/5/14 (2022).

Jamal, A., Hayat, M. F. & Nasir, M. Malware detection and classification in IoT network using ANN. Mehran Univ. Res. J. Eng. Technol. 41 (1), 80–91 (2022).

Basati, A. & Faghih, M. M. PDAE: Efficient network intrusion detection in IoT using parallel deep auto-encoders. Inf. Sci. 598 , 57–74 (2022).

Ali, M., Hu, Y. F., Luong, D. K., Oguntala, G., Li, J. P., & Abdo, K. Adversarial attacks on AI based intrusion detection system for heterogeneous wireless communications networks. In 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC) , 1–6 (IEEE, 2020).

Habibi, O., Chemmakha, M. & Lazaar, M. Imbalanced tabular data modelization using CTGAN and machine learning to improve IoT Botnet attacks detection. Eng. Appl. Artif. Intell. 118 , 105669. https://doi.org/10.1016/j.engappai.2022.105669 (2023).

IEEE DataPort. https://ieee-dataport.org/open-access/mqtt-iot-ids2020-mqtt-internet-things-intrusion-detection-dataset .

https://www.kaggle.com/cnrieiit/mqttset . Last accessed 2021/11/29.

Garcia, S., Parmisano, A. & Erquiaga, M. J. IoT-23: A labeled dataset with malicious and benign IoT network traffic (2020). https://doi.org/10.5281/ZENODO.4743746 .

https://research.unsw.edu.au/projects/bot-iot-dataset . Last accessed 2021/08/10.

https://research.unsw.edu.au/projects/toniot-datasets . Last accessed 2021/08/10.

Moustafa, N. & Slay, J. The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Secur. J. 25 (1–3), 18–31. https://doi.org/10.1080/19393555.2015.1125974 (2016).

Mithun Sridharan, July 17, 2015, https://jixta.wordpress.com/2015/07/17/machine-learning-algorithms-mindmap/ . Accessed 02 July 2020.

Mulvey, D., Foh, C. H., Imran, M. A. & Tafazolli, R. Cell fault management using machine learning techniques. IEEE Access 7 , 124514–124539. https://doi.org/10.1109/ACCESS.2019.2938410 (2019).

Hussain, F., Hussain, R., Hassan, S. A. & Hossain, E. Machine learning in IoT security: Current solutions and future challenges. IEEE Commun. Surv. Tutor. https://doi.org/10.1109/COMST.2020.2986444 (2020).

Koroniotis, N., Moustafa, N. & Sitnikova, E. Forensics and deep learning mechanisms for botnets in internet of things: A survey of challenges and solutions. IEEE Access 7 , 61764–61785 (2019).

Liang, F., Hatcher, W. G., Liao, W., Gao, W. & Yu, W. Machine learning for security and the internet of things: The good, the bad, and the ugly. IEEE Access 7 , 158126–158147 (2019).

Wu, H., Han, H., Wang, X. & Sun, S. Research on artificial intelligence enhancing internet of things security: A survey. IEEE Access 8 , 153826–153848 (2020).

Ghosh, A., Chakraborty, D. & Law, A. Artificial intelligence in Internet of things. CAAI Trans. Intell. Technol. 3 (4), 208–218 (2018).

Amanullah, M. A. et al. Deep learning and big data technologies for IoT security. Comput. Commun. 151 , 495–517 (2020).

Cui, L. et al. A survey on application of machine learning for Internet of Things. Int. J. Mach. Learn. Cybern. 9 , 1399–1417 (2018).

Khurana, N., Mittal, S., Piplai, A., & Joshi, A. Preventing poisoning attacks on AI based threat intelligence systems. In 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP) , 1–6 (IEEE, 2019).

Tahsien, S. M., Karimipour, H. & Spachos, P. Machine learning based solutions for security of Internet of Things (IoT): A survey. J. Netw. Comput. Appl. 161 , 102630 (2020).

HaddadPajouh, H., Dehghantanha, A., Parizi, R. M., Aledhari, M. & Karimipour, H. A survey on internet of things security: Requirements, challenges, and solutions. Internet Things 14 , 100129 (2021).

Tian, Z., Luo, C., Qiu, J., Du, X. & Guizani, M. A distributed deep learning system for web attack detection on edge devices. IEEE Trans. Ind. Inform. 16 (3), 1963–1971. https://doi.org/10.1109/TII.2019.2938778 (2020).

Belenko, V., Chernenko, V., Kalinin, M. & Krundyshev, V. Evaluation of GAN applicability for intrusion detection in self-organizing networks of cyber physical systems (2018). https://doi.org/10.1109/RUSAUTOCON.2018.8501783 .

Hass, A. M. J. Guide to Advanced Software Testing 179–186 (Artech House, 2008).

Chen, T. Y., & Poon, P. L. Classification-hierarchy table: A methodology for constructing the classification tree. In Proceedings of 1996 Australian Software Engineering Conference , 93–104 (IEEE, 1996). https://doi.org/10.1109/ASWEC.1996.534127

Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I. & Frey, B. Adversarial autoencoders (2015). arXiv preprint arXiv:1511.05644 .

Othman, S. M., Alsohybe, N. T., Ba-Alwi, F. M. & Zahary, A. T. Survey on intrusion detection system types. Int. J. Cyber Secur. Digit. Forensics 7 , 444–463 (2018).

Sforzin, A., Gomez Marmol, F., Conti, M. & Bohli, J.-M. (2016). RPiDS: Raspberry Pi IDS—A fruitful intrusion detection system for IoT. 440–448 (2016). https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0080

Wang, H. Anomaly detection of network traffic based on prediction and self-adaptive threshold. Int. J. Future Gener. Commun. Netw. 8 (6), 205–214 (2015).

Hind, M., Noura, O., Sanae, M. & Abraham, A. A comparative study for modeling IoT security systems. In Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems Vol. 717 (eds Abraham, A. et al. ) (Springer, 2023). https://doi.org/10.1007/978-3-031-35510-3_25 .

Hind, M., Noura, O. & Abraham, A. Modeling IoT based Forest Fire Detection System with IoTsec. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 15 , 201–213 (2023).

Alkhafajee, A. R., Al-Muqarm, A. M. A., Alwan, A. H. & Mohammed, Z. R. Security and performance analysis of MQTT Protocol with TLS in IoT Networks. In 2021 4th International Iraqi Conference on Engineering Technology and Their Applications (IICETA) , 206–211 (IEEE, 2021).

Montori, F., Gigli, L., Sciullo, L. & Felice, M. D. LA-MQTT: Location-aware publish-subscribe communications for the Internet of Things. ACM Trans. Internet Things 3 (3), 1–28 (2022).

Mohamed, E. The relation of artificial intelligence with internet of things: A survey. J. Cybersecur. Inf. Manag. 1 (1), 30–24 (2020).

Author information

Authors and affiliations.

LACSA Laboratory, Faculty of Sciences (FSO), Mohammed First University (UMP), Oujda, Morocco

Hind Meziane & Noura Ouerdi

Contributions

H.M. wrote the main manuscript text and prepared all figures (Conceptualization, Methodology, Supervision, Writing – review & editing). H.M. and N.O. reviewed the manuscript.

Corresponding author

Correspondence to Hind Meziane .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Meziane, H., Ouerdi, N. A survey on performance evaluation of artificial intelligence algorithms for improving IoT security systems. Sci Rep 13 , 21255 (2023). https://doi.org/10.1038/s41598-023-46640-9

Received : 26 May 2023

Accepted : 03 November 2023

Published : 01 December 2023

DOI : https://doi.org/10.1038/s41598-023-46640-9

Artificial intelligence: A powerful paradigm for scientific research

1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

35 University of Chinese Academy of Sciences, Beijing 100049, China

5 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

10 Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai 200032, China

Changping Huang

18 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

11 Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China

37 Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China

26 Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China

Xingchen Liu

28 Institute of Coal Chemistry, Chinese Academy of Sciences, Taiyuan 030001, China

2 Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

Fengliang Dong

3 National Center for Nanoscience and Technology, Beijing 100190, China

Cheng-Wei Qiu

4 Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117583, Singapore

6 Department of Gynaecology, Obstetrics and Gynaecology Hospital, Fudan University, Shanghai 200011, China

36 Shanghai Key Laboratory of Female Reproductive Endocrine-Related Diseases, Shanghai 200011, China

7 School of Food Science and Technology, Dalian Polytechnic University, Dalian 116034, China

41 Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou 310058, China

8 Department of Obstetrics and Gynecology, Peking University Third Hospital, Beijing 100191, China

9 Zhejiang Provincial People’s Hospital, Hangzhou 310014, China

Chenguang Fu

12 School of Materials Science and Engineering, Zhejiang University, Hangzhou 310027, China

Zhigang Yin

13 Fujian Institute of Research on the Structure of Matter, Chinese Academy of Sciences, Fuzhou 350002, China

Ronald Roepman

14 Medical Center, Radboud University, 6500 Nijmegen, the Netherlands

Sabine Dietmann

15 Institute for Informatics, Washington University School of Medicine, St. Louis, MO 63110, USA

Marko Virta

16 Department of Microbiology, University of Helsinki, 00014 Helsinki, Finland

Fredrick Kengara

17 School of Pure and Applied Sciences, Bomet University College, Bomet 20400, Kenya

19 Agriculture College of Shihezi University, Xinjiang 832000, China

Taolan Zhao

20 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China

21 The Brain Cognition and Brain Disease Institute, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

38 Shenzhen-Hong Kong Institute of Brain Science-Shenzhen Fundamental Research Institutions, Shenzhen 518055, China

Jialiang Yang

22 Geneis (Beijing) Co., Ltd, Beijing 100102, China

23 Department of Communication Studies, Hong Kong Baptist University, Hong Kong, China

24 South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China

39 Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510650, China

Zhaofeng Liu

27 Shanghai Astronomical Observatory, Chinese Academy of Sciences, Shanghai 200030, China

29 Suzhou Institute of Nano-Tech and Nano-Bionics, Chinese Academy of Sciences, Suzhou 215123, China

Xiaohong Liu

30 Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China

James P. Lewis

James M. Tiedje

34 Center for Microbial Ecology, Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA

40 Zhejiang Lab, Hangzhou 311121, China

25 Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China

31 Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion SY23 3FL, UK

Zhipeng Cai

32 Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA

33 Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China

Jiabao Zhang

Artificial intelligence (AI) coupled with promising machine learning (ML) techniques well known from computer science is broadly affecting many aspects of various fields including science and technology, industry, and even our day-to-day life. The ML techniques have been developed to analyze high-throughput data with a view to obtaining useful insights, categorizing, predicting, and making evidence-based decisions in novel ways, which will promote the growth of novel applications and fuel the sustainable booming of AI. This paper undertakes a comprehensive survey on the development and application of AI in different aspects of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics, and chemistry. The challenges that each discipline of science meets, and the potentials of AI techniques to handle these challenges, are discussed in detail. Moreover, we shed light on new research trends entailing the integration of AI into each scientific discipline. The aim of this paper is to provide a broad research guideline on fundamental sciences with potential infusion of AI, to help motivate researchers to deeply understand the state-of-the-art applications of AI-based fundamental sciences, and thereby to help promote the continuous development of these fundamental sciences.

Graphical abstract

[Graphical abstract figure]

Public summary

  • • “Can machines think?” The goal of artificial intelligence (AI) is to enable machines to mimic human thoughts and behaviors, including learning, reasoning, predicting, and so on.
  • • “Can AI do fundamental research?” AI coupled with machine learning techniques is impacting a wide range of fundamental sciences, including mathematics, medical science, physics, etc.
  • • “How does AI accelerate fundamental research?” New research and applications are emerging rapidly with the support by AI infrastructure, including data storage, computing power, AI algorithms, and frameworks.

Introduction

“Can machines think?” Alan Turing posed this question in his famous paper “Computing Machinery and Intelligence.” 1 He believes that to answer this question, we need to define what thinking is. However, it is difficult to define thinking clearly, because thinking is a subjective behavior. Turing then introduced an indirect method to verify whether a machine can think, the Turing test, which examines a machine's ability to show intelligence indistinguishable from that of human beings. A machine that succeeds in the test is qualified to be labeled as artificial intelligence (AI).

AI refers to the simulation of human intelligence by a system or a machine. The goal of AI is to develop machines that can think like humans and mimic human behaviors, including perceiving, reasoning, learning, planning, predicting, and so on. Intelligence is one of the main characteristics that distinguishes human beings from animals. With successive industrial revolutions, an increasing variety of machines has continuously replaced human labor in all walks of life, and the coming replacement of human resources by machine intelligence is the next big challenge to be overcome. Numerous scientists are focusing on the field of AI, which makes research in this field rich and diverse. AI research fields include search algorithms, knowledge graphs, natural language processing, expert systems, evolutionary algorithms, machine learning (ML), deep learning (DL), and so on.

The general framework of AI is illustrated in Figure 1 . The development process of AI includes perceptual intelligence, cognitive intelligence, and decision-making intelligence. Perceptual intelligence means that a machine has the basic abilities of vision, hearing, touch, etc., which are familiar to humans. Cognitive intelligence is a higher-level ability of induction, reasoning and acquisition of knowledge. It is inspired by cognitive science, brain science, and brain-like intelligence to endow machines with thinking logic and cognitive ability similar to human beings. Once a machine has the abilities of perception and cognition, it is often expected to make optimal decisions as human beings, to improve the lives of people, industrial manufacturing, etc. Decision intelligence requires the use of applied data science, social science, decision theory, and managerial science to expand data science, so as to make optimal decisions. To achieve the goal of perceptual intelligence, cognitive intelligence, and decision-making intelligence, the infrastructure layer of AI, supported by data, storage and computing power, ML algorithms, and AI frameworks is required. Then by training models, it is able to learn the internal laws of data for supporting and realizing AI applications. The application layer of AI is becoming more and more extensive, and deeply integrated with fundamental sciences, industrial manufacturing, human life, social governance, and cyberspace, which has a profound impact on our work and lifestyle.

Figure 1. The general framework of AI

History of AI

The beginning of modern AI research can be traced back to John McCarthy, who coined the term "artificial intelligence (AI)" at a conference at Dartmouth College in 1956. This symbolized the birth of the AI scientific field. Progress in the following years was astonishing. Many scientists and researchers focused on automated reasoning and applied AI to proving mathematical theorems and solving algebraic problems. One of the famous examples is Logic Theorist, a computer program written by Allen Newell, Herbert A. Simon, and Cliff Shaw, which proved 38 of the first 52 theorems in "Principia Mathematica" and provided more elegant proofs for some. 2 These successes made many AI pioneers wildly optimistic and underpinned the belief that fully intelligent machines would be built in the near future. However, they soon realized that there was still a long way to go before the end goal of human-equivalent intelligence in machines could come true. Many nontrivial problems could not be handled by the logic-based programs. Another challenge was the lack of computational resources to compute more and more complicated problems. As a result, organizations and funders stopped supporting these under-delivering AI projects.

AI came back to popularity in the 1980s, as several research institutions and universities invented a type of AI system that summarizes a series of basic rules from expert knowledge to help non-experts make specific decisions. These systems are known as "expert systems." Examples are the XCON designed by Carnegie Mellon University and the MYCIN designed by Stanford University. For the first time, expert systems derived logic rules from expert knowledge to solve problems in the real world. The core of AI research during this period was the knowledge that made machines "smarter." However, expert systems gradually revealed several disadvantages, such as privacy issues, lack of flexibility, poor versatility, expensive maintenance cost, and so on. At the same time, the Fifth Generation Computer Project, heavily funded by the Japanese government, failed to meet most of its original goals. Once again, funding for AI research ceased, and AI reached the second low point of its life.

In 2006, Geoffrey Hinton and coworkers 3 , 4 made a breakthrough in AI by proposing an approach of building deeper neural networks, as well as a way to avoid gradient vanishing during training. This reignited AI research, and DL algorithms have become one of the most active fields of AI research. DL is a subset of ML based on multiple layers of neural networks with representation learning, 5 while ML is a part of AI that a computer or a program can use to learn and acquire intelligence without human intervention. Thus, “learn” is the keyword of this era of AI research. Big data technologies, and the improvement of computing power have made deriving features and information from massive data samples more efficient. An increasing number of new neural network structures and training methods have been proposed to improve the representative learning ability of DL, and to further expand it into general applications. Current DL algorithms match and exceed human capabilities on specific datasets in the areas of computer vision (CV) and natural language processing (NLP). AI technologies have achieved remarkable successes in all walks of life, and continued to show their value as backbones in scientific research and real-world applications.

Within AI, ML is having a substantial broad effect across many aspects of technology and science: from computer science to geoscience to materials science, from life science to medical science to chemistry to mathematics and to physics, from management science to economics to psychology, and other data-intensive empirical sciences, as ML methods have been developed to analyze high-throughput data to obtain useful insights, categorize, predict, and make evidence-based decisions in novel ways. To train a system by presenting it with examples of desired input-output behavior, could be far easier than to program it manually by predicting the desired response for all potential inputs. The following sections survey eight fundamental sciences, including information science (informatics), mathematics, medical science, materials science, geoscience, life science, physics, and chemistry, which develop or exploit AI techniques to promote the development of sciences and accelerate their applications to benefit human beings, society, and the world.

AI in information science

AI aims to provide the abilities of perception, cognition, and decision-making for machines. At present, new research and applications in information science are emerging at an unprecedented rate, which is inseparable from the support by the AI infrastructure. As shown in Figure 2 , the AI infrastructure layer includes data, storage and computing power, ML algorithms, and the AI framework. The perception layer enables machines have the basic ability of vision, hearing, etc. For instance, CV enables machines to “see” and identify objects, while speech recognition and synthesis helps machines to “hear” and recognize speech elements. The cognitive layer provides higher ability levels of induction, reasoning, and acquiring knowledge with the help of NLP, 6 knowledge graphs, 7 and continual learning. 8 In the decision-making layer, AI is capable of making optimal decisions, such as automatic planning, expert systems, and decision-supporting systems. Numerous applications of AI have had a profound impact on fundamental sciences, industrial manufacturing, human life, social governance, and cyberspace. The following subsections provide an overview of the AI framework, automatic machine learning (AutoML) technology, and several state-of-the-art AI/ML applications in the information field.

Figure 2. The knowledge graph of the AI framework

The AI framework provides basic tools for AI algorithm implementation

In the past 10 years, applications based on AI algorithms have played a significant role in various fields and subjects, on the basis of which the prosperity of the DL framework and platform has been founded. AI frameworks and platforms reduce the requirement of accessing AI technology by integrating the overall process of algorithm development, which enables researchers from different areas to use it across other fields, allowing them to focus on designing the structure of neural networks, thus providing better solutions to problems in their fields. At the beginning of the 21st century, only a few tools, such as MATLAB, OpenNN, and Torch, were capable of describing and developing neural networks. However, these tools were not originally designed for AI models, and thus faced problems, such as complicated user API and lacking GPU support. During this period, using these frameworks demanded professional computer science knowledge and tedious work on model construction. As a solution, early frameworks of DL, such as Caffe, Chainer, and Theano, emerged, allowing users to conveniently construct complex deep neural networks (DNNs), such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and LSTM conveniently, and this significantly reduced the cost of applying AI models. Tech giants then joined the march in researching AI frameworks. 9 Google developed the famous open-source framework, TensorFlow, while Facebook's AI research team released another popular platform, PyTorch, which is based on Torch; Microsoft Research published CNTK, and Amazon announced MXNet. Among them, TensorFlow, also the most representative framework, referred to Theano's declarative programming style, offering a larger space for graph-based optimization, while PyTorch inherited the imperative programming style of Torch, which is intuitive, user friendly, more flexible, and easier to be traced. As modern AI frameworks and platforms are being widely applied, practitioners can now assemble models swiftly and conveniently by adopting various building block sets and languages specifically suitable for given fields. Polished over time, these platforms gradually developed a clearly defined user API, the ability for multi-GPU training and distributed training, as well as a variety of model zoos and tool kits for specific tasks. 10 Looking forward, there are a few trends that may become the mainstream of next-generation framework development. (1) Capability of super-scale model training. With the emergence of models derived from Transformer, such as BERT and GPT-3, the ability of training large models has become an ideal feature of the DL framework. It requires AI frameworks to train effectively under the scale of hundreds or even thousands of devices. (2) Unified API standard. The APIs of many frameworks are generally similar but slightly different at certain points. This leads to some difficulties and unnecessary learning efforts, when the user attempts to shift from one framework to another. The API of some frameworks, such as JAX, has already become compatible with Numpy standard, which is familiar to most practitioners. Therefore, a unified API standard for AI frameworks may gradually come into being in the future. (3) Universal operator optimization. At present, kernels of DL operator are implemented either manually or based on third-party libraries. 
Most third-party libraries are developed to suit certain hardware platforms, causing large unnecessary spending when models are trained or deployed on different hardware platforms. The development speed of new DL algorithms is usually much faster than the update rate of libraries, which often makes new algorithms to be beyond the range of libraries' support. 11
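As a small illustration of the building-block style that modern frameworks provide (this is generic example code, not tied to any system discussed here), the following PyTorch snippet assembles a compact CNN from predefined layers and runs a dummy batch through it.

```python
# Minimal sketch of the "building block" style with PyTorch's imperative API.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                  # (N, 32, 7, 7) for 28x28 inputs
        return self.classifier(x.flatten(1))

model = SmallCNN()
dummy = torch.randn(4, 1, 28, 28)             # a batch of 4 grayscale 28x28 images
print(model(dummy).shape)                     # torch.Size([4, 10])
```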

To improve the implementation speed of AI algorithms, much research focuses on how to use hardware for acceleration. The DianNao family is one of the earliest research innovations on AI hardware accelerators. 12 It includes DianNao, DaDianNao, ShiDianNao, and PuDianNao, which can be used to accelerate the inference speed of neural networks and other ML algorithms. Of these, the best performance of a 64-chip DaDianNao system can achieve a speed up of 450.65× over a GPU, and reduce the energy by 150.31×. Prof. Chen and his team in the Institute of Computing Technology also designed an Instruction Set Architecture for a broad range of neural network accelerators, called Cambricon, which developed into a serial DL accelerator. After Cambricon, many AI-related companies, such as Apple, Google, HUAWEI, etc., developed their own DL accelerators, and AI accelerators became an important research field of AI.

AI for AI—AutoML

AutoML aims to study how to use evolutionary computing, reinforcement learning (RL), and other AI algorithms, to automatically generate specified AI algorithms. Research on the automatic generation of neural networks has existed before the emergence of DL, e.g., neural evolution. 13 The main purpose of neural evolution is to allow neural networks to evolve according to the principle of survival of the fittest in the biological world. Through selection, crossover, mutation, and other evolutionary operators, the individual quality in a population is continuously improved and, finally, the individual with the greatest fitness represents the best neural network. The biological inspiration in this field lies in the evolutionary process of human brain neurons. The human brain has such developed learning and memory functions that it cannot do without the complex neural network system in the brain. The whole neural network system of the human brain benefits from a long evolutionary process rather than gradient descent and back propagation. In the era of DL, the application of AI algorithms to automatically generate DNN has attracted more attention and, gradually, developed into an important direction of AutoML research: neural architecture search. The implementation methods of neural architecture search are usually divided into the RL-based method and the evolutionary algorithm-based method. In the RL-based method, an RNN is used as a controller to generate a neural network structure layer by layer, and then the network is trained, and the accuracy of the verification set is used as the reward signal of the RNN to calculate the strategy gradient. During the iteration, the controller will give the neural network, with higher accuracy, a higher probability value, so as to ensure that the strategy function can output the optimal network structure. 14 The method of neural architecture search through evolution is similar to the neural evolution method, which is based on a population and iterates continuously according to the principle of survival of the fittest, so as to obtain a high-quality neural network. 15 Through the application of neural architecture search technology, the design of neural networks is more efficient and automated, and the accuracy of the network gradually outperforms that of the networks designed by AI experts. For example, Google's SOTA network EfficientNet was realized through the baseline network based on neural architecture search. 16
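As a toy, purely illustrative sketch of the evolutionary flavor of neural architecture search described above, the code below evolves candidate "architectures" that are just (depth, width) pairs under a stand-in fitness function; in real NAS the fitness would be the validation accuracy of a trained network, and the search space would be far richer.

```python
# Toy evolutionary search over (n_layers, width) pairs: selection + mutation.
import random

def fitness(arch):
    n_layers, width = arch
    # Hypothetical proxy score; real NAS would train the network and use validation accuracy
    return -abs(n_layers - 4) - abs(width - 128) / 32

def mutate(arch):
    n_layers, width = arch
    if random.random() < 0.5:
        n_layers = max(1, n_layers + random.choice([-1, 1]))
    else:
        width = max(8, width + random.choice([-16, 16]))
    return (n_layers, width)

population = [(random.randint(1, 8), random.choice([32, 64, 256, 512])) for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)     # survival of the fittest
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(5)]

print("best architecture found:", max(population, key=fitness))
```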

AI enabling networking design adaptive to complex network conditions

The application of DL in the networking field has received strong interest. Network design often relies on initial network conditions and/or theoretical assumptions to characterize real network environments. However, traditional network modeling and design, regulated by mathematical models, are unlikely to deal with complex scenarios with many imperfect and high dynamic network environments. Integrating DL into network research allows for a better representation of complex network environments. Furthermore, DL could be combined with the Markov decision process and evolve into the deep reinforcement learning (DRL) model, which finds an optimal policy based on the reward function and the states of the system. Taken together, these techniques could be used to make better decisions to guide proper network design, thereby improving the network quality of service and quality of experience. With regard to the aspect of different layers of the network protocol stack, DL/DRL can be adopted for network feature extraction, decision-making, etc. In the physical layer, DL can be used for interference alignment. It can also be used to classify the modulation modes, design efficient network coding 17 and error correction codes, etc. In the data link layer, DL can be used for resource (such as channels) allocation, medium access control, traffic prediction, 18 link quality evaluation, and so on. In the network (routing) layer, routing establishment and routing optimization 19 can help to obtain an optimal routing path. In higher layers (such as the application layer), enhanced data compression and task allocation is used. Besides the above protocol stack, one critical area of using DL is network security. DL can be used to classify the packets into benign/malicious types, and how it can be integrated with other ML schemes, such as unsupervised clustering, to achieve a better anomaly detection effect.
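As a toy illustration of the reinforcement-learning idea underlying DRL-based network control (not any specific system from the literature), the sketch below runs tabular Q-learning on a hypothetical five-node routing problem where states are nodes, actions are next hops, and the reward is the negative link delay; real DRL systems replace the Q-table with a deep network.

```python
# Toy tabular Q-learning for a hypothetical routing problem.
import numpy as np

rng = np.random.default_rng(7)
n_nodes, goal = 5, 4
delay = rng.uniform(1, 10, size=(n_nodes, n_nodes))      # hypothetical link delays
Q = np.zeros((n_nodes, n_nodes))
alpha, gamma, eps = 0.1, 0.9, 0.2

for episode in range(2000):
    s = rng.integers(0, n_nodes - 1)                      # start from a non-goal node
    while s != goal:
        # epsilon-greedy choice of the next hop
        a = rng.integers(0, n_nodes) if rng.random() < eps else int(np.argmax(Q[s]))
        r = -delay[s, a] + (100 if a == goal else 0)      # reaching the goal earns a bonus
        Q[s, a] += alpha * (r + gamma * np.max(Q[a]) - Q[s, a])
        s = a

print("preferred next hop from node 0:", int(np.argmax(Q[0])))
```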

AI enabling more powerful and intelligent nanophotonics

Nanophotonic components have recently revolutionized the field of optics via metamaterials/metasurfaces by enabling the arbitrary manipulation of light-matter interactions with subwavelength meta-atoms or meta-molecules. 20 , 21 , 22 The conventional design of such components generally involves forward modeling, i.e., solving Maxwell's equations based on empirical and intuitive nanostructures to find the corresponding optical properties, as well as the inverse design of nanophotonic devices given an on-demand optical response. The trans-dimensional feature of macro-optical components consisting of complex nano-antennas makes the design process very time consuming, computationally expensive, and even numerically prohibitive as device size and complexity increase. DL provides an efficient and automatic platform, enabling novel approaches to designing nanophotonic devices with high performance and versatile functions. Here, we briefly present the recent progress of DL-based nanophotonics and its wide-ranging applications. DL was first exploited for forward modeling using a DNN. 23 The transmission or reflection coefficients can be well predicted after training on huge datasets. To improve the prediction accuracy of DNNs in the case of small datasets, transfer learning was introduced to migrate knowledge between different physical scenarios, which greatly reduced the relative error. Furthermore, a CNN and an RNN were developed for the prediction of optical properties from arbitrary structures using images. 24 The CNN-RNN combination successfully predicted the absorption spectra from the given input structural images. In the inverse design of nanophotonic devices, there are three different paradigms of DL methods, i.e., supervised, unsupervised, and RL. 25 Supervised learning has been utilized to design structural parameters for pre-defined geometries, such as tandem DNNs and bidirectional DNNs. Unsupervised learning methods learn by themselves without a specific target, and are thus better suited than supervised learning to discovering new and arbitrary patterns 26 in completely new data. A generative adversarial network (GAN)-based approach, combining conditional GANs and Wasserstein GANs, was proposed to design freeform all-dielectric multifunctional metasurfaces. RL, especially double-deep Q-learning, powers the inverse design of high-performance nanophotonic devices. 27 DL has endowed nanophotonic devices with better performance and more emerging applications. 28 , 29 For instance, an intelligent microwave cloak driven by DL exhibits a millisecond, self-adaptive response to an ever-changing incident wave and background. 28 Another example is a DL-augmented infrared nanoplasmonic metasurface developed for monitoring the dynamics between four major classes of bio-molecules, which could impact the fields of biology, bioanalytics, and pharmacology, from fundamental research, to disease diagnostics, to drug development. 29 The potential of DL in the wide arena of nanophotonics is still unfolding. Even end-users without an optics and photonics background could exploit DL as a black-box toolkit to design powerful optical devices. Nevertheless, how to interpret the intermediate DL process and how to determine the most dominant factors in the search for optimal solutions are questions worth investigating in depth.
We optimistically envisage that advancements in DL algorithms and computation/optimization infrastructures will enable us to realize more efficient and reliable training approaches, more complex nanostructures with unprecedented shapes and sizes, and more intelligent and reconfigurable optic/optoelectronic systems.
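A minimal sketch of the forward-modeling idea follows, assuming a synthetic stand-in for a Maxwell solver: a small neural network (scikit-learn's MLPRegressor) is trained to map two hypothetical structural parameters of a meta-atom to a 32-point "transmission spectrum". The data generator and parameter names are invented for illustration and do not correspond to the cited studies:

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical surrogate task: map two structural parameters (say, radius
# and gap, in arbitrary units) to a 32-point transmission spectrum.  The
# analytic formula below is a synthetic stand-in for a full-wave solver,
# used only to generate training data for this sketch.
def fake_solver(params):
    radius, gap = params
    freqs = np.linspace(0.0, 1.0, 32)
    resonance = 0.3 + 0.5 * radius
    width = 0.05 + 0.1 * gap
    return 1.0 - np.exp(-((freqs - resonance) ** 2) / (2 * width ** 2))

X = rng.uniform(0.0, 1.0, size=(2000, 2))
Y = np.array([fake_solver(x) for x in X])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# The DNN learns the forward mapping: structure parameters -> spectrum.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X_train, Y_train)
print("test R^2 of the surrogate forward model:",
      round(model.score(X_test, Y_test), 3))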

AI in other fields of information science

We believe that AI has great potential in the following directions:

  • AI-based risk control and management in utilities can prevent costly or hazardous equipment failures by using sensors that detect and send information regarding the machine's health to the manufacturer, predicting possible issues that could occur, so as to ensure timely maintenance or automated shutdown.
  • AI could be used to produce simulations of real-world objects, called digital twins. When applied to the field of engineering, digital twins allow engineers and technicians to analyze the performance of equipment virtually, thus avoiding the safety and budget issues associated with traditional testing methods.
  • Combined with AI, intelligent robots are playing an important role in industry and human life. Different from traditional robots working according to the procedures specified by humans, intelligent robots have the ability of perception, recognition, and even automatic planning and decision-making, based on changes in environmental conditions.
  • AI of things (AIoT), or AI-empowered IoT applications, 30 has become a promising development trend. AI can empower the connected IoT devices, embedded in various physical infrastructures, to perceive, recognize, learn, and act. For instance, smart cities constantly collect data regarding quality-of-life factors, such as the status of power supply, public transportation, air pollution, and water use, to manage and optimize systems in cities. Because these data, especially personal data, are collected from informed or uninformed participants, data security and privacy 31 require protection.

AI in mathematics

Mathematics always plays a crucial and indispensable role in AI. Decades ago, quite a few classical AI-related approaches, such as k-nearest neighbor, 32 support vector machine, 33 and AdaBoost, 34 were proposed and developed after their rigorous mathematical formulations had been established. In recent years, with the rapid development of DL, 35 AI has been gaining more and more attention in the mathematical community. Equipped with the Markov process, minimax optimization, and Bayesian statistics, RL, 36 GANs, 37 and Bayesian learning 38 became the most favorable tools in many AI applications. Nevertheless, there still exist plenty of open problems in mathematics for ML, including the interpretability of neural networks, the optimization problems of parameter estimation, and the generalization ability of learning models. In the rest of this section, we discuss these three questions in turn.

The interpretability of neural networks

From a mathematical perspective, ML usually constructs nonlinear models, with neural networks as a typical case, to approximate certain functions. The well-known Universal Approximation Theorem suggests that, under very mild conditions, any continuous function can be uniformly approximated on compact domains by neural networks, 39 which serves a vital function in the interpretability of neural networks. However, in real applications, ML models seem to admit accurate approximations of many extremely complicated functions, sometimes even black boxes, which are far beyond the scope of continuous functions. To understand the effectiveness of ML models, many researchers have investigated the function spaces that can be well approximated by them, and the corresponding quantitative measures. This issue is closely related to the classical approximation theory, but the approximation scheme is distinct. For example, Bach 40 finds that the random feature model is naturally associated with the corresponding reproducing kernel Hilbert space. In the same way, the Barron space is identified as the natural function space associated with two-layer neural networks, and the approximation error is measured using the Barron norm. 41 The corresponding quantities of residual networks (ResNets) are defined for the flow-induced spaces. For multi-layer networks, the natural function spaces for the purposes of approximation theory are the tree-like function spaces introduced in Wojtowytsch. 42 There are several works revealing the relationship between neural networks and numerical algorithms for solving partial differential equations. For example, He and Xu 43 discovered that CNNs for image classification have a strong connection with multi-grid (MG) methods. In fact, the pooling operation and feature extraction in CNNs correspond directly to restriction operation and iterative smoothers in MG, respectively. Hence, various convolution and pooling operations used in CNNs can be better understood.
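For reference, the classical one-hidden-layer form of the Universal Approximation Theorem mentioned above can be stated as follows (a textbook formulation, not a result specific to the works cited here): for a fixed non-polynomial activation function σ (e.g., a sigmoid or ReLU), any continuous function f on a compact set K in d-dimensional space, and any ε > 0, there exist a width N, coefficients a_i, b_i, and weight vectors w_i such that

\[
\sup_{x \in K} \Bigl| \, f(x) - \sum_{i=1}^{N} a_i \, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
\]

The Barron-space and flow-induced-space results discussed above refine this qualitative statement by quantifying how the approximation error depends on the network width and on norms of the target function.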

The optimization problems of parameter estimation

In general, the optimization problem of estimating the parameters of certain DNNs is in practice highly nonconvex and often nonsmooth. Can global minimizers be expected? What is the landscape of local minimizers? How does one handle the nonsmoothness? All these questions are nontrivial from an optimization perspective. Indeed, numerous works and experiments demonstrate that the optimization for parameter estimation in DL is itself a much nicer problem than once thought; see, e.g., Goodfellow et al. 44 As a consequence, the study of the solution landscape ( Figure 3 ), also known as the loss surface of neural networks, is no longer considered inaccessible and can even, in turn, provide guidance for global optimization. Interested readers can refer to the survey paper (Sun et al. 45 ) for recent progress in this aspect.


Recent studies indicate that nonsmooth activation functions, e.g., rectified linear units, are better than smooth ones in finding sparse solutions. However, the chain rule does not apply when the activation functions are nonsmooth, which makes the widely used stochastic gradient (SG)-based approaches infeasible in theory. As a remedy, approximate gradients are taken at nonsmooth iterates, so SG-type methods remain in extensive use, but numerical evidence has also exposed their limitations. Also, the penalty-based approaches proposed by Cui et al. 46 and Liu et al. 47 provide a new direction for solving nonsmooth optimization problems efficiently.

The generalization ability of learning models

A small training error does not always lead to a small test error. This gap is caused by the generalization ability of learning models. A key finding in statistical learning theory states that the generalization error is bounded by a quantity that grows with the increase of the model capacity, but shrinks as the number of training examples increases. 48 A common conjecture relating generalization to the solution landscape is that flat and wide minima generalize better than sharp ones. Thus, regularization techniques, including the dropout approach, 49 have emerged to force the algorithms to bypass sharp minima. However, the mechanism behind this has not been fully explored. Recently, some researchers have focused on the ResNet-type architecture, with dropout inserted after the last convolutional layer of each building block; they thus managed to explain the stochastic dropout training process and the ensuing dropout regularization effect from the perspective of optimal control. 50
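As a minimal sketch of the dropout mechanism itself (the generic "inverted dropout" formulation, not the ResNet-specific scheme analyzed in the cited work), the following Python snippet randomly zeroes hidden units during training and rescales the survivors so that no change is needed at test time:

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    # Inverted dropout: zero each unit with probability p_drop during
    # training and divide the survivors by (1 - p_drop), so the expected
    # activation is unchanged and test-time inference needs no rescaling.
    if not training or p_drop == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = rng.normal(size=(4, 8))          # a batch of hidden activations
print(dropout(h, p_drop=0.5))        # roughly half the units are zeroed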

AI in medical science

AI technology is becoming more and more significant in daily operations, including in medical fields. With the growing healthcare needs of patients, hospitals are evolving from information networking to the Internet Hospital and, eventually, to the Smart Hospital. At the same time, AI tools and hardware performance are also growing rapidly with each passing day. Eventually, common AI algorithms, such as CV, NLP, and data mining, will begin to be embedded in the medical equipment market ( Figure 4 ).


AI doctor based on electronic medical records

For medical history data, it is inevitable to mention Doctor Watson, developed on the IBM Watson platform, and Modernizing Medicine, which targets oncology and is now adopted by CVS & Walgreens in the US as well as various medical organizations in China. Doctor Watson takes advantage of the NLP capability of the IBM Watson platform, which has already collected vast amounts of medical history data as well as prior knowledge in the literature for reference. After a patient's case is input, Doctor Watson searches the medical history reserve and forms an elementary treatment proposal, which is then further ranked against the prior knowledge reserves. With the multiple models stored, Doctor Watson gives the final proposal as well as the confidence of the proposal. However, such AI doctors still face problems 51 : because they rely on prior experience from US hospitals, their proposals may not be suitable for other regions with different medical insurance policies. Besides, the knowledge updating of the Watson platform relies heavily on updates to the knowledge reserve, which still require manual work.

AI for public health: Outbreak detection and health QR code for COVID-19

AI can be used for public health purposes in many ways. One classical usage is to detect disease outbreaks using search engine query data or social media data, as Google did for prediction of influenza epidemics 52 and the Chinese Academy of Sciences did for modeling the COVID-19 outbreak through multi-source information fusion. 53 After the COVID-19 outbreak, a digital health Quick Response (QR) code system has been developed by China, first to detect potential contact with confirmed COVID-19 cases and, secondly, to indicate the person's health status using mobile big data. 54 Different colors indicate different health status: green means healthy and is OK for daily life, orange means risky and requires quarantine, and red means confirmed COVID-19 patient. It is easy to use for the general public, and has been adopted by many other countries. The health QR code has made great contributions to the worldwide prevention and control of the COVID-19 pandemic.

Biomarker discovery with AI

High-dimensional data, including multi-omics data, patient characteristics, medical laboratory test data, etc., are often used for generating various predictive or prognostic models through DL or statistical modeling methods. For instance, the COVID-19 severity evaluation model was built through ML using proteomic and metabolomic profiling data of sera 55 ; using integrated genetic, clinical, and demographic data, Taliaz et al. built an ML model to predict patient response to antidepressant medications 56 ; prognostic models for multiple cancer types (such as liver cancer, lung cancer, breast cancer, gastric cancer, colorectal cancer, pancreatic cancer, prostate cancer, ovarian cancer, lymphoma, leukemia, sarcoma, melanoma, bladder cancer, renal cancer, thyroid cancer, head and neck cancer, etc.) were constructed through DL or statistical methods, such as least absolute shrinkage and selection operator (LASSO), combined with Cox proportional hazards regression model using genomic data. 57
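The following Python sketch illustrates the LASSO-style feature-selection idea behind such biomarker models on synthetic data. The simulated "omics" matrix and the use of L1-penalized logistic regression are illustrative assumptions; the cited studies used Cox proportional hazards regression and other task-specific models:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for high-dimensional omics data: 200 samples,
# 500 features, of which only a handful truly influence the label.
X = rng.normal(size=(200, 500))
true_weights = np.zeros(500)
true_weights[:5] = [2.0, -1.5, 1.0, 0.8, -0.7]
y = (X @ true_weights + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The L1 penalty drives most coefficients to zero, acting as feature
# selection in the spirit of LASSO-based biomarker discovery.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_train, y_train)
selected = np.flatnonzero(clf.coef_[0])
print("selected feature indices:", selected)
print("test accuracy:", round(clf.score(X_test, y_test), 3))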

Image-based medical AI

Medical image AI is one of the most mature and developed areas, as there are numerous models for classification, detection, and segmentation tasks in CV. In the clinical area, CV algorithms can also be used for computer-aided diagnosis and treatment with ECG, CT, eye fundus imaging, etc. Because human doctors may become tired and prone to mistakes after viewing hundreds and hundreds of images for diagnosis, AI can outperform a human medical image viewer, as it specializes in repetitive work without fatigue. The first medical AI product approved by the FDA is IDx-DR, which uses an AI model to make predictions of diabetic retinopathy. The smartphone app SkinVision can accurately detect melanomas. 58 It uses "fractal analysis" to identify moles and their surrounding skin, based on size, diameter, and many other parameters, and to detect abnormal growth trends. AI-ECG of LEPU Medical can automatically detect heart disease from ECG images. Lianying Medical takes advantage of its hardware equipment to produce real-time high-definition image-guided all-round radiotherapy technology, which successfully achieves precise treatment.

Wearable devices for surveillance and early warning

For wearable devices, AliveCor has developed an algorithm to automatically predict the presence of atrial fibrillation, which is an early warning sign of stroke and heart failure. The 23andMe company can also test saliva samples at a small cost, and a customer can be provided with information based on their genes, including who their ancestors were or potential diseases they may be prone to later in life. It provides accurate health management solutions based on individual and family genetic data. For the next 20–30 years, we believe there are several directions for further research: (1) causal inference for real-time in-hospital risk prediction. Clinical doctors usually require reasonable explanations for certain medical decisions, but current AI models are usually black boxes. Causal inference will help doctors to explain certain AI decisions and even discover novel ground truths. (2) Devices, including wearable instruments, for multi-dimensional health monitoring. The multi-modality model is now a trend in AI research. With various devices to collect multi-modality data and a central processor to fuse all these data, the model can monitor the user's overall real-time health condition and give precautions more precisely. (3) Automatic discovery of clinical markers for diseases that are difficult to diagnose. Diseases such as ALS are still difficult for clinical doctors to diagnose because they lack an effective general marker. It may be possible for AI to discover common phenomena among these patients and find an effective marker for early diagnosis.

AI-aided drug discovery

Today we have entered the precision medicine era, and new targeted drugs are the cornerstones of precision therapy. However, over the past decades, it has taken an average of over one billion dollars and 10 years to bring a new drug to market. How to accelerate the drug discovery process and avoid late-stage failure are key concerns for all the big and fiercely competitive pharmaceutical companies. The emerging role of AI, including ML, DL, expert systems, and artificial neural networks (ANNs), has brought new insights and high efficiency into new drug discovery processes. AI has been adopted in many aspects of drug discovery, including de novo molecule design, structure-based modeling for proteins and ligands, quantitative structure-activity relationship research, and druggable property judgments. DL-based AI applications demonstrate superior merits in addressing some challenging problems in drug discovery. Of course, prediction of chemical synthesis routes and chemical process optimization are also valuable in accelerating new drug discovery, as well as in lowering production costs.

There has been notable progress in AI-aided new drug discovery in recent years, both in new chemical entity discovery and in the related business area. Based on DNNs, DeepMind built the AlphaFold platform to predict 3D protein structures, outperforming other algorithms. As an illustration of this achievement, AlphaFold successfully and accurately predicted 25 of 43 protein structures from scratch, without using previously built protein models. Accordingly, AlphaFold won the CASP13 protein-folding competition in December 2018. 59 Based on GANs and other ML methods, Insilico constructed the modular drug design platform GENTRL. In September 2019, they reported the discovery of the first de novo active DDR1 kinase inhibitor developed by the GENTRL system. It took the team only 46 days from target selection to obtaining an active drug candidate with in vivo data. 60 Exscientia and Sumitomo Dainippon Pharma developed a new drug candidate, DSP-1181, for the treatment of obsessive-compulsive disorder on the Centaur Chemist AI platform. In January 2020, DSP-1181 started its phase I clinical trials, which means that the comprehensive exploration from program initiation to the phase I study took less than 12 months. In contrast, comparable drug discovery usually needs 4–5 years with traditional methods.

How AI transforms medical practice: A case study of cervical cancer

As the most common malignant tumor in women, cervical cancer is a disease that has a clear cause and can be prevented, and even treated, if detected early. Conventionally, the screening strategy for cervical cancer mainly adopts the "three-step" model of "cervical cytology-colposcopy-histopathology." 61 However, limited by the available testing methods, the efficiency of cervical cancer screening is not high. In addition, owing to the lack of knowledge of doctors in some primary hospitals, patients cannot be provided with the best diagnosis and treatment decisions. In recent years, with the advent of the era of computer science and big data, AI has gradually begun to extend and blend into various fields. In particular, AI has been widely used in a variety of cancers as a new tool for data mining. For cervical cancer, a clinical database with millions of medical records and pathological data has been built, and an AI medical tool set has been developed. 62 Such an AI analysis algorithm gives doctors access to rapid, iterative AI model training. In addition, a prognostic prediction model established by ML and a web-based prognostic result calculator have been developed, which can accurately predict the risk of postoperative recurrence and death in cervical cancer patients, and thereby better guide decision-making in postoperative adjuvant treatment. 63

AI in materials science

As the cornerstone of modern industry, materials have played a crucial role in the design of revolutionary forms of matter, with targeted properties for broad applications in energy, information, biomedicine, construction, transportation, national security, spaceflight, and so forth. Traditional strategies rely on the empirical trial and error experimental approaches as well as the theoretical simulation methods, e.g., density functional theory, thermodynamics, or molecular dynamics, to discover novel materials. 64 These methods often face the challenges of long research cycles, high costs, and low success rates, and thus cannot meet the increasingly growing demands of current materials science. Accelerating the speed of discovery and deployment of advanced materials will therefore be essential in the coming era.

With the rapid development of data processing and powerful algorithms, AI-based methods, such as ML and DL, are emerging with good potentials in the search for and design of new materials prior to actually manufacturing them. 65 , 66 By integrating material property data, such as the constituent element, lattice symmetry, atomic radius, valence, binding energy, electronegativity, magnetism, polarization, energy band, structure-property relation, and functionalities, the machine can be trained to “think” about how to improve material design and even predict the properties of new materials in a cost-effective manner ( Figure 5 ).

Figure 5. AI is expected to power the development of materials science

AI in discovery and design of new materials

Recently, AI techniques have made significant advances in the rational design and accelerated discovery of various materials, such as piezoelectric materials with large electrostrains, 67 organic-inorganic perovskites for photovoltaics, 68 molecular emitters for efficient light-emitting diodes, 69 inorganic solid materials for thermoelectrics, 70 and organic electronic materials for renewable-energy applications. 66 , 71 The power of data-driven computing and algorithmic optimization can promote comprehensive applications of simulation and ML (e.g., high-throughput virtual screening, inverse molecular design, Bayesian optimization, and supervised learning) in material discovery and property prediction in various fields. 72 For instance, using a DL Bayesian framework, attribute-driven inverse materials design has been demonstrated for efficient and accurate prediction of functional molecular materials with desired semiconducting properties or redox stability for applications in organic thin-film transistors, organic solar cells, or lithium-ion batteries. 73 It is meaningful to adopt automation tools for quick experimental testing of potential materials and to utilize high-performance computing to calculate their bulk, interface, and defect-related properties. 74 The effective convergence of automation, computing, and ML can greatly speed up the discovery of materials. In the future, with the aid of AI techniques, it will be possible to accomplish the design of superconductors, metallic glasses, solder alloys, high-entropy alloys, high-temperature superalloys, thermoelectric materials, two-dimensional materials, magnetocaloric materials, polymeric bio-inspired materials, sensitive composite materials, and topological (electronic and phonon) materials, and so on. In the past decade, topological materials have ignited the research enthusiasm of condensed matter physicists, materials scientists, and chemists, as they exhibit exotic physical properties with potential applications in electronics, thermoelectrics, optics, catalysis, and energy-related fields. According to the most recent predictions, more than a quarter of all inorganic materials in nature are topologically nontrivial. The establishment of topological electronic materials databases 75 , 76 , 77 and topological phononic materials databases 78 using high-throughput methods will help to accelerate the screening and experimental discovery of new topological materials for functional applications. It is recognized that large-scale, high-quality datasets are required to practice AI. Great efforts have also been expended in building high-quality materials science databases. As one of the top-ranking databases of its kind, the "atomly.net" materials data infrastructure 79 has calculated the properties of more than 180,000 inorganic compounds, including their equilibrium structures, electron energy bands, dielectric properties, simulated diffraction patterns, elasticity tensors, etc. As such, the atomly.net database has set a solid foundation for extending AI into the area of materials science research. The X-ray diffraction (XRD)-matcher model of atomly.net uses ML to match and classify experimental XRD patterns against the simulated ones. Very recently, using the dataset from atomly.net, an accurate AI model was built to rapidly predict the formation energy of almost any given compound with fairly good predictive ability. 80
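As a toy illustration of the formation-energy prediction workflow, the sketch below fits a random forest to synthetic composition descriptors. The descriptor names, data generator, and model choice are assumptions for demonstration only and are not the model built on the atomly.net dataset:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical composition descriptors (e.g., mean electronegativity,
# mean atomic radius, valence-electron count) and a synthetic
# "formation energy" target; a real pipeline would compute descriptors
# from curated databases rather than generate them randomly.
n_samples = 1000
X = rng.uniform(size=(n_samples, 3))
y = (-2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 - X[:, 2]
     + rng.normal(scale=0.05, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("test R^2 of the toy formation-energy model:",
      round(model.score(X_test, y_test), 3))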

AI-powered Materials Genome Initiative

The Materials Genome Initiative (MGI) is a grand plan for the rational realization of new materials and related functions; it aims to discover, manufacture, and deploy advanced materials efficiently, cost-effectively, and intelligently. The initiative creates policy, resources, and infrastructure for accelerating materials development at a high level. It is a new paradigm for the discovery and design of next-generation materials: it runs from the viewpoint of fundamental building blocks toward general materials development, and accelerates materials development through efforts in theory, computation, and experiment in a highly integrated, high-throughput manner. MGI sets an ambitious goal and a high standard for materials development and materials science in the future. The spirit of MGI is to design novel materials by using data pools and powerful computation once the requirements or aspirations of functional usage appear. Theory, computation, and algorithms are the primary and substantial factors in the establishment and implementation of MGI. Advances in theory, computation, and experiment in materials science and engineering provide the foundation not only to accelerate the speed at which new materials are realized but also to shorten the time needed to push new products into the market. AI techniques therefore bring great promise to the developing MGI, and the application of new technologies, such as ML and DL, directly accelerates materials research and the establishment of MGI. Model construction and application to science and engineering, as well as the data infrastructure, are of central importance. When AI-powered MGI approaches are coupled with the ongoing autonomy of manufacturing methods, the potential impact on society and the economy in the future is profound. We are now beginning to see that the AI-aided MGI, among other things, integrates experiment, computation, and theory, facilitates access to materials data, equips the next generation of the materials workforce, and enables a paradigm shift in materials development. Furthermore, the AI-powered MGI could also design operational procedures and control equipment to execute experiments, and thus realize autonomous experimentation in future materials research.

Advanced functional materials for generation upgrade of AI

The realization and application of AI techniques depend on computational capability and computer hardware, and their physical functionality ultimately rests on the performance of computers or supercomputers. With our current technology, the electric currents or carriers driving chips and devices consist of electrons with ordinary characteristics, such as heavy mass and low mobility. All chips and devices therefore emit considerable heat, consume too much energy, and limit the efficiency of information transmission. Benefiting from the rapid development of modern physics, a series of advanced materials with exotic functional effects have been discovered or designed, including superconductors, quantum anomalous Hall insulators, and topological fermions. In particular, the superconducting state or topologically nontrivial electrons will promote next-generation AI techniques once (near) room temperature applications of these states are realized and implemented in integrated circuits. 81 In this case, the central processing units, signal circuits, and power channels will be driven by electronic carriers that show massless, energy-diffusionless, ultra-high-mobility, or chirality-protected characteristics. Ordinary electrons will be removed from the physical circuits of future-generation chips and devices, leaving superconducting and topological chiral electrons running in future AI chips and supercomputers. The efficiency of information transmission and logic computing will be improved on a vast scale and at very low cost.

AI for materials and materials for AI

The coming decade will continue to witness the development of advanced ML algorithms, newly emerging data-driven AI methodologies, and integrated technologies for facilitating structure design and property prediction, as well as to accelerate the discovery, design, development, and deployment of advanced materials into existing and emerging industrial sectors. At this moment, we are facing challenges in achieving accelerated materials research through the integration of experiment, computation, and theory. The great MGI, proposed for high-level materials research, helps to promote this process, especially when it is assisted by AI techniques. Still, there is a long way to go for the usage of these advanced functional materials in future-generation electric chips and devices to be realized. More materials and functional effects need to be discovered or improved by the developing AI techniques. Meanwhile, it is worth noting that materials are the core components of devices and chips that are used for construction of computers or machines for advanced AI systems. The rapid development of new materials, especially the emergence of flexible, sensitive, and smart materials, is of great importance for a broad range of attractive technologies, such as flexible circuits, stretchable tactile sensors, multifunctional actuators, transistor-based artificial synapses, integrated networks of semiconductor/quantum devices, intelligent robotics, human-machine interactions, simulated muscles, biomimetic prostheses, etc. These promising materials, devices, and integrated technologies will greatly promote the advancement of AI systems toward wide applications in human life. Once the physical circuits are upgraded by advanced functional or smart materials, AI techniques will largely promote the developments and applications of all disciplines.

AI in geoscience

AI technologies involved in a wide range of geoscience fields

Momentous challenges threatening current society require solutions to problems that belong to geoscience, such as evaluating the effects of climate change, assessing air quality, forecasting the effects of disaster incidences on infrastructure, calculating the consumption and availability of food, water, and soil resources, and identifying factors that are indicators of potential volcanic eruptions, tsunamis, floods, and earthquakes. 82 , 83 Addressing these problems has become possible with the emergence of advanced technology products (e.g., deep-sea drilling vessels and remote sensing satellites), enhancements in computational infrastructure that allow for processing large-scale, wide-range simulations of multiple models in geoscience, and internet-based data analysis that facilitates the collection, processing, and storage of data in distributed and crowd-sourced environments. 84 The growing availability of massive geoscience data provides unlimited possibilities for AI, which has already permeated all aspects of our daily life (e.g., entertainment, transportation, and commerce), to contribute significantly to geoscience problems of great societal relevance. As geoscience enters the era of massive data, AI, which has been extensively successful in different fields, offers immense opportunities for settling a series of problems in Earth systems. 85 , 86 Accompanied by diversified data, AI-enabled technologies, such as smart sensors, image visualization, and intelligent inversion, are being actively examined across a wide range of geoscience fields, such as marine geoscience, rock physics, geology, ecology, seismicity, environment, hydrology, remote sensing, ArcGIS, and planetary science. 87

Multiple challenges in the development of geoscience

There are several traits of geoscience that restrict the applicability of fundamental algorithms for knowledge discovery: (1) the inherent challenges of geoscience processes, (2) the limitations of geoscience data collection, and (3) uncertainty in samples and ground truth. 88 , 89 , 90 Geoscience objects generally have amorphous boundaries in space and time and are not as well defined as objects in other fields. Geoscience phenomena are also significantly multivariate, obey nonlinear relationships, and exhibit spatiotemporal structure and non-stationary characteristics. Beyond the inherent challenges of geoscience observations, the massive data spanning multiple dimensions of time and space, with different levels of incompleteness, noise, and uncertainty, complicate geoscience processes. For supervised learning approaches, there are further difficulties owing to the lack of gold-standard ground truth and the "small size" of samples (e.g., a small amount of historical data with sufficient observations) in geoscience applications.

Usage of AI technologies as efficient approaches to promote the geoscience processes

Geoscientists continually strive to develop better techniques for simulating the present status of the Earth system (e.g., how much greenhouse gas is released into the atmosphere) and the connections between and within its subsystems (e.g., how elevated temperatures influence the ocean ecosystem). Viewed from the perspective of geoscience, newly emerging AI-aided approaches are well suited to the following geoscience applications: (1) characterizing objects and events 91 ; (2) estimating geoscience variables from observations 92 ; (3) forecasting geoscience variables according to long-term observations 85 ; (4) exploring geoscience data relationships 93 ; and (5) causal discovery and causal attribution. 94 While traditional methods for characterizing geoscience objects and events are primarily rooted in hand-coded features, AI algorithms can detect patterns in the data automatically and improve performance with pattern-mining techniques. However, because spatiotemporal targets have vague boundaries and related uncertainties, it may be necessary to advance pattern-mining methods that can explain the temporal and spatial characteristics of geoscience data when characterizing different events and objects. To address the non-stationary issue of geoscience data, AI-aided algorithms have been extended to integrate the holistic results of professional predictors and produce robust estimations of climate variables (e.g., humidity and temperature). Furthermore, forecasting long-term trends of the current situation in the Earth system using AI-enabled technologies can simulate future scenarios and support early resource planning and adaptation policies. Mining geoscience data relationships can help us capture vital signs of the Earth system and promote our understanding of geoscience developments. Of great interest is the advancement of AI decision methodologies that handle uncertain prediction probabilities and poorly resolved risk tails, that is, the most extreme, transient, and rare events formulated by model sets, so as to improve accuracy and effectiveness across various cases.

AI technologies for optimizing the resource management in geoscience

Currently, AI can perform better than humans in some well-defined tasks. For example, AI techniques have been used in urban water resource planning, mainly due to their remarkable capacity for modeling, flexibility, reasoning, and forecasting water demand and capacity. The design and application of an Adaptive Intelligent Dynamic Water Resource Planning system, a subset of AI for sustainable water resource management in urban regions, has largely promoted the optimization of water resource allocation and will ultimately minimize operating costs and improve the sustainability of environmental management 95 ( Figure 6 ). Also, meteorology requires collecting tremendous amounts of data on many different variables, such as humidity, altitude, and temperature; however, dealing with such a huge dataset is a big challenge. 96 AI-based techniques are being utilized to analyze shallow-water reef images and recognize coral color to track the effects of climate change, and to collect humidity, temperature, and CO 2 data to assess the health of our ecological environment. 97 Beyond AI's capabilities for meteorology, it can also play a critical role in decreasing greenhouse gas emissions originating from the electric-power sector. Comprising the production, transportation, allocation, and consumption of electricity, the electric-power sector offers many opportunities for AI applications, including speeding up the development of new clean energy, enhancing system optimization and management, improving electricity-demand forecasts and distribution, and advancing system monitoring. 98 New materials may even be found, with the aid of AI, for batteries to store energy or for materials that absorb CO 2 from the atmosphere. 99 Although traditional fossil fuel operations have been widely used for thousands of years, AI techniques are also being used to help explore more promising sustainable energy sources (e.g., fusion technology). 100

Figure 6. Applications of AI in hydraulic resource management

In addition to the adjustment of energy structures due to climate change (a core part of geoscience systems), a second, less-obvious step could also be taken to reduce greenhouse gas emission: using AI to target inefficiencies. A related statistical report by the Lawrence Livermore National Laboratory pointed out that around 68% of energy produced in the US could be better used for purposeful activities, such as electricity generation or transportation, but is instead contributing to environmental burdens. 101 AI is primed to reduce these inefficiencies of current nuclear power plants and fossil fuel operations, as well as improve the efficiency of renewable grid resources. 102 For example, AI can be instrumental in the operation and optimization of solar and wind farms to make these utility-scale renewable-energy systems far more efficient in the production of electricity. 103 AI can also assist in reducing energy losses in electricity transportation and allocation. 104 A distribution system operator in Europe used AI to analyze load, voltage, and network distribution data, to help “operators assess available capacity on the system and plan for future needs.” 105 AI allowed the distribution system operator to employ existing and new resources to make the distribution of energy assets more readily available and flexible. The International Energy Agency has proposed that energy efficiency is core to the reform of energy systems and will play a key role in reducing the growth of global energy demand to one-third of the current level by 2040.

AI as a building block to promote development in geoscience

The Earth's system is of significant scientific interest, and affects all aspects of life. 106 The challenges, problems, and promising directions provided by AI are definitely not exhaustive, but rather serve to illustrate that there is great potential for future AI research in this important field. Prosperity, development, and popularization of AI approaches in the geosciences are commonly driven by a posed scientific question, and the best way to succeed is for AI researchers to work closely with geoscientists at all stages of research. That is because geoscientists can better understand which scientific questions are important and novel, which sample collection processes can reasonably exhibit the inherent strengths, which datasets and parameters can be used to answer a given question, and which pre-processing operations should be conducted, such as removing seasonal cycles or smoothing. Similarly, AI researchers are better suited to decide which data analysis approaches are appropriate and available for the data, what the advantages and disadvantages of these approaches are, and what the approaches actually capture. Interpretability is also an important goal in geoscience because, if we can understand the basic reasoning behind the models, patterns, or relationships extracted from the data, they can be used as building blocks in scientific knowledge discovery. Hence, frequent communication between the researchers avoids long detours and ensures that analysis results are indeed beneficial to both geoscientists and AI researchers.

AI in the life sciences

The developments of AI and the life sciences are intertwined. The ultimate goal of AI is to achieve human-like intelligence, as the human brain is capable of multi-tasking, learning with minimal supervision, and generalizing learned skills, all accomplished with high efficiency and low energy cost. 107

Mutual inspiration between AI and neuroscience

In the past decades, neuroscience concepts have been introduced into ML algorithms and have played critical roles in triggering several important advances in AI. For example, the origins of DL methods lie directly in neuroscience, 5 which further stimulated the emergence of the field of RL. 108 The current state-of-the-art CNNs incorporate several hallmarks of neural computation, including nonlinear transduction, divisive normalization, and maximum-based pooling of inputs, 109 which were directly inspired by the unique processing of visual input in the mammalian visual cortex. 110 By introducing the brain's attentional mechanisms, a novel network has been shown to achieve greater accuracy and computational efficiency than conventional CNNs at difficult multi-object recognition tasks. 111 Other neuroscience findings, including the mechanisms underlying working memory, episodic memory, and neural plasticity, have inspired the development of AI algorithms that address several challenges in deep networks. 108 These algorithms can be directly implemented in the design and refinement of brain-machine interfaces and neuroprostheses.

On the other hand, insights from AI research have the potential to offer new perspectives on the basics of intelligence in the brains of humans and other species. Unlike traditional neuroscientists, AI researchers can formalize the concepts of neural mechanisms in a quantitative language to extract their necessity and sufficiency for intelligent behavior. An important illustration of such exchange is the development of the temporal-difference (TD) methods in RL models and the resemblance of TD-form learning in the brain. 112 Therefore, the China Brain Project covers both basic research on cognition and translational research for brain disease and brain-inspired intelligence technology. 113

AI for omics big data analysis

Currently, AI can perform better than humans in some well-defined tasks, such as omics data analysis and smart agriculture. In the big data era, 114 there are many types of data (variety), the volume of data is large (volume), and data are generated quickly (velocity). The high variety, large volume, and fast velocity of data make it highly valuable, but also difficult to analyze. Unlike traditional statistics-based methods, AI can easily handle big data and reveal hidden associations.

In genetics studies, there are many successful applications of AI. 115 One of the key questions is to determine whether a single amino acid polymorphism is deleterious. 116 There have been sequence conservation-based SIFT 117 and network-based SySAP, 118 but all these methods have met bottlenecks and cannot be further improved. Sundaram et al. developed PrimateAI, which can predict the clinical outcome of mutation based on DNN. 119 Another problem is how to call copy-number variations, which play important roles in various cancers. 120 , 121 Glessner et al. proposed a DL-based tool DeepCNV, in which the area under the receiver operating characteristic (ROC) curve was 0.909, much higher than other ML methods. 122 In epigenetic studies, m6A modification is one of the most important mechanisms. 123 Zhang et al. developed an ensemble DL predictor (EDLm6APred) for mRNA m6A site prediction. 124 The area under the ROC curve of EDLm6APred was 86.6%, higher than existing m6A methylation site prediction models. There are many other DL-based omics tools, such as DeepCpG 125 for methylation, DeepPep 126 for proteomics, AtacWorks 127 for assay for transposase-accessible chromatin with high-throughput sequencing, and deepTCR 128 for T cell receptor sequencing.
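Since such tools are routinely compared by the area under the ROC curve, the short Python sketch below shows how that metric is computed for a hypothetical variant classifier. The scores are synthetic and merely stand in for the outputs of tools like DeepCNV or EDLm6APred:

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Hypothetical classifier scores: higher scores should indicate true calls.
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.6 + rng.normal(scale=0.3, size=500), 0, 1)

auc = roc_auc_score(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)   # full curve, if plotting is desired
print("area under the ROC curve:", round(auc, 3))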

Another emerging application is DL for single-cell sequencing data. Unlike bulk data, in which the sample size is usually much smaller than the number of features, the number of cells in single-cell data can be large compared with the number of genes. That makes DL algorithms applicable to most single-cell data. Since single-cell data are sparse and have many unmeasured missing values, DeepImpute can accurately impute these missing values in the big gene × cell matrix. 129 During the quality control of single-cell data, it is important to remove doublets; the Solo tool embeds cells using an autoencoder and then builds a feedforward neural network to identify doublets. 130 The "potential energy underlying single-cell gradients" approach used generative modeling to learn the underlying differentiation landscape from time series single-cell RNA sequencing data. 131
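A rough sketch of the denoising-autoencoder-style imputation idea follows, assuming a synthetic cells-by-genes matrix with simulated dropout. Scikit-learn's MLPRegressor is used here only as a stand-in reconstruction network and is a simplification of dedicated tools such as DeepImpute:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic "complete" expression matrix (cells x genes).
n_cells, n_genes = 500, 50
clean = rng.gamma(shape=2.0, scale=1.0, size=(n_cells, n_genes))

# Simulate dropout: randomly zero ~30% of the entries.
mask = rng.random(clean.shape) < 0.3
noisy = clean.copy()
noisy[mask] = 0.0

# Denoising-autoencoder-style imputer: learn to reconstruct the clean
# profile from its corrupted version, then read off the masked entries.
imputer = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
imputer.fit(noisy, clean)

reconstructed = imputer.predict(noisy)
err = np.abs(reconstructed[mask] - clean[mask]).mean()
print("mean absolute error on masked entries:", round(err, 3))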

In protein structure prediction, the DL-based AlphaFold2 can accurately predict the 3D structures of 98.5% of human proteins, and will predict the structures of 130 million proteins of other organisms in the next few months. 132 It is even considered to be the second-largest breakthrough in life sciences after the human genome project 133 and will facilitate drug development, among other things.

AI makes modern agriculture smart

Agriculture is entering a fourth revolution, termed agriculture 4.0 or smart agriculture, benefiting from the arrival of the big data era as well as the rapid progress of many advanced technologies, in particular ML and modern information and communication technologies. 134 , 135 Applications of DL, information, and sensing technologies in agriculture cover all stages of agricultural production, including breeding, cultivation, and harvesting.

Traditional breeding usually exploits genetic variation by searching natural variation or using artificial mutagenesis. However, it is hard for either method to expose the whole mutation spectrum. Using DL models trained on the existing variants, predictions can be made for multiple unidentified gene loci. 136 For example, an ML method, the multi-criteria rice reproductive gene predictor, was developed and applied to predict coding and lincRNA genes associated with reproductive processes in rice. 137 Moreover, models trained on species with well-studied genomic data (such as Arabidopsis and rice) can also be applied to other species with limited genome information (such as wild strawberry and soybean). 138 In most cases, the links between genotypes and phenotypes are more complicated than expected. One gene can often affect multiple phenotypes, and one trait is generally the product of synergy between multiple genes and developmental processes. For this reason, multi-trait DL models were developed and have enabled genomic editing in plant breeding. 139 , 140

It is well known that dynamic and accurate monitoring of crops during the whole growth period is vitally important to precision agriculture. In this new stage of agriculture, both remote sensing and DL play indispensable roles. Specifically, remote sensing (including proximal sensing) can produce agricultural big data from ground, air-borne, and space-borne platforms, and has a unique potential to offer an economical approach for non-destructive, timely, objective, synoptic, long-term, and multi-scale information for crop monitoring and management, thereby greatly assisting in precision decisions regarding irrigation, nutrients, disease, pests, and yield. 141 , 142 DL makes it possible to simply, efficiently, and accurately discover knowledge from massive and complicated data, especially remote sensing big data characterized by multiple spatial-temporal-spectral information, owing to its strong capability for feature representation and its superiority in capturing the essential relation between observation data and agronomy parameters or crop traits. 135 , 143 The integration of DL and big data for agriculture has demonstrated a disruptive force as big as the green revolution. As shown in Figure 7 , in a possible application scenario of smart agriculture, multi-source satellite remote sensing data with various geometric and radiometric information, as well as abundant spectral information from the UV, visible, and shortwave infrared to microwave regions, can be collected. In addition, advanced aircraft systems, such as unmanned aerial vehicles with multi/hyper-spectral cameras on board, and smartphone-based portable devices, will be used to obtain multi/hyper-spectral data in specific fields. All types of data can be integrated by DL-based fusion techniques for different purposes, and then shared with all users for cloud computing. On the cloud computing platform, different agriculture remote sensing models, developed by a combination of data-driven ML methods and physical models, will be deployed and applied to acquire a range of biophysical and biochemical parameters of crops, which will be further analyzed by a decision-making and prediction system to assess the current water/nutrient stress and growth status and to predict future development. As a result, an automatic or interactive user service platform becomes accessible for making the correct decisions and taking appropriate actions through an integrated irrigation and fertilization system.

Figure 7. Integration of AI and remote sensing in smart agriculture
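As a small example of the kind of per-pixel quantity that such remote sensing pipelines feed into ML models, the sketch below computes the normalized difference vegetation index (NDVI) from toy red and near-infrared reflectance bands. The data are synthetic, and NDVI is only one of many possible model inputs:

import numpy as np

rng = np.random.default_rng(0)

# Toy multispectral patch: red and near-infrared (NIR) reflectance bands,
# as might be extracted from calibrated satellite or UAV imagery.
red = rng.uniform(0.05, 0.3, size=(4, 4))
nir = rng.uniform(0.3, 0.8, size=(4, 4))

# NDVI is a classical vegetation index often combined with other bands
# and indices as input to crop monitoring and yield estimation models.
ndvi = (nir - red) / (nir + red + 1e-9)
print("NDVI statistics: min %.2f, mean %.2f, max %.2f"
      % (ndvi.min(), ndvi.mean(), ndvi.max()))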

Furthermore, DL presents unique advantages in specific agricultural applications, such as dense scenes, which increase the difficulty of manual planting and harvesting. It is reported that CNN and autoencoder models, trained with image data, are being used increasingly for phenotyping and yield estimation, 144 such as counting fruits in orchards, grain recognition and classification, disease diagnosis, etc. 145 , 146 , 147 Consequently, this may greatly liberate the labor force.

The application of DL in agriculture is just beginning, and there are still many problems and challenges for the future development of DL technology. We believe that, with the continuous acquisition of massive data and the optimization of algorithms, DL will have better prospects in agricultural production.

AI in physics

The scale of modern physics ranges from the size of a neutron to the size of the Universe ( Figure 8 ). According to this scale, physics can be divided into four categories: particle physics on the scale of neutrons, nuclear physics on the scale of atoms, condensed matter physics on the scale of molecules, and cosmic physics on the scale of the Universe. AI, in particular ML, plays an important role in physics at all of these scales, since the use of AI algorithms is becoming the main trend in data analysis, such as the reconstruction and analysis of images.

Figure 8. The scale of physics

Speeding up simulations and identifications of particles with AI

There are many applications, or explorations of applications, of AI in particle physics. We cannot cover all of them here, but only use lattice quantum chromodynamics (LQCD) and the experiments at the Beijing Spectrometer (BES) and the Large Hadron Collider (LHC) to illustrate the power of ML in both theoretical and experimental particle physics.

LQCD studies the nonperturbative properties of QCD by using Monte Carlo simulations on supercomputers to help us understand the strong interaction that binds quarks together to form nucleons. Markov chain Monte Carlo simulations commonly used in LQCD suffer from topological freezing and critical slowing down as the simulations approach the real situation of the actual world. New algorithms with the help of DL are being proposed and tested to overcome those difficulties. 148 , 149 Physical observables are extracted from LQCD data, whose signal-to-noise ratio deteriorates exponentially. For non-Abelian gauge theories, such as QCD, complicated contour deformations can be optimized by using ML to reduce the variance of LQCD data. Proof-of-principle applications in two dimensions have been studied. 150 ML can also be used to reduce the time cost of generating LQCD data. 151

On the experimental side, particle identification (PID) plays an important role. Recently, a few PID algorithms were developed for BES-III, and the ANN 152 is one of them. Also, extreme gradient boosting has been used for multi-dimensional distribution reweighting, muon identification, and cluster reconstruction, and can improve muon identification performance. U-Net is a convolutional network for pixel-level semantic segmentation, which is widely used in CV. It has been applied at BES-III to solve the problem of multi-turn curling track finding for the main drift chamber. The average efficiency and purity for the first turn's hits is about 91%, at a threshold of 0.85. Current (and future) particle physics experiments are producing a huge amount of data, and machine learning can be used to discriminate between signal and overwhelming background events. Examples of data analyses at the LHC using supervised ML can be found in a 2018 collaboration. 153 To take the potential advantage of quantum computers forward, quantum ML methods are also being investigated; see, for example, Wu et al., 154 and references therein, for proof-of-concept studies.
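To illustrate the boosted-tree classification workflow used for particle identification, the following Python sketch trains a gradient-boosting classifier on synthetic track features. The feature names, distributions, and "muon versus background" labels are invented for demonstration and do not reflect the BES-III or LHC analyses cited above:

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-track features (e.g., energy deposit, hit multiplicity,
# penetration depth); two overlapping Gaussian populations stand in for
# "muon" versus "background", purely to illustrate the workflow.
n = 5000
X_muon = rng.normal(loc=[1.0, 0.5, 2.0], scale=0.4, size=(n // 2, 3))
X_bkg = rng.normal(loc=[0.5, 1.0, 1.0], scale=0.4, size=(n // 2, 3))
X = np.vstack([X_muon, X_bkg])
y = np.array([1] * (n // 2) + [0] * (n // 2))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)
print("test accuracy of the toy PID classifier:",
      round(clf.score(X_test, y_test), 3))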

AI makes nuclear physics powerful

Cosmic ray muon tomography (muography) 155 is an imaging technology that uses natural cosmic ray muon radiation rather than artificial radiation, thereby reducing the associated dangers. As an advantage, this technology can detect high-Z materials non-destructively, as muons are sensitive to high-Z materials. The Classification Model Algorithm (CMA) is based on supervised classification and gray system theory: it builds a binary classifier and decision function that takes the muon track as input and outputs whether the material of interest exists at a given location. AI thus helps the user to improve the efficiency of scanning with muons.

Also, for nuclear detection, the Cs2LiYCl6:Ce (CLYC) detector responds to both electrons and neutrons by producing a pulse signal, and can therefore be applied to detect both neutrons and electrons,156 but this requires identification of the two particles by analyzing the shapes of the waveforms, that is, n-γ identification. The traditional method has been pulse shape discrimination (PSD), which separates the waveforms of the two particles by analyzing the distribution of pulse features such as amplitude, width, rise time, and fall time, so that the two particles can be separated when the distribution splits into two separate Gaussian distributions. Traditional PSD can only analyze single-pulse waveforms, rather than the multipulse waveforms produced when two particles interact with the CLYC in close succession. This limitation can be overcome by using an ANN to classify the six categories (n, γ, n + n, n + γ, γ + n, γ + γ). In addition, several parameters could be used by AI to improve the reconstruction algorithm with high efficiency and less error.
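The sketch below is a hypothetical, simplified version of such an ANN-based n-γ classifier: it generates synthetic single and pile-up pulses with invented decay constants and trains an MLP to assign them to the six categories. It is meant only to illustrate the approach, not to reproduce the published CLYC analysis.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Illustrative sketch (not the published CLYC analysis): classify synthetic
# detector pulses into n, gamma, n+n, n+gamma, gamma+n, gamma+gamma using an
# MLP on the raw waveform samples. Decay constants and noise are invented.
rng = np.random.default_rng(2)
t = np.arange(200)

def pulse(decay, start):
    wave = np.zeros_like(t, dtype=float)
    wave[start:] = np.exp(-(t[start:] - start) / decay)
    return wave

def sample(label):
    decays = {"n": 60.0, "g": 15.0}          # slower decay for neutrons (assumed)
    first, *second = label.split("+")
    wave = pulse(decays[first], start=20)
    if second:                               # pile-up: add a second, later pulse
        wave += pulse(decays[second[0]], start=rng.integers(60, 140))
    return wave + rng.normal(scale=0.02, size=t.size)

labels = ["n", "g", "n+n", "n+g", "g+n", "g+g"]
X = np.array([sample(lab) for lab in labels for _ in range(400)])
y = np.repeat(np.arange(len(labels)), 400)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=400, random_state=0)
clf.fit(X_tr, y_tr)
print("6-class test accuracy:", clf.score(X_te, y_te))
```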

AI-aided condensed matter physics

AI opens up a new avenue for physical science, especially when a trove of data is available. Recent works demonstrate that ML provides useful insights to improve the density functional theory (DFT), in which the single-electron picture of the Kohn-Sham scheme has difficulty accounting for the exchange and correlation effects of many-body systems. Yu et al. proposed a Bayesian optimization algorithm to fit the Hubbard U parameter; the new method can find the optimal Hubbard U through a self-consistent process with good efficiency compared with the linear response method,157 and boosts the accuracy to a near-hybrid-functional level. Snyder et al. developed an ML density functional for a 1D non-interacting, non-spin-polarized fermion system to obtain a significantly improved kinetic energy. This method enables a direct approximation of the kinetic energy of a quantum system and can be utilized in orbital-free DFT modeling; it can even bypass solving the Kohn-Sham equation while maintaining precision at the quantum chemical level when a strong correlation term is included. Recently, FermiNet showed that the many-body quantum mechanics equations can be solved via AI. AI models also show advantages in capturing the interatomic force field. In 2010, the Gaussian approximation potential (GAP)158 was introduced as a powerful interatomic force field to describe the interactions between atoms. GAP uses kernel regression and invariant many-body representations, and performs quite well. For instance, it can simulate the crystallization of amorphous crystals under high pressure fairly accurately. By employing the smooth overlap of atomic positions kernel (SOAP),159 the accuracy of the potential can be further enhanced; therefore, SOAP-GAP can be viewed as a field-leading method for AI molecular dynamics simulation. There are also several other well-developed AI interatomic potentials, e.g., crystal graph CNNs provide a widely applicable way of vectorizing crystalline materials; SchNet embeds continuous-filter convolutional layers into its DNNs to ease molecular dynamics, as the potentials are spatially continuous; DimeNet constructs a directional message passing neural network by adding not only the bond lengths between atoms but also the bond angles, the dihedral angles, and the interactions between unconnected atoms into the model to obtain good accuracy.
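The following minimal sketch conveys the core idea behind kernel-based potentials such as GAP: a kernel regression model maps a local-environment descriptor to an energy. Here the descriptor is simply a histogram of pair distances for a 1D toy chain, and the reference energy is a Lennard-Jones-like pair sum; both are invented stand-ins for SOAP descriptors and DFT reference data.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Minimal sketch of kernel-regression interatomic potentials (GAP-like idea):
# learn a map from a simple environment descriptor to a configuration energy.
# The descriptor, toy energy and all parameters are illustrative assumptions.
rng = np.random.default_rng(3)

def descriptor(positions):
    d = np.abs(positions[:, None] - positions[None, :])
    d = d[np.triu_indices(len(positions), k=1)]
    hist, _ = np.histogram(d, bins=20, range=(0.0, 8.0))
    return hist / len(positions)

def toy_energy(positions):
    # Lennard-Jones-like pair energy as the "ground truth" to be learned.
    d = np.abs(positions[:, None] - positions[None, :])
    d = d[np.triu_indices(len(positions), k=1)]
    return np.sum(4 * ((1.0 / d) ** 12 - (1.0 / d) ** 6))

configs = [np.cumsum(rng.uniform(0.9, 1.5, size=6)) for _ in range(300)]
X = np.array([descriptor(p) for p in configs])
y = np.array([toy_energy(p) for p in configs])

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=5.0)
model.fit(X[:250], y[:250])
print("held-out MAE:", np.mean(np.abs(model.predict(X[250:]) - y[250:])))
```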

AI helps explore the Universe

AI is one of the newest technologies, while astronomy is one of the oldest sciences. When the two meet, new opportunities for scientific breakthroughs are often triggered. Observations and data analysis play a central role in astronomy. The amount of data collected by modern telescopes has reached unprecedented levels, and even the most basic task of constructing a catalog has become challenging with traditional source-finding tools.160 Astronomers have developed automated and intelligent source-finding tools based on DL, which not only offer significant advantages in operational speed but also facilitate a more comprehensive understanding of the Universe by identifying particular forms of objects that cannot be detected by traditional software and visual inspection.160,161

More than a decade ago, a citizen science project called "Galaxy Zoo" was proposed to help label one million images of galaxies collected by the Sloan Digital Sky Survey (SDSS) by posting images online and recruiting volunteers.162 Larger optical telescopes, in operation or under construction, produce data volumes several orders of magnitude larger than SDSS. Even with volunteers involved, there is no way to analyze the vast amount of data received. The advantages of ML are not limited to source-finding and galaxy classification; in fact, it has a much wider range of applications. For example, CNNs play an important role in detecting and decoding gravitational wave signals in real time, reconstructing all parameters within 2 ms, while traditional algorithms take several days to accomplish the same task.163 Such DL systems have also been used to automatically generate alerts for transients and to track asteroids and other fast-moving near-Earth objects, improving detection efficiency by several orders of magnitude. In addition, astrophysicists are exploring the use of neural networks to measure galaxy clusters and study the evolution of the Universe.
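As a hedged illustration of the deep-learning detection idea (not the cited pipeline), the sketch below trains a small 1D CNN to flag whether a noisy time series contains a chirp-like signal; the signal model, noise level, and network size are all assumptions chosen for brevity.

```python
import numpy as np
import torch
import torch.nn as nn

# Illustrative sketch: a tiny 1D CNN that detects a chirp-like signal buried in
# noise, in the spirit of deep-learning gravitational-wave detection.
rng = np.random.default_rng(4)
t = np.linspace(0, 1, 512)

def make_sample(has_signal):
    noise = rng.normal(scale=1.0, size=t.size)
    if has_signal:
        chirp = 0.8 * np.sin(2 * np.pi * (20 + 60 * t) * t)  # frequency sweeps upward
        noise += chirp
    return noise.astype(np.float32)

X = np.stack([make_sample(i % 2 == 0) for i in range(2000)])[:, None, :]
y = np.array([1.0 if i % 2 == 0 else 0.0 for i in range(2000)], dtype=np.float32)

model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=16), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(8, 16, kernel_size=16), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
    nn.Flatten(), nn.Linear(16, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

xb, yb = torch.from_numpy(X), torch.from_numpy(y)[:, None]
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))
```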

In addition to the amazing speed, neural networks seem to have a deeper understanding of the data than expected and can recognize more complex patterns, indicating that the “machine” is evolving rather than just learning the characteristics of the input data.

AI in chemistry

Chemistry plays an important "central" role among the sciences164 because it investigates the structure and properties of matter and identifies the chemical reactions that convert substances into other substances. Accordingly, chemistry is a data-rich branch of science containing complex information resulting from centuries of experiments and, more recently, decades of computational analysis. This vast treasure trove of data is most apparent within the Chemical Abstracts Service, which has collected more than 183 million unique organic and inorganic substances, including alloys, coordination compounds, minerals, mixtures, polymers, and salts, and is expanding by thousands of new substances daily.165 The unlimited complexity in the variety of material compounds explains why chemistry research is still a labor-intensive task. The level of complexity and the vast amounts of data within chemistry provide a prime opportunity to achieve significant breakthroughs through the application of AI. First, the types of molecules that can be constructed from atoms are almost unlimited, which leads to an unlimited chemical space166; the interconnections of these molecules with all possible combinations of factors, such as temperature, substrates, and solvents, are overwhelmingly numerous, giving rise to an unlimited reaction space.167 Exploring the unlimited chemical and reaction spaces, and navigating to the optimum compounds with the desired properties, is thus practically impossible through human effort alone. Second, in chemistry, the huge assortment of molecules and their interplay with external environments brings a new level of complexity, which cannot simply be predicted using physical laws. While many concepts, rules, and theories have been generalized from centuries of experience studying trivial (i.e., single-component) systems, nontrivial complexities are more likely as we discover that "more is different," in the words of Philip Warren Anderson, American physicist and Nobel Laureate.168 Nontrivial complexities occur when the scale changes and symmetry breaks in larger, increasingly complex systems, and the rules shift from quantitative to qualitative. Due to the lack of a systematic and analytical theory of the structures, properties, and transformations of macroscopic substances, chemistry research has largely been guided by heuristics and fragmentary rules accumulated over the previous centuries, with progress proceeding through trial and error. ML can recognize patterns from large amounts of data, thereby offering an unprecedented way of dealing with complexity and reshaping chemistry research by revolutionizing the way in which data are used. Every sub-field of chemistry currently utilizes some form of AI, including tools for chemistry research and data generation, such as analytical chemistry and computational chemistry, as well as applications to organic chemistry, catalysis, and medical chemistry, which we discuss herein.

AI breaks the limitations of manual feature selection methods

In analytical chemistry, the extraction of information has traditionally relied heavily on feature selection techniques based on prior human experience. Unfortunately, this approach is inefficient, incomplete, and often biased. Automated data analysis based on AI can break the limitations of manual variable selection methods by learning from large amounts of data. Feature selection through DL algorithms enables information extraction from datasets in NMR, chromatography, spectroscopy, and other analytical tools,169 thereby improving the prediction accuracy of models used for analysis. These ML approaches will greatly accelerate the analysis of materials, leading to the rapid discovery of new molecules or materials. Raman scattering, for instance, since its discovery in the 1920s, has been widely employed as a powerful vibrational spectroscopy technology capable of providing vibrational fingerprints intrinsic to analytes, thus enabling identification of molecules.170 Recently, ML methods have been trained to recognize features in Raman (or SERS) spectra to identify an analyte by applying DL networks, including ANNs, CNNs, and fully convolutional networks for feature engineering.171 For example, Leong et al. designed a machine-learning-driven "SERS taster" to simultaneously harness useful vibrational information from multiple receptors for enhanced multiplex profiling of five wine flavor molecules at ppm levels. Principal-component analysis is employed for the discrimination of alcohols with varying degrees of substitution, and support vector machine discriminant analysis is used to quantitatively classify all flavors with 100% accuracy.172 Overall, AI techniques provide the first glimmer of hope for a universal method of spectral data analysis that is fast, accurate, objective, and definitive, with attractive advantages in a wide range of applications.
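A minimal sketch of the PCA-plus-SVM workflow described above, applied to synthetic "spectra" in which each class is a Gaussian peak at a different wavenumber, is shown below; the peak positions, noise level, and class count are assumptions for demonstration only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Minimal sketch of PCA + SVM classification of synthetic spectra.
rng = np.random.default_rng(5)
wavenumbers = np.linspace(400, 1800, 700)

def spectrum(center):
    peak = np.exp(-0.5 * ((wavenumbers - center) / 25.0) ** 2)
    return peak + rng.normal(scale=0.05, size=wavenumbers.size)

centers = [600, 900, 1200, 1450, 1650]          # five hypothetical analytes
X = np.array([spectrum(c) for c in centers for _ in range(60)])
y = np.repeat(np.arange(len(centers)), 60)

clf = make_pipeline(PCA(n_components=10), SVC(kernel="rbf", C=10.0))
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```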

AI improves the accuracy and efficiency for various levels of computational theory

Complementary to analytical tools, computational chemistry has proven to be a powerful approach for using simulations to understand chemical properties; however, it faces an accuracy-versus-efficiency dilemma. This dilemma greatly limits the application of computational chemistry to real-world chemistry problems. To overcome it, ML and other AI methods are being applied to improve the accuracy and efficiency of the various levels of theory used to describe the effects arising at different time and length scales in the multi-scale modeling of chemical reactions.173 Many of the open challenges in computational chemistry can be addressed by ML approaches, for example, solving Schrödinger's equation,174 developing atomistic175 or coarse-grained176 potentials, constructing reaction coordinates,177 developing reaction kinetics models,178 and identifying key descriptors for computable properties.179 In addition to analytical chemistry and computational chemistry, several other disciplines of chemistry have applied AI technology to chemical problems. We discuss organic chemistry, catalysis, and medical chemistry as examples of where ML has made a significant impact. Many examples exist in the literature for other subfields of chemistry, and AI will continue to demonstrate breakthroughs in a wide range of chemical applications.

AI enables robotics capable of automating the synthesis of molecules

Organic chemistry studies the structure, properties, and reactions of carbon-based molecules. The complexity of the chemical and reaction spaces means that, for a given property, there is an essentially unlimited number of potential molecules that chemists could synthesize. Further complications arise with the problem of how to synthesize a particular molecule, given that the process relies heavily on heuristics and laborious testing. These challenges have been addressed by researchers using AI. Given enough data, any property of interest of a molecule can be predicted by mapping the molecular structure to the corresponding property using supervised learning, without resorting to physical laws. In addition to known molecules, new molecules can be designed by sampling the chemical space180 using methods such as autoencoders and CNNs, with the molecules encoded as sequences or graphs. Retrosynthesis, the planning of synthetic routes, which was once considered an art, has now become much simpler with the help of ML algorithms. The Chematica system,181 for instance, is now capable of autonomously planning synthetic routes that are subsequently proven to work in the laboratory. Once target molecules and the route of synthesis are determined, suitable reaction conditions can be predicted or optimized using ML techniques.182

The integration of these AI-based approaches with robotics has enabled fully AI-guided systems capable of automating the synthesis of small organic molecules without human intervention (Figure 9).183,184

Figure 9. A closed-loop workflow enabling automatic and intelligent design, synthesis, and assay of molecules in organic chemistry by AI

AI helps to search through vast catalyst design spaces

Catalytic chemistry originates from catalyst technologies in the chemical industry for the efficient and sustainable production of chemicals and fuels. Thus far, it remains a challenging endeavor to make novel heterogeneous catalysts with good performance (i.e., stable, active, and selective) because a catalyst's performance depends on many properties: composition, support, surface termination, particle size, particle morphology, atomic coordination environment, porous structure, and the reactor conditions during the reaction. The inherent complexity of catalysis makes discovering and developing catalysts with desired properties more dependent on intuition and experiment, which is costly and time consuming. AI technologies such as ML, when combined with experimental and in silico high-throughput screening of combinatorial catalyst libraries, can aid catalyst discovery by helping to search through vast design spaces. With well-defined structures and standardized data, including reaction results and in situ characterization results, the complex association between catalytic structure and catalytic performance can be revealed by AI.185,186 An accurate descriptor of the effect of molecules, molecular aggregation states, and molecular transport on catalysts could also be predicted. With this approach, researchers can build virtual laboratories to develop new catalysts and catalytic processes.

AI enables screening of chemicals in toxicology with minimum ethical concerns

A more complicated sub-field of chemistry is medical chemistry, which is challenging due to the complex interactions between exotic substances and the inherent chemistry of a living system. Toxicology, for instance, as a broad field, seeks to predict and eliminate substances (e.g., pharmaceuticals, natural products, food products, and environmental substances) that may cause harm to a living organism. Because living organisms are inherently complex, nearly any known substance can cause toxicity at a high enough exposure. Moreover, toxicity is dependent on an array of other factors, including organism size, species, age, sex, genetics, diet, combination with other chemicals, overall health, and/or environmental context. Given the scale and complexity of toxicity problems, AI is likely to be the only realistic approach to meet regulatory requirements for screening, prioritization, and risk assessment of chemicals (including mixtures), thereby revolutionizing the landscape of toxicology.187 In summary, AI is turning chemistry from a labor-intensive branch of science into a highly intelligent, standardized, and automated field, and much more can be achieved compared with what is possible within the limits of human labor. Underlying knowledge, with new concepts, rules, and theories, is expected to advance with the application of AI algorithms. A large portion of new chemistry knowledge leading to significant breakthroughs is expected to be generated from AI-based chemistry research in the decades to come.

Conclusions

This paper carries out a comprehensive survey on the development and application of AI across a broad range of fundamental sciences, including information science, mathematics, medical science, materials science, geoscience, life science, physics, and chemistry. Despite the fact that AI has been pervasively used in a wide range of applications, there still exist ML security risks, with data and ML models as attack targets during both the training and execution phases. First, since the performance of an ML system is highly dependent on the data used to train it, these input data are crucial for the security of the ML system. For instance, adversarial example attacks188 that provide malicious input data often lead the ML system into making false judgments (predictions or categorizations) using small perturbations that are imperceptible to humans; data poisoning, by intentionally manipulating raw, training, or testing data, can result in a decrease in model accuracy or serve other, error-specific attack purposes. Second, ML model attacks include backdoor attacks on DL, CNNs, and federated learning that manipulate the model's parameters directly, as well as model stealing, model inversion, and membership inference attacks, which can steal the model parameters or leak sensitive training data. While a number of defense techniques against these security threats have been proposed, new attack models that target ML systems are constantly emerging. Thus, it is necessary to address the problem of ML security and develop robust ML systems that remain effective under malicious attacks.
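The following toy sketch illustrates the adversarial-example mechanism mentioned above in the style of the fast gradient sign method: a small perturbation along the sign of the loss gradient is added to the input of a randomly initialized linear classifier, often changing its prediction. The model, input, and perturbation budget are invented for illustration.

```python
import torch
import torch.nn as nn

# Minimal FGSM-style sketch: a gradient-sign perturbation pushes the input
# toward higher loss on the true label, which frequently flips the prediction.
torch.manual_seed(0)
model = nn.Linear(10, 2)
x = torch.randn(1, 10, requires_grad=True)
true_label = torch.tensor([0])

loss = nn.functional.cross_entropy(model(x), true_label)
loss.backward()

epsilon = 0.5                                   # perturbation budget (assumed)
x_adv = x + epsilon * x.grad.sign()             # adversarial perturbation

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```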

Due to the data-driven character of ML methods, the features of the training and testing data must be drawn from the same distribution, which is difficult to guarantee in practice. This is because, in practical applications, the data source might differ from the training dataset. In addition, the data feature distribution may drift over time, which leads to a decline in the performance of the model. Moreover, if the model is then trained with only new data, it will suffer catastrophic "forgetting," meaning the model only remembers the new features and forgets the previously learned ones. To solve this problem, more and more scholars are paying attention to how to give models the ability of lifelong learning, that is, a change in the computing paradigm from "offline learning + online reasoning" to "online continuous learning," so that the model can keep learning throughout its lifetime, just like a human being.
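A toy illustration of the distribution-drift problem is sketched below: a classifier fitted on one (invented) feature distribution gradually loses accuracy as the deployment data drift away from the training distribution.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy illustration of train/test distribution shift: the same classifier is
# evaluated on data whose feature distribution drifts progressively.
rng = np.random.default_rng(6)

def dataset(shift):
    X0 = rng.normal([0.0 + shift, 0.0], 1.0, size=(500, 2))
    X1 = rng.normal([2.0 + shift, 2.0], 1.0, size=(500, 2))
    return np.vstack([X0, X1]), np.array([0] * 500 + [1] * 500)

X_train, y_train = dataset(shift=0.0)
clf = LogisticRegression().fit(X_train, y_train)

for shift in [0.0, 1.0, 2.0, 3.0]:
    X_test, y_test = dataset(shift)
    print(f"feature drift {shift:+.1f}: accuracy = {clf.score(X_test, y_test):.2f}")
```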

Acknowledgments

This work was partially supported by the National Key R&D Program of China (2018YFA0404603, 2019YFA0704900, 2020YFC1807000, and 2020YFB1313700), the Youth Innovation Promotion Association CAS (2011225, 2012006, 2013002, 2015316, 2016275, 2017017, 2017086, 2017120, 2017204, 2017300, 2017399, 2018356, 2020111, 2020179, Y201664, Y201822, and Y201911), NSFC (nos. 11971466, 12075253, 52173241, and 61902376), the Foundation of State Key Laboratory of Particle Detection and Electronics (SKLPDE-ZZ-201902), the Program of Science & Technology Service Network of CAS (KFJ-STS-QYZX-050), the Fundamental Science Center of the National Nature Science Foundation of China (nos. 52088101 and 11971466), the Scientific Instrument Developing Project of CAS (ZDKYYQ20210003), the Strategic Priority Research Program (B) of CAS (XDB33000000), the National Science Foundation of Fujian Province for Distinguished Young Scholars (2019J06023), the Key Research Program of Frontier Sciences, CAS (nos. ZDBS-LY-7022 and ZDBS-LY-DQC012), the CAS Project for Young Scientists in Basic Research (no. YSBR-005). The study is dedicated to the 10th anniversary of the Youth Innovation Promotion Association of the Chinese Academy of Sciences.

Author contributions

Y.X., Q.W., Z.A., Fei W., C.L., Z.C., J.M.T., and J.Z. conceived and designed the research. Z.A., Q.W., Fei W., Libo.Z., Y.W., F.D., and C.W.-Q. wrote the “ AI in information science ” section. Xin.L. wrote the “ AI in mathematics ” section. J.Q., K.H., W.S., J.W., H.X., Y.H., and X.C. wrote the “ AI in medical science ” section. E.L., C.F., Z.Y., and M.L. wrote the “ AI in materials science ” section. Fang W., R.R., S.D., M.V., and F.K. wrote the “ AI in geoscience ” section. C.H., Z.Z., L.Z., T.Z., J.D., J.Y., L.L., M.L., and T.H. wrote the “ AI in life sciences ” section. Z.L., S.Q., and T.A. wrote the “ AI in physics ” section. X.L., B.Z., X.H., S.C., X.L., W.Z., and J.P.L. wrote the “ AI in chemistry ” section. Y.X., Q.W., and Z.A. wrote the “Abstract,” “ introduction ,” “ history of AI ,” and “ conclusions ” sections.

Declaration of interests

The authors declare no competing interests.

Published Online: October 28, 2021



Title: Artificial Intelligence (AI) in Legal Data Mining

Abstract: Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend. It is important to organise legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming, and reality. Today, scientists use this term to describe the relation between concepts, data, and entities. A good example of a working ontology was developed by Dhani and Bhatt. This ontology deals with Indian court cases on intellectual property rights (IPR). The future of legal ontologies is likely to be handled by computer experts and legal experts alike.



AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

By Steven Levy

For the past decade, AI researcher Chris Olah has been obsessed with artificial neural networks. One question in particular engaged him, and has been the center of his work, first at Google Brain, then OpenAI, and today at AI startup Anthropic, where he is a cofounder. “What's going on inside of them?” he says. “We have these systems, we don't know what's going on. It seems crazy.”

That question has become a core concern now that generative AI has become ubiquitous. Large language models like ChatGPT, Gemini, and Anthropic's own Claude have dazzled people with their language prowess and infuriated people with their tendency to make things up. Their potential to solve previously intractable problems enchants techno-optimists. But LLMs are strangers in our midst. Even the people who build them don't know exactly how they work, and massive effort is required to create guardrails to prevent them from churning out bias, misinformation, and even blueprints for deadly chemical weapons. If the people building the models knew what happened inside these "black boxes," it would be easier to make them safer.

Olah believes that we’re on the path to this. He leads an Anthropic team that has peeked inside that black box. Essentially, they are trying to reverse engineer large language models to understand why they come up with specific outputs—and, according to a paper released today, they have made significant progress.

Maybe you’ve seen neuroscience studies that interpret MRI scans to identify whether a human brain is entertaining thoughts of a plane, a teddy bear, or a clock tower. Similarly, Anthropic has plunged into the digital tangle of the neural net of its LLM, Claude, and pinpointed which combinations of its crude artificial neurons evoke specific concepts, or “features.” The company’s researchers have identified the combination of artificial neurons that signify features as disparate as burritos, semicolons in programming code, and—very much to the larger goal of the research—deadly biological weapons. Work like this has potentially huge implications for AI safety: If you can figure out where danger lurks inside an LLM, you are presumably better equipped to stop it.

I met with Olah and three of his colleagues, among 18 Anthropic researchers on the "mechanistic interpretability" team. They explain that their approach treats artificial neurons like letters of Western alphabets, which don't usually have meaning on their own but can be strung together sequentially to have meaning. "C doesn't usually mean something," says Olah. "But car does." Interpreting neural nets by that principle involves a technique called dictionary learning, which allows you to associate a combination of neurons that, when fired in unison, evoke a specific concept, referred to as a feature.
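The sketch below gives a minimal, hypothetical flavor of that dictionary-learning setup: a sparse autoencoder is trained to reconstruct activation vectors as sparse combinations of many learned directions ("features"). The dimensions, sparsity penalty, and the random stand-in activations are assumptions; Anthropic's actual training operates on real LLM activations at far larger scale.

```python
import torch
import torch.nn as nn

# Minimal sparse-autoencoder sketch of "dictionary learning" over activations:
# reconstruct each activation vector as a sparse combination of many features.
torch.manual_seed(0)
d_model, d_dict, n_samples = 64, 512, 4096
activations = torch.randn(n_samples, d_model)   # stand-in for real LLM activations

encoder = nn.Linear(d_model, d_dict)
decoder = nn.Linear(d_dict, d_model, bias=False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(200):
    codes = torch.relu(encoder(activations))    # sparse, non-negative feature activations
    recon = decoder(codes)
    # reconstruction error plus an L1 penalty that encourages sparsity
    loss = ((recon - activations) ** 2).mean() + 1e-3 * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("average active features per sample:", (codes > 0).float().sum(dim=1).mean().item())
```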

“It’s sort of a bewildering thing,” says Josh Batson, an Anthropic research scientist. “We’ve got on the order of 17 million different concepts [in an LLM], and they don't come out labeled for our understanding. So we just go look, when did that pattern show up?”


Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.

“Chris looked at it, and he was like, ‘Holy crap. This looks great,’” says Henighan, who was stunned as well. “I looked at it, and was like, ‘Oh, wow, wait, is this working?’”

Suddenly the researchers could identify the features a group of neurons were encoding. They could peer into the black box. Henighan says he identified the first five features he looked at. One group of neurons signified Russian texts. Another was associated with mathematical functions in the Python computer language. And so on.

Once they showed they could identify features in the tiny model, the researchers set about the hairier task of decoding a full-size LLM in the wild. They used Claude Sonnet, the medium-strength version of Anthropic’s three current models. That worked, too. One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California governor Gavin Newsom, and the Hitchcock movie Vertigo , which was set in San Francisco. All told the team identified millions of features—a sort of Rosetta Stone to decode Claude’s neural net. Many of the features were safety-related, including “getting close to someone for some ulterior motive,” “discussion of biological warfare,” and “villainous plots to take over the world.”

The Anthropic team then took the next step, to see if they could use that information to change Claude’s behavior. They began manipulating the neural net to augment or diminish certain concepts—a kind of AI brain surgery, with the potential to make LLMs safer and augment their power in selected areas. “Let's say we have this board of features. We turn on the model, one of them lights up, and we see, ‘Oh, it's thinking about the Golden Gate Bridge,’” says Shan Carter, an Anthropic scientist on the team. “So now, we’re thinking, what if we put a little dial on all these? And what if we turn that dial?”

So far, the answer to that question seems to be that it’s very important to turn the dial the right amount. By suppressing those features, Anthropic says, the model can produce safer computer programs and reduce bias. For instance, the team found several features that represented dangerous practices, like unsafe computer code, scam emails, and instructions for making dangerous products.


The opposite occurred when the team intentionally provoked those dicey combinations of neurons to fire. Claude churned out computer programs with dangerous buffer overflow bugs, scam emails, and happily offered advice on how to make weapons of destruction. If you twist the dial too much— cranking it to 11 in the Spinal Tap sense—the language model becomes obsessed with that feature. When the research team turned up the juice on the Golden Gate feature, for example, Claude constantly changed the subject to refer to that glorious span. Asked what its physical form was, the LLM responded, “I am the Golden Gate Bridge … my physical form is the iconic bridge itself.”
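Mechanically, "turning the dial" on a feature can be pictured as clamping the activation vector's projection onto the learned feature direction before the forward pass continues. The sketch below is only a hypothetical illustration of that idea; the vectors and scaling values are invented, and this is not Anthropic's published implementation.

```python
import torch

# Hypothetical sketch of feature "dialing": set the activation's projection
# onto one learned feature direction to a chosen strength.
torch.manual_seed(1)
d_model = 64
activation = torch.randn(d_model)               # stand-in for a residual-stream vector
feature_dir = torch.randn(d_model)
feature_dir = feature_dir / feature_dir.norm()  # unit vector for one learned "feature"

def steer(act, direction, target_strength):
    """Remove the current projection onto `direction` and set it to `target_strength`."""
    current = act @ direction
    return act + (target_strength - current) * direction

suppressed = steer(activation, feature_dir, target_strength=0.0)
amplified = steer(activation, feature_dir, target_strength=10.0)
print("projection before / suppressed / amplified:",
      float(activation @ feature_dir), float(suppressed @ feature_dir), float(amplified @ feature_dir))
```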

When the Anthropic researchers amped up a feature related to hatred and slurs to 20 times its usual value, according to the paper, “this caused Claude to alternate between racist screed and self-hatred,” unnerving even the researchers.

Given those results, I wondered whether Anthropic, intending to help make AI safer, might not be doing the opposite, providing a toolkit that could also be used to generate AI havoc. The researchers assured me that there were other, easier ways to create those problems, if a user were so inclined.

Anthropic's team isn't the only one working to crack open the black box of LLMs. There's a group at DeepMind also working on the problem, run by a researcher who used to work with Olah. A team led by David Bau of Northeastern University has worked on a system to identify and edit facts within an open source LLM. The team called the system "Rome" because with a single tweak the researchers convinced the model that the Eiffel Tower was just across from the Vatican, and a few blocks away from the Colosseum. Olah says that he's encouraged that more people are working on the problem, using a variety of techniques. "It's gone from being an idea that two and a half years ago we were thinking about and were quite worried about, to now being a decent-sized community that is trying to push on this idea."

The Anthropic researchers did not want to remark on OpenAI's disbanding of its own major safety research initiative, or on the remarks by team co-lead Jan Leike, who said that the group had been "sailing against the wind," unable to get sufficient computer power. (OpenAI has since reiterated that it is committed to safety.) In contrast, Anthropic's dictionary-learning team says that their considerable compute requirements were met without resistance by the company's leaders. "It's not cheap," adds Olah.

Anthropic’s work is only a start. When I asked the researchers whether they were claiming to have solved the black box problem, their response was an instant and unanimous no. And there are a lot of limitations to the discoveries announced today. For instance, the techniques they use to identify features in Claude won’t necessarily help decode other large language models. Northeastern’s Bau says that he’s excited by the Anthropic team’s work; among other things their success in manipulating the model “is an excellent sign they’re finding meaningful features.”

But Bau says his enthusiasm is tempered by some of the approach’s limitations. Dictionary learning can’t identify anywhere close to all the concepts an LLM considers, he says, because in order to identify a feature you have to be looking for it. So the picture is bound to be incomplete, though Anthropic says that bigger dictionaries might mitigate this.

Still, Anthropic’s work seems to have put a crack in the black box. And that’s when the light comes in.


Digital Commons@Lindenwood University


Faculty Scholarship

Essence as Algorithm: Public Perceptions of AI-Powered Avatars of Real People

James Hutson, Lindenwood University; Jay Ratican, Lindenwood University; Colleen Biri, Lindenwood University

Publication Title

Journal of Artificial Intelligence and Robotics

This paper investigates the intersection of generative AI, Large Language Models (LLM), and robotics. Exemplified by systems like ChatGPT and technological marvels such as Ameca the Robot, the combination of technologies will allow humans to transcend the limitations of death. Through digital necromancy, a practice encompassing the technological resurrection of deceased individuals, the ability to not only passively see recordings of loved ones but to interact with them is made possible, leading to ethical and psychological considerations. Therefore, examining these trends extends into the motives underlying engagement with both incorporeal and corporeal reproductions of individuals, with reasons ranging from memory conservation to the attainment of emotional closure. In order to further research in this area, results from a survey are presented, offering a detailed portrayal of prevailing societal perspectives on AI-powered avatars. These insights shed light on the multifaceted interplay between technology and human emotion, the market dynamics propelling this emerging field, and the anticipatory understanding necessary to confront future ethical and functional challenges. The research contributes significantly to the ongoing discourse on the role of AI in society, underscoring the necessity of a balanced approach to innovation and ethics in the domain of AI-driven human representation as integration into society becomes standardized.

Creative Commons License

Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License

Recommended Citation

Hutson, James; Ratican, Jay; and Biri, Colleen, "Essence as Algorithm: Public Perceptions of AI-Powered Avatars of Real People" (2023). Faculty Scholarship . 540. https://digitalcommons.lindenwood.edu/faculty-research-papers/540


New Anthropic Research Sheds Light on AI's 'Black Box'

Despite the fact that they're created by humans, large language models are still quite mysterious. The high-octane algorithms that power our current artificial intelligence boom have a way of doing things that aren't outwardly explicable to the people observing them. This is why AI has largely been dubbed a "black box," a phenomenon that isn't easily understood from the outside.

Newly published research from Anthropic, one of the top companies in the AI industry, attempts to shed some light on the more confounding aspects of AI’s algorithmic behavior. On Tuesday, Anthropic published a research paper designed to explain why its AI chatbot, Claude, chooses to generate content about certain subjects over others.

AI systems are set up in a rough approximation of the human brain—layered neural networks that intake and process information and then make “decisions” or predictions based on that information. Such systems are “trained” on large subsets of data, which allows them to make algorithmic connections. When AI systems output data based on their training, however, human observers don’t always know how the algorithm arrived at that output.

This mystery has given rise to the field of AI "interpretation," where researchers attempt to trace the path of the machine's decision-making so they can understand its output. In the field of AI interpretation, a "feature" refers to a pattern of activated "neurons" within a neural net, effectively a concept that the algorithm may refer back to. The more "features" within a neural net that researchers can understand, the more they can understand how certain inputs trigger the net to affect certain outputs.

In a memo on its findings, Anthropic researchers explain how they used a process known as “dictionary learning” to decipher what parts of Claude’s neural network mapped to specific concepts. Using this method, researchers say they were able to “begin to understand model behavior by seeing which features respond to a particular input, thus giving us insight into the model’s ‘reasoning’ for how it arrived at a given response.”

In an interview with Anthropic's research team conducted by Wired's Steven Levy, staffers explained what it was like to decipher how Claude's "brain" works. Once they had figured out how to decrypt one feature, it led to others:

One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California Governor Gavin Newsom, and the Hitchcock movie Vertigo, which was set in San Francisco. All told the team identified millions of features—a sort of Rosetta Stone to decode Claude’s neural net.

It should be noted that Anthropic, like other for-profit companies, could have certain, business-related motivations for writing and publishing its research in the way that it has. That said, the team's paper is public, which means that you can go read it for yourself and make your own conclusions about their findings and methodologies.


Artificial Intelligence Techniques for the Photovoltaic System: A Systematic Review and Analysis for Evaluation and Benchmarking

Review article | Open access | Published: 08 May 2024


Abhishek Kumar, Ashutosh Kumar Dubey, Isaac Segovia Ramírez (ORCID: orcid.org/0000-0001-6429-8617), Alba Muñoz del Río, and Fausto Pedro García Márquez (ORCID: orcid.org/0000-0002-9245-440X)


Novel algorithms and techniques are being developed for design, forecasting, and maintenance in photovoltaics due to high computational costs and the volume of data. Machine Learning and artificial intelligence techniques and algorithms provide automated, intelligent, and history-based solutions for complex scenarios. This paper aims to identify, through a systematic review and analysis, the role of artificial intelligence algorithms in photovoltaic system analysis and control. The main novelty of this work is the exploration of methodological insights in three different ways. The first approach is to investigate the applicability of artificial intelligence techniques in photovoltaic systems. The second approach is the computational study and analysis of data operations, failure predictors, maintenance assessment, safety response, photovoltaic installation issues, intelligent monitoring, etc. All these factors are discussed together with the results obtained after applying artificial intelligence techniques to photovoltaic systems, and the challenges and limitations are explored across a wide variety of recent related manuscripts.


1 Introduction

Global economic expansion is increasing power market demand, with a negative influence on the environment. The wholesale price of electricity has become an important aspect of the energy sector. Electricity is mainly traded in auctions known as power exchanges or pools, where electricity-generating companies offer energy together with pricing rates that essential consumers can bid on. Solar photovoltaic (PV) energy emerges as an alternative capable of meeting a greater percentage of global energy needs. Germany has expanded its PV electricity generation by 20%, and Japan is among the largest PV power producers in the world, with 6.5% of global PV generation coming from this country. PV has become more cost-effective, and the development of inorganic PV materials aids in the efficient production of next-generation solar cells [1, 2]. As a result, operation and maintenance (O&M) costs have a critical impact on electrical and power module profit margins, because energy market participants must forecast solar power in the medium and long term [3, 4].

Electricity price forecasting has become an important aspect of the energy sector [5], and Machine Learning (ML) and Artificial Intelligence (AI) approaches are widely implemented to recognise different patterns [6, 7, 8]. Despite the significant progress made using AI for PV generation, different challenges remain to be resolved by future research focused on promising AI-based techniques, e.g., Explainable Artificial Intelligence (XAI) and novel hybrid techniques that reduce the weaknesses of current ML techniques, in order to address the new PV operational scenarios required to reach competitiveness in the energy sector [9, 10]. Different techniques for PV generation must work in a dynamic parameter environment, implying that agents cannot have complete awareness of the parameters and environment [11, 12]. Some research in the literature has reviewed the issue of software failures, which may lead to inaccurate approximations [13, 14, 15]. The use of partial quantitative measurements and inconsistent information on the solar production environment in AI models can allow agents to operate in a variety of situations and under different restrictions, both at the user and the service provider levels [16]. The reviewed research literature demonstrates that several models have been employed for solar generation, with their benefits and drawbacks explored [1, 17, 18], demonstrating a need to establish models and findings that are generally applicable to a wide variety of scenarios. It is also required that the results be accurate, with the lack of modelling details being a key problem across a wide part of the reviewed literature [19]. The accuracy of estimating PV system performance is constrained by the use of configuration models [20, 21]; e.g., the Multiple Linear Regression (MLR) model is less effective than Artificial Neural Networks (ANN), which benefit from hidden layers [22]. The performance of models has been affected by the dynamic parameters used for checking the efficiency of PV generation, showing low performance that affects the validity of the research [23]. There is growing interest in using AI and ML to integrate real-time responses from other power channels interfacing with the power system, and this is considered to play a significant role in the future [24]. The adoption of more multi-agent frameworks, where agents can function in a simulated, stochastic environment while also relaxing assumptions about the preferences and expectations of participating entities, could be a promising approach for future studies [25].

The fundamental objective of this study is to develop a new framework that can stimulate human-machine collaboration to estimate daily power prices in the wholesale market [ 26 , 27 ]. The main novelties of this study are as follows:

Analysis of different frameworks: The comparison of power production using different PV cells is understood through the review of different research papers. This section proposes a novel analysis of the dynamic parameters used for checking the efficiency of PV generation, which have a relevant influence on the performance of the models. The objective is to demonstrate that poor model performance affects the validity of research studies.

Analysis of various models with different parameters in PV generation: Several models are used in the field of electricity forecasting, but these models have their advantages and limitations. The objective of this study is to improve upon traditional technologies in terms of accuracy and precision of prediction.

Analysis of the state of the art: Several researchers have proposed different models for the same purposes. Recent research work done in electricity forecasting and the PV generation domain has been reviewed and analysed to select the best model and find novel approaches [28, 29, 30]. One of the most relevant novelties of the paper is an analysis of the state of the art performed in three different phases, which has not been found in the current literature:

Review-based analysis: The most recent studies have been examined in chronological order to identify the main areas of the global energy industry under current investigation. The objective is to detect the main trends and novel research lines.

Method-based analysis : The volume and variety of new studies and methods have grown exponentially in recent years, compared to traditional technologies. This phase analyzes new optimized and hybrid models to detect the most relevant and fastest-growing techniques.

Result-based analysis: The research outcomes achieved by the researchers, according to the models they used, have been analysed in this phase.

This paper is structured as follows: Sect. 2 develops an analysis of the current state of the art based on different reviews, methodologies and results. Section 3 presents the overall discussion of the research, where all the applied methodologies and algorithms are summarized to extract relevant data about the state of the art. Finally, the main conclusions are provided in Sect. 4.

2 Related Work

2.1 Review-Based Analysis

A new deep learning model is used for the enhancement of visual ability in PV generation [31]. The improvement of accuracy, mainly with real-time data analysis, is the primary objective [32, 33]. Figure 1 shows the basic process of PV generation. The radiant energy is absorbed by the PV panel or grid of PV panels [34]. This radiant energy is then converted into DC power, which is passed to the power electronic converters [32, 35]. These converters convert the DC power into grid-frequency power [36].

Figure 1. General diagram of PV generation

Camargo and Schmidt [37] presented a survey to collect data from installations in Chile. The proposed hypothesis of the research focused on four datasets from PV installations. A multi-annual time series was simulated in the research, using forecasting and nearest-neighbour methods. It was demonstrated that deseasonalized aggregated data tend to have a better correlation. The older MERRA-2 reanalysis dataset is the most critical limitation, and the global reanalysis of the dataset is one of the main advantages of this research. Guo et al. [38] suggested that PV plant installation is the most reliable and stable plan worldwide, and for this case study, solar power is used to constrain the plan for PV generation. A prediction model algorithm is used in this research, analysing a short-term PV model. Boosting and bagging approaches were applied, with meteorological parameters and historical data as the main dataset. It is indicated that this cutting-edge application of AI has produced a new PV prediction model. The use of a single model acts as a limitation in this context.

The primary objective of the study presented by Ogawa and Mori [39] is to identify an efficient method for PV forecasting using a Multi-layer Perceptron (MLP) neural network. Two approaches based on ANNs and a statistical approach were implemented. The dataset used was distributed, and the error was normalised as a result of the MLP-based method. Under this PV plan, there are issues or limitations such as Economic Load Dispatching (ELD) and Unit Commitment (UC). A Deep Neural Network (DNN)-based method has been proposed by Dorokhova et al. [40], determining that PV generation has reduced the energy crisis and carbon emissions positively. Control methods, model predictive control, and a reinforcement learning data-driven control approach were applied to a 2-year dataset. The electric vehicle charging problem was reduced by using reinforcement learning control, confirming that maximisation of PV consumption is highly required.

The primary objective of the research developed by Beltrán et al. [41] is to promote a novel framework to encourage human-machine collaboration in forecasting the daily price of electricity in the wholesale market. Model-agnostic methods, time series analysis and time decomposition, STL, Holt-Winters, the Bates and Granger method, and the Aiolfi and Timmermann method were applied to develop different approaches: a loop approach, a non-linear approach, and a robust approach. Data stories depend on a model-agnostic method. The proposed framework was limited in grasping the bounds of point forecasts by marking the point intervals, although it provided satisfactory results at the cost of increased computational power. A Long Short-Term Memory (LSTM) framework was proposed to control the shortcomings of ML algorithms that are applicable only to huge volumes of information [42]. A two-stage hybrid method, a wrapper method, ML, deep learning, statistical methods, and other forecasting approaches were applied. The robustness of the proposed LSTM was comparatively better than that of a normal LSTM, demonstrating that a single framework is unable to predict PV power generation (PVPG) accurately due to the bounds of the stand-alone process. Operators can solve time series forecasting issues through the proper utilization of LSTM. PV generation is an effective solution that enables people to overcome the current energy challenges [36, 43]. Presenting a relevant solar irradiation predictor that combines the benefits of ML with the optimization of temporal and spatial parameters is the survey objective of Rodríguez et al. [31]. ML and statistical methods, a single model, and an AI approach were the main models. The accuracy of the results of feed-forward neural networks (FFNN) was better than that of the persistence model. The major limitation is the high training requirement for the FFNN and spatial-temporal model, but it was demonstrated that the solar irradiation predictor was able to accurately measure certain changes.
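As a hedged illustration of the LSTM-based forecasting idea discussed above (not the code of the cited studies), the sketch below trains a small LSTM to map the previous 24 hourly values of a synthetic PV power series to the next hour's output; the series shape, window length, and network size are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

# Illustrative sketch: one-step-ahead PV power forecasting with a small LSTM.
rng = np.random.default_rng(7)
hours = np.arange(24 * 60)
# Synthetic daily PV profile: daylight sine clipped at zero plus noise.
power = np.clip(np.sin((hours % 24 - 6) / 12 * np.pi), 0, None) + rng.normal(0, 0.05, hours.size)

window = 24
X = np.stack([power[i:i + window] for i in range(len(power) - window)])
y = power[window:]
X_t = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)   # (samples, 24, 1)
y_t = torch.tensor(y, dtype=torch.float32).unsqueeze(-1)

class PVForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])          # use the last hidden state

model = PVForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(30):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X_t), y_t)
    loss.backward()
    opt.step()
print("training MSE:", float(loss))
```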

Dynamic power flow, regression, and two core forecasting methods were chosen together with a DL approach using time sequences and time-series data. Distribution losses have been found while using secondary and primary LSTM systems. The major drawback of this proposed mechanism is the absence of voltage collapse mitigation and limit violation handling, according to Yin et al. [44]. Visser et al. [45] proposed the performance analysis of 12 different models for forecasting day-ahead power production in agreement with market conditions. The objective of the research was to examine the effect of multiple PV systems, with variations in the inter-system distance, on the performance of the forecast models. A clean-data method has been used to analyse the 12 different models with their pre- and post-processing phases. Direct and indirect approaches were chosen to analyze the PV model, together with regression and ML-based models. The dataset is formed by the PV power output of 152 PV systems located in the Netherlands. The results of the research showed that density and spatial factors had a positive effect on the performance of the models. The limitation of the research was related to the testing conditions, as many cases involve a single site or a single aggregation of several systems. The advantage of the research is that power grid managers will be able to minimise grid imbalances with the application of the models.

The research developed by Gowid and Massoud [46] designed a tool to identify a robust and practical PV Maximum Power Point (MPP) with the use of reliable experimental data. They performed a comparative study of input scenarios and developed a novel tool, for which it was necessary to establish a correlation between the current at the maximum power point \( I_{MP}\) and the maximum power voltage \( V_{MP}\) and the PV electrical, thermal, and meteorological variables. The MPP approach was implemented to conduct the research, and the results displayed a decrease of 74.3% in the MSE of \( V_{MP}\) and a reduction of 95% in the MSE of \( I_{MP}\) . The parameter combination scenarios were restricted to maximize the accuracy of the research, although this tool will help to maximise the power output. Novel research was developed by Tian et al. [47] to analyse the feasibility and necessity of introducing PV power generation into rail transit power supply. The authors applied an indirect forecasting method for forecasting the load of PV power generation. The LSTM neural network approach was adopted in the research, and the data were collected from a PV power station every 10 min for 207 days. The LSTM neural network results presented high levels of fit and high prediction accuracy. No research limitation is reported in the paper, and the benefit of the research is that the capability of the LSTM neural network is well demonstrated. The application of ML and data-driven techniques for monitoring, controlling, optimisation, and fault detection of power generation systems was studied by Sun and You [48]. The main objective is the analysis of ML and data-driven techniques in terms of flexibility and profitability. ML and data-driven methods were used together with FT, Bayesian nonparametric, and Markov chain Monte Carlo (MCMC) approaches. The results showed that big-data ML-based regression methods seemed to be more powerful in characterising nonlinear multivariable systems, although this control method had limitations in dealing with high uncertainty and nonlinearity. The research will be of benefit in showing the role of ML-based data in control, visibility, profitability, and safety during power generation. These factors are essential to ensure a proper data analytics process, and Figure 2 illustrates how far these factors can be achieved through the ML techniques applied in this section, showing a basic comparison of ML usage in the aspects of visibility, manoeuvrability, flexibility, and profitability at intervals of the significance of these features.

figure 2

Comparison of various ML factors in terms of usage and accuracy

PV forecasting is another relevant research topic. The research developed by Behera et al. [ 49 ] analysed short-term PV power forecasting using the Empirical Mode Decomposition (EMD) technique. The objective of the research was to construct a 3-stage approach and analyse PV power forecasting with it. Statistical methods were used to analyse the historical solar irradiance data and the 3-stage EMD-based approach was implemented. The analysis of the collected data showed that EMD techniques performed better than the conventional technique in addressing short-term PV power forecasting. High reliability, optimal placement, acceptable power quality and low-cost operation are a few factors that limit the application of solar power. The background of the research is based on the implementation of the ML network for yielding promising solutions in pattern recognition with low estimation bias. According to Khodayar et al. [ 50 ], the objective set for the study is to use discriminative deep models to estimate future solar energy from historical measurements. A generative statistical method based on variational autoencoders (AE) was used to estimate the disaggregated Behind-The-Meter (BTM) signals. The Pecan Street dataset was used to carry out the research, and the numerical results on the real-world Event Detection (ED) dataset showed significant improvements in ED accuracy over sparse coding approaches. Sparse coding approaches present energy disaggregation problems under PV penetration, but the application of multiple deep ANNs can increase the prediction accuracy.

The development of Floating Solar PV (FSPV) systems is in a nascent stage, and their long-term performance and feasibility had not been effectively addressed before the work of Goswami et al. [ 51 ]. The analytical method used for the mathematical modelling of the PV cell was based on evolutionary algorithms for the assessment of FSPV. A numerical approach was used for the determination of parameters, although no dataset was applicable. The proposed model, applied to 100 FSPV modules, showed superiority in the estimation of its parameters, although the efficiency of the model decreased under low irradiance. The research findings will help scientists make sound judgments when implementing FSPV systems. The discovery of inorganic PV materials helps to generate new-age solar cells effectively [ 52 ]. Predicting the efficiency of inorganic PV materials with relevant ML is the primary objective of that survey. ML methods were used along with density functional calculations for material discovery. The ML approach was used to recognise patterns between materials using three datasets: input, training and production datasets. The results showed that the ML method was effective for fast atomic-level prediction of PV materials [ 52 ]. The authors did not point out any disadvantages related to the study. The application of ML improves the accuracy of prediction for PV materials of different crystal structures. Countries use solar energy to decrease ecological hazards. The Root Mean Squared Error (RMSE), mean absolute error and R² metrics were used to explore the relationship between numerous input parameters and solar PV power using ML models [ 9 ]. ML and predictive approaches have been used for the research, using experimental datasets. The results showed that the proposed ML approach was accurate at predicting the power of different solar PV panels. The experimental characterisation of the effect of dust and wind on solar PV is still incomplete, but the Support Vector Machine (SVM) and Gaussian Process Regression (GPR) models enhanced the forecasting of solar PV power.
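As a generic illustration of how such model comparisons are typically set up (a minimal sketch under assumed conditions, not the pipeline of [ 9 ]: the synthetic irradiance and temperature features, the toy power model and the hyperparameters below are illustrative assumptions), an SVM regressor and a GPR model can be trained on the same data and scored with RMSE, MAE and R²:

```python
# Minimal sketch: comparing SVR and GPR for PV power prediction on synthetic data.
# Feature names, toy power model and hyperparameters are illustrative assumptions, not values from [9].
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

rng = np.random.default_rng(0)
irradiance = rng.uniform(100, 1000, 500)      # W/m^2
temperature = rng.uniform(10, 45, 500)        # degrees C
X = np.column_stack([irradiance, temperature])
# Toy PV power model: roughly linear in irradiance with a temperature penalty plus noise.
y = 0.2 * irradiance * (1 - 0.005 * (temperature - 25)) + rng.normal(0, 5, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVR (RBF kernel)": SVR(C=100.0, epsilon=1.0),
    "GPR (Matern 5/2)": GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=25.0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name}: RMSE={rmse:.2f}, MAE={mean_absolute_error(y_te, pred):.2f}, "
          f"R2={r2_score(y_te, pred):.3f}")
```

In a real study, the synthetic features would be replaced by measured meteorological and electrical variables, and the hyperparameters tuned by cross-validation.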

According to Liang et al. [ 53 ], the lack of weather information affects theoretical power calculations, which focus on the whole power station rather than on individual arrays. A data-driven method was used along with the Extreme Learning Machine (ELM) model to solve this problem through a status evaluation method for the arrays. Six arrays were randomly selected from a large-scale power station in China and a statistical approach was implemented for the evaluation of the data. The results demonstrated that the status assessment accuracy was 90%, confirming its effectiveness. The main contribution of this study is showing that the proposed method reflects the status of PV arrays, although the evaluation of array degradation was not possible with the current method. The primary objective of the analysis carried out by Kalogirou and Sencan [ 54 ] is to analyse the international PV market and forecast the market demands and developments in 2010. The basic methods used in this research calculated manufacturing costs, identified production efficiencies, and then forecasted the PV world market. A profitability approach based on single-crystal silicon modules was implemented in this research. The dataset for this research is formed by PV world market, PV production efficiency and PV manufacturing cost data. Due to the practical limits of advanced technology, forecasted data on the development and market demand of PV may not be accurate. The focus of the research is to examine the possibility of utilising AI in forecasting the production of PV energy using Auto-Regressive Moving Averages and regression methods. Energy production management, moving averages, data-driven modelling and classical approaches were implemented by Maycock [ 55 ] to forecast the production of electricity. The initial Production Dataset (PD and SI) was inspected in this research. The simulation result showed that the MFNN can be effectively used by operators to predict the energy produced by a PV production unit. The lack of precision increases the complexity of the ANN model, resulting in a lower accuracy of the predicted data.

Hossain et al. [ 56 ] presented a forecasting algorithm with an LSTM ANN to predict PV power generation. This work applied a statistical analytical research method with a simulation based on three years of data (2016–2018). The implementation of a synthetic irradiance forecast can improve accuracy by up to 33% compared to using the hourly categorical sky-type forecast. The superiority of the LSTM ANN with the proposed features was tested by exploring other machine learning algorithms, such as the recurrent neural network, the Generalised Regression Neural Network (GRNN) and the ELM. Almomani et al. [ 57 ] presented a method for modelling PV arrays based on AI techniques, specifically the Genetic Algorithm (GA) and the Cuckoo Optimisation Algorithm (COA). The adopted models using GA and COA were implemented in a simulation platform in the MATLAB environment for two-diode and single-diode models. The obtained models were tested and validated with experimental data taken from the PV power plant at Mutah University. The results showed that, for both optimisation algorithms, the two-diode model was more accurate than the single-diode model. The results also revealed that, at different values of temperature and solar irradiance, the COA handled the optimisation problem better, with fewer iterations and a better fitness value compared to the GA.
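For readers unfamiliar with how such an LSTM forecaster is typically assembled, the sketch below shows the general pattern of sliding windows of past power values feeding an LSTM layer; the window length, layer sizes and synthetic daily profile are assumptions for illustration only and do not reproduce the configuration of [ 56 ].

```python
# Minimal sketch of an LSTM-based PV power forecaster (illustrative only).
# Window length, layer sizes and the synthetic daily profile are assumptions.
import numpy as np
import tensorflow as tf

# Synthetic hourly PV power series with a daily cycle plus noise.
hours = np.arange(24 * 200)
power = np.clip(np.sin(2 * np.pi * hours / 24), 0, None) + 0.05 * np.random.randn(hours.size)

def make_windows(series, window=24):
    """Build (samples, window, 1) inputs and next-step targets."""
    X = np.stack([series[i:i + window] for i in range(series.size - window)])
    y = series[window:]
    return X[..., None].astype("float32"), y.astype("float32")

X, y = make_windows(power)
split = int(0.8 * len(X))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(24, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=5, batch_size=64, verbose=0)

rmse = float(np.sqrt(np.mean((model.predict(X[split:], verbose=0).ravel() - y[split:]) ** 2)))
print(f"Hold-out RMSE: {rmse:.3f}")
```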

Solar energy is extremely dependent on climate and geography, and fluctuates irregularly, making the integration of PV into power networks problematic. Ahmed et al. [ 28 ] reviewed and evaluated contemporary forecasting techniques. Input correlation analysis revealed that solar irradiance is the variable most correlated with PV production, and for this reason, meteorological classification and cloud movement studies are crucial. Normalisation, wavelet transforms, and augmentation by a generative adversarial network were recommended for network training and forecasting. The authors discussed the established performance evaluation metrics Mean Absolute Error (MAE), RMSE and Mean Absolute Percentage Error (MAPE), with suggestions for also including metrics of economic utility.
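For reference, these standard metrics are computed over \( N \) forecast points \( \hat{P}_i \) against the measured power \( P_i \) as:

\[ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left| P_i - \hat{P}_i \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left( P_i - \hat{P}_i \right)^{2}}, \qquad \mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left| \frac{P_i - \hat{P}_i}{P_i} \right|. \]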

Different optimisation and control techniques have been widely proposed to address four issues: intermittent power supply, low conversion efficiency, the nonlinearity of the PV system, and high fabrication costs [ 10 ]. Intuitive methods, ML methods (supervised and unsupervised learning), the ANN forecasting method, Simulated Annealing, SVM, and the Harmony Search approach were applied to a large dataset of solar radiation. SVM was the best fit for reducing the RMSE and MAE values; the calculated RMSE value was 12.41% and the MAE value was 6.95% after using the SVM technique. Weather prediction in a given geographic region may not be fully accurate, although it shows a minimal error. The main objective of the research developed by Das et al. [ 58 ] is to inspect the progress of solar technology development and explain its innovation path. The Wille method and hierarchical clustering were applied together with silhouette validation and the doc2vec and word2vec paragraph vector models, along with an ML approach. The experimental dataset demonstrated the impact of changing parameters on the storage system and the limitations of installing and purchasing solar power. Solar power technology may reach competitiveness in the global marketplace following this research.

Kaliappan et al. [ 59 ] used ANNs to predict the reduction in energy consumption of buildings that would result from installing a new photovoltaic system. This research studies the efficiency of the Elman Neural Network (EN) method, the FFNN and the GRNN. The findings of this paper showed that forecasters using ANNs improve accuracy when employing the previous methods (EN, FFNN and GRNN). A review of AI-based Maximum Power Point Tracking (MPPT) in solar power systems was carried out by Yap et al. [ 60 ]. Since conventional MPPT techniques are unable to track the global maximum power point under Partial Shading Conditions (PSC), it is necessary to introduce artificial intelligence techniques to enhance this method [ 34 , 61 ]. A model-based approach was used to review the specific MPPT open-circuit voltage techniques. Irradiance data were employed to improve the accuracy of the MPPT prediction. The results demonstrated that all AI-based MPPT techniques showed faster convergence speed, lower steady-state oscillation and higher efficiency compared to conventional MPPT techniques, although the computational cost was also higher. This work provided a detailed comparison of popular MPPT techniques for solar power systems. Zhang et al. [ 62 ] applied Deep Convolutional Neural Networks (DCNN) with high-resolution weather forecast data to analyse the cloud movement pattern and its effect on solar power generation forecasting for solar farms. In this research work, the error rate is significantly reduced in comparison with other methods, e.g., the persistence model, the Support Vector Regression (SVR) model and convolutional neural networks. Therefore, it is shown that DL-based solar forecasting outperforms sophisticated physical models.

Al-Habahbeh et al. [ 63 ] used Auto Regressive Moving Average (ARMA) and Autoregressive Integrated Moving Average (ARIMA) methods to achieve a large improvement in PV performance. The parameters of the CNN-based model were also optimised to improve the performance. The study aimed to build a suitable model and test it using a simulation method. The results showed a strong influence of temperature and the amount of sunlight, the main advantage of this approach being the reduction and elimination of waiting time. In addition, energy generation was discussed as a vital outcome that is essential for climate change. Digital twinning is a novel technique that provides a virtual representation of a real-world physical system or process (a physical twin) that functions as an indistinguishable digital counterpart for practical purposes such as simulation. The research work developed by Mazhar et al. [ 64 ] provided a systematic review of the integration of Big Data, ML and AI techniques with digital twinning. This paper focused on the role of big data and AI-ML in the creation of Digital Twins (DT) or DT-based systems for various industrial applications. The final section highlights the research potential of AI-ML for digital twinning, mentioning different challenges and current opportunities.
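As a generic illustration of how ARIMA-type models are fitted to a PV-related time series (a minimal sketch: the (p, d, q) order and the synthetic daily-energy series are assumptions, not the configuration used in [ 63 ]):

```python
# Minimal ARIMA forecasting sketch on a synthetic daily PV energy series.
# The (p, d, q) order and the synthetic data are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
days = pd.date_range("2022-01-01", periods=365, freq="D")
# Seasonal daily-energy profile (kWh) with noise.
energy = 20 + 8 * np.sin(2 * np.pi * np.arange(365) / 365) + rng.normal(0, 1.5, 365)
series = pd.Series(energy, index=days)

model = ARIMA(series, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=7)       # one-week-ahead forecast
print(forecast.round(2))
```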

Different approaches for the operation of smart building systems were presented by Hussin et al. [ 65 ]. The authors maximised the use of solar energy and improved the conversion efficiency by using two fundamental MPPT methods, i.e., the incremental conductance method and the perturb and observe (disturbance observation) method. According to the obtained results, the MPPT algorithm was used to control the voltage level and restrain the current of the photovoltaic cells at the minimum value, while the actual conversion power was limited, affecting the output power of the PV cells. It is concluded that solar energy is cost-effective for commercial purposes due to the reduction of the overall costs of electricity generation and consumption. Gligor et al. [ 66 ] applied a Multi-Layer Feedforward Neural Network (MLFFNN) to forecast PV production. Different configurations of the MLFFNN were compared to forecast one-day production based on a one-day versus a ten-day regression window. The accurate results determined that this particular type of ANN is suitable for forecasting the energy production generated by PV modules, obtaining better precision in the prediction with this regression window.
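At the maximum power point the condition \( \mathrm{d}P/\mathrm{d}V = 0 \) holds, which the incremental conductance method tracks as \( \mathrm{d}I/\mathrm{d}V = -I/V \). To make the second, hill-climbing strategy concrete, the sketch below implements a generic perturb-and-observe loop on a toy power–voltage curve; the curve shape, step size and starting voltage are illustrative assumptions and do not reproduce the converter models of the cited studies.

```python
# Minimal perturb-and-observe (P&O) MPPT sketch on a toy P-V curve.
# The PV curve, step size and starting point are illustrative assumptions.
def pv_power(v, v_oc=40.0, i_sc=9.0):
    """Toy single-peak P-V characteristic (not a physical diode model)."""
    if v <= 0 or v >= v_oc:
        return 0.0
    current = i_sc * (1.0 - (v / v_oc) ** 8)   # current collapses near open circuit
    return v * current

def perturb_and_observe(v0=20.0, step=0.5, iterations=100):
    v, direction = v0, +1
    p_prev = pv_power(v)
    for _ in range(iterations):
        v += direction * step
        p = pv_power(v)
        if p < p_prev:          # power dropped: reverse the perturbation direction
            direction = -direction
        p_prev = p
    return v, p_prev

v_mpp, p_mpp = perturb_and_observe()
print(f"Operating point after P&O: V = {v_mpp:.1f} V, P = {p_mpp:.1f} W")
```

The controller oscillates around the maximum power point, which is the steady-state behaviour that AI-based MPPT techniques aim to reduce.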

The aim of Kihlström and Elbe [ 67 ] was to draw conclusions about PV technology in the market and about market interventions over 40 years. A systematic literature review of peer-reviewed studies on PV technology and market interventions from 1979 to 2019 was provided. Solar PV technology has faced several financial and structural market barriers, requiring stable governmental market interventions. This article predicts that PV can be an “attractive energy alternative” in the future and a core technology that can develop a specific segment in the solar PV system. The aim of the study performed by Tived [ 13 ] is to obtain a review of AI in the solar PV value chain, its current applications and future perspectives. A systematic search and a statistical combination of quantitative studies were used, with a hybrid approach based on SVR, Particle Swarm Optimization (PSO) and SVM. The data were presented as time series, applying different evolutionary algorithms, e.g., PSO and GA. The potential of AI systems and applications promotes the objectives of the European Green Deal, and it is illustrated through numerous scenarios that direct and indirect adverse effects have a relevant impact on the environment. The authors also concluded that environmental policies and regulations should be focused on creating a better regulatory vision. Gailhofer et al. [ 30 ] discussed specific funding needs for research and development since it is improbable that AI applications can be developed to support government activities related to monitoring, planning or infrastructure without public funding.

The work of Kalaiarasi et al. [ 68 ] applied MPPT to track the maximum available power of solar PV systems. A comparative study of various AI-based MPPT techniques, e.g., Fuzzy, ANN and the Adaptive Neuro Fuzzy Inference System (ANFIS), was carried out. The output of the PV module interfaces with the resistive load through a Z-source inverter that boosts the input voltage and provides an AC output voltage. This configuration is implemented to obtain experimental observations of the PV array. In the context of PV modelling, a novel AI approach combined with cell dataset information was implemented by Ghannam et al. [ 69 ]. Optimisation algorithms were developed through the diode model; the boundary limits of the segment are the main limitation, while maximum valid simulation is the main advantage. Climate perturbations are increasingly unpredictable and powerful, requiring revisions of the global horizontal irradiation and direct normal irradiation calculation models. Othman et al. [ 70 ] used DL as a tool to forecast the production behaviour of any geographical site. Different databases from NASA and the Tunisian National Institute of Meteorology were deployed for the climatic parameters of the study region of El Akarit, Gabes, Tunisia to achieve this objective. The use of DL algorithms validated previously made estimates of the energy potential in the studied region. Different techniques were applied and compared, obtaining an overall accuracy rate of around 75%.

A real grid-connected Seawater Desalination Plant (SWDP) in Egypt was implemented in MATLAB/SIMULINK by Ebrahim et al. [ 71 ]. The developed power plant consists of a PV array, a DC/AC converter, the load and the grid. Three MPPT controllers were explored in this study to improve the dynamic performance of the proposed grid-powered SWDP and address the low conversion efficiency of the PV system. Incremental conductance along with three artificial optimisation techniques (PSO, grey wolf optimisation and Harris hawk optimisation) were developed for the dynamic performance evaluation of the presented PV-powered SWDP. The results obtained from the three methods were promising in extracting maximum power with minimum error from the PV system and improving the performance of the SWDP. The objective of Khan et al. [ 72 ] is the revision of an AI-based nonlinear integral back-stepping control approach for maximum power extraction of a stand-alone PV system using a buck-boost converter. The simulation results showed that the integral back-stepping technique outperforms the conventional Perturb and Observe (P&O) MPPT technique in all scenarios, under varying load and environmental conditions with faults and uncertainty, in terms of fast reaction time and minimum tracking error. Khadka et al. [ 73 ] proposed an ML-based decision-making model scheme to optimise the cleaning interventions of PV modules. This paper analyses different PV panel cleaning practices for different types of PV cells and different cooling methods, identifying the main parameters on which to base cleaning optimisation. The researchers had to face challenges in data selection and data processing that may generate limitations. This study also provided a brief understanding of the current cleaning system of PV solar panels and a future perspective of the cleaning system.

Understanding the underlying inner workings of an AI-based forecasting model can provide insight into the field of application. This knowledge enables us to improve solar PV forecasting models and highlight relevant parameters. XAI is an emerging field of research in the smart grid domain, and it helps to understand why the AI system made a forecasting decision. The work of Kuzlu et al. [ 74 ] presented several use cases of solar PV forecasting using XAI tools, such as Local Interpretable Model-Agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP) and ELI5, that can contribute to the adoption of XAI tools for smart grid applications. An understanding of the automated forecasting process is provided by Sarp et al. [ 75 ] using XAI to relate the inputs and output of solar PV power. The data are trained and tested with an explanatory approach that uses the RMSE metric as the evaluation method. The main conclusion is that the XAI model helps to understand the current advantages of PV through its use. The research paper developed by Esen et al. [ 76 ] analysed the influence of timestamps, forecast horizons, input correlation analysis, data pre- and post-processing, meteorological classification, grid optimisation, uncertainty quantification and performance evaluations in PV production analysis. The study focused on the introduction of solar and radiation simulators for PV research. The prototype solar LED system used in this research was examined using numerical formulas; in addition, emphasis was placed on PV tests based on a statistical approach. The tests produced results below 2%, being considered as Class A. This research helps to understand that solar simulators requiring few LEDs may be used in the future.
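As an illustration of how such explanations are generated in practice, the sketch below applies SHAP to a tree-based surrogate forecaster on synthetic data; the feature names, model choice and data are assumptions and this is not the pipeline of [ 74 ] or [ 75 ].

```python
# Minimal SHAP sketch: explaining a tree-based PV power forecaster (illustrative only).
# Requires the shap package; feature names and data are assumptions.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = pd.DataFrame({
    "irradiance": rng.uniform(0, 1000, 400),      # W/m^2
    "temperature": rng.uniform(0, 40, 400),       # degrees C
    "cloud_cover": rng.uniform(0, 1, 400),        # fraction
})
y = 0.2 * X["irradiance"] * (1 - 0.6 * X["cloud_cover"]) - 0.3 * X["temperature"]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:50])
# Mean absolute SHAP value per feature = global importance for the forecast.
importance = np.abs(shap_values).mean(axis=0)
for name, value in zip(X.columns, importance):
    print(f"{name}: mean |SHAP| = {value:.2f}")
```

The per-sample SHAP values can also be plotted to explain why an individual forecast was high or low, which is the kind of insight the cited studies discuss for grid operators.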

The efficiency of solar cells decreases with increasing temperature. The study by Benghanem and Almohammedi was developed in one of the hottest places in the world with the objective of reducing the temperature of the solar cells and improving their efficiency [ 77 ]. In this research, a Thermoelectric Module (TEM) was used for the analysis of performance. The results of the study showed that the efficiency of the solar panels dropped by 0.5% per degree Celsius rise in temperature. The best performance of PV/TEM is only applicable in hot areas. The study helped to suggest the hybrid PV/TEM for future use. Organic PV (OPV) cells are based on the concept of the polymer-fullerene bulk heterojunction and are interesting due to their low cost and flexibility [ 78 ]. The objective of the study was to optimise the tandem structure composed of OPV cells based on P3HT blend materials through a transfer matrix simulation approach, various software and simulations. Optical thickness data of OPV cells were collected for the software simulation, and a good distribution was obtained at the two wavelengths of 500 nm and 727 nm. It is concluded that the model helps in the application of OPV cells. Chenini et al. [ 29 ] simulated strained \( \mathrm{GaAs}_{x}\mathrm{P}_{1-x} \) and tensile-strained \( \mathrm{GaN}_{y}\mathrm{As}_{x}\mathrm{P}_{1-x-y} \) quantum well active zones to be inserted into solar cells, and used the band anticrossing model to explain the dependence of the lower \( E^{-} \) and \( E^{+} \) sub-band energies on the nitrogen concentration. The donor-acceptor approach was used in this study for preparing low-bandgap polymers, concluding that the introduction of arsenic into the host material leads to a reduction of the bandgap energy. The limitation of the study is related to the semiconductors used, although it will help to increase the absorption and the redshift of the corresponding wavelength.
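Expressed with a standard temperature-coefficient model (a generic formulation, not the exact model of [ 77 ]), a 0.5% loss per degree corresponds to

\[ P(T_{cell}) = P_{STC}\left[1 + \gamma\,(T_{cell} - 25\,^{\circ}\mathrm{C})\right], \qquad \gamma \approx -0.005\ /\,^{\circ}\mathrm{C}, \]

so, as a worked example, a module rated at 300 W under standard test conditions operating at a cell temperature of 65 °C would deliver roughly \( 300 \times [1 - 0.005 \times 40] = 240 \) W, which illustrates why active cooling such as the PV/TEM approach is attractive in hot climates.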

OPV cells are considered third-generation solar cells featuring new materials, such as organic polymers and tandem solar cells. The aim of the study developed by Benghanem and Almohammedi [ 77 ] is to analyse various materials, devices and fabrication techniques for OPV cells. Several roll-to-roll techniques have been used for the design of OPV cells, and the slot-die coating approach was used for coating the materials. The result of the study showed an efficiency increase of 8% for PV panels manufactured with special conditions and techniques. The limitation of the study is the lack of accuracy to demonstrate this efficiency.

2.2 Summary Review-Based Analysis

The summary review-based analysis is shown in Table 1 . The comparison of energy production using different PV cells is understood with the help of the review conducted in this paper. DL has been demonstrated to be very useful for recognising PV power generation patterns, and the Physics-Constrained LSTM model provides superior prediction performance for solar PV cells in terms of temperature prediction accuracy. Conventional techniques, e.g., PSO, ARMA or ARIMA, are being outperformed by advanced AI-based techniques due to their suitability and high accuracy in the different scenarios and case studies presented by several authors. It is important to highlight the applicability of ELM and k-means due to accuracies of approximately 90%, with the Matern 5/2 GPR method being one of the best-performing ML methods for solar PV power prediction. XAI has proven to be one of the most relevant techniques, and the number of research papers applying these techniques has grown in recent years. XAI has been proposed as a relevant solution for PV forecasting due to its high performance, scalability and reduction of errors, among other factors.

2.3 Method Based Analysis

The method-based analysis compares different studies and methods, their performance, advantages and disadvantages, as observed in Table  2 . The purpose of this section is to demonstrate the main AI techniques utilised in power systems to improve traditional methods. In most of the research studies, multi-objective optimisation methods, multiple linear regressions and ANN methods have been used, obtaining highly accurate results in different case studies, e.g., the rate of output of hydropower. This analysis verifies the computational efficiency of the proposed methods. The results revealed that, among several AI techniques, ANN has emerged as one of the most effective methodologies for PV solar forecasting, surpassing the capabilities of GA or fuzzy logic. These research works reveal how AI and other ML methods operate to forecast and improve the output power in PV operation. Different techniques have shown limitations, e.g., a reduction in estimation capabilities, that decrease the reliability of the overall analysis. The influence of the sampling rate on obtaining reliable results has also been demonstrated, with suitable datasets being required to surpass 80% accuracy. Moreover, it is also shown that the PV system can be used at an inclination of 22°, and only for larger plants, as demonstrated in a specific scenario.

Figure  3 shows a comparison of the accuracy achieved by the various authors in the studies of Table  2 , based on either AI or ML techniques. It can be observed that the results are not ordered over time and that their variability depends on the authors and the case studies, which increases the complexity of the overall analysis. It is confirmed that all the considered studies present accuracies higher than 75%, even reaching 97%.

figure 3

Accuracy comparison of different studies between 2017–2022

Figure  4 shows the usage of AI, ML and DL-based methods by researchers across the years covered by the studied research works. This visualisation provides deeper insight into the adoption of these models by power generation industries for forecasting and analysis purposes.

figure 4

Technology Usage of DL, ML and AI-based methods

Figure  5 compares the MPPT, ELM and ANN models according to different indicators, such as rate of output, RMSE, MSE and precision. The overall results show similarities between the techniques, although ANN is superior in all the parameters compared to the other architectures, demonstrating the reliability of this ML technique. Different studies demonstrated that the time needed to assign weights in an ELM is notably lower than with standard ANN training methods, causing variations in the achieved precision and RMSE results.
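This speed difference stems from the fact that an ELM draws its hidden-layer weights at random and solves only the output weights in closed form, as the minimal numpy sketch below illustrates (the layer size, activation and toy data are illustrative assumptions):

```python
# Minimal Extreme Learning Machine (ELM) sketch: random hidden weights,
# output weights solved in closed form via least squares (illustrative only).
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (500, 2))                    # two input features
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]             # toy target

n_hidden = 50
W = rng.normal(0, 1, (X.shape[1], n_hidden))        # random, never trained
b = rng.normal(0, 1, n_hidden)
H = np.tanh(X @ W + b)                              # hidden-layer activations

# Single least-squares solve replaces iterative backpropagation.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = np.tanh(X @ W + b) @ beta
print(f"Training RMSE: {np.sqrt(np.mean((y - y_hat) ** 2)):.3f}")
```

Because only one linear solve is needed, ELM training is typically much faster than gradient-based ANN training, at the cost of some variability in precision and RMSE across random initialisations.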

figure 5

Performance models for MPPT, ELM and ANN

2.4 Result Based Analysis

Table  3 summarises the various research works based on their methods, performance, outcome and significance. Moreover, the PV system is an electric power system that supplies solar power through the grid, and several methods have been used to model and operate it: MPPT and the ANN, ELM and SVM models, among others. These models optimise the output of shaded or un-shaded PV arrays under “static and dynamic weather conditions” [ 99 ]. PV systems constitute one of the most relevant sustainable energy sources at present. PV penetration information is based on the long-term efficacy of different algorithms and shows better performance in forecasting accuracy. Moreover, these articles help to understand the future challenges of PV penetration.

Various models have been selected due to their applicability and the novel trends in the current state of the art, e.g., ANN, SVM, ELM, MPPT, OPV, PSO-ELM3, MLP, MLA, and FCL. For this analysis, hybrid techniques and combinations of different methodologies are discarded. These techniques are shown in Fig.  6 according to their usage and implementation in the research papers considered for this study. ELM and MPPT are the most widely used by researchers for forecasting the performance and accuracy of the models, due to their easy implementation and reduced training periods, while PSO is the least implemented due to its reduced accuracy compared to other architectures. The combination of PSO with other advanced techniques, e.g., SVM and the wavelet transform, is widely applied in the current state of the art to increase the reliability of the analysis and the overall accuracy.

figure 6

Models used in studies

Figure  7 shows the accuracy of the models measured based on weather conditions. The considered researchers tested their models on static and dynamic weather data, and the forecasting accuracy was measured for both conditions in each case. The MPPT, ELM, ANN and SVM models had high accuracy, with only a slight difference between the different weather conditions. SVM achieved the best overall forecasting accuracy and results with dynamic weather, owing to its high suitability due to the implementation of different kernel functions, and its high reliability in high-dimensional spaces and with differentiated classes.

figure 7

Model performance on weather conditions

The average accuracy of MPPT, ELM, ANN and SVM is illustrated in Fig.  8 . SVM and ANN performed with higher accuracy than MPPT and ELM, even reaching 98%. This confirms the conclusion obtained from Fig.  6 , namely that ANN and SVM provide better results due to their more advanced training processes.

figure 8

Model accuracy comparison

3 Overall Discussion

As it has been seen throughout this review, different AI techniques have been implemented for PV systems. Specifically, this work distinguishes five main fields: price prediction, operation, forecasting, costs and ML. After comparing the different methods used, it can be seen that the most commonly used are ANN, SVM, ELM, MPPT, OPV, PSO, MLP, MLA and FCL with different performances and accuracies [ 116 ]. Among all of them, SVM stands out with the best performance and accuracy across the different applications.

Different approaches for price prediction, e.g., loop, non-linear and robust approaches, have been widely defined using several methods, including time series analysis, time decomposition, STL, Holt-Winters, Bates and Granger, and Aiolfi and Timmermann. LSTM is a relevant tool with high reliability for large data volumes. Different techniques have been proposed to combine LSTM with ML, DL or statistical tools, to enhance the forecasting performance compared to basic LSTM. Another relevant methodology is the FFNN, due to its high accuracy, although its main limitation lies in the training requirements. This study also demonstrates that deep learning is very useful for recognising the pattern of PV energy generation. This review highlights the need for AI techniques in the field of PV systems, as they improve the accuracy of previous methods by allowing the analysis of significantly larger amounts of data. In addition, ML is a breakthrough in analytical techniques as it can be applied to a range of cases in a generalised way. Operators can potentially overcome time series forecasting challenges through adept utilisation of LSTM, offering an effective solution for PV generation amid current energy challenges. Different authors have built solar irradiation predictors combining ML, statistical methods, single models and AI approaches. In terms of cost analysis, MPPT methods, e.g., the incremental conductance method and the perturb and observe method, were used to control the voltage level and restrain the current of the photovoltaic cells at the minimum value, while the actual conversion power was limited, affecting the output power of the PV cells. It is concluded that solar energy is cost-effective for commercial purposes due to the reduction of the overall costs of electricity generation and consumption.
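As an example of the classical statistical baselines mentioned above (a minimal sketch: the seasonal period of 24 hours and the synthetic series are assumptions), a Holt-Winters exponential-smoothing model can be fitted and used for a day-ahead forecast as follows:

```python
# Minimal Holt-Winters (triple exponential smoothing) sketch on a synthetic
# hourly price/production series with a daily seasonal cycle (illustrative only).
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(4)
idx = pd.date_range("2023-01-01", periods=24 * 60, freq="h")
values = 50 + 10 * np.sin(2 * np.pi * np.arange(idx.size) / 24) + rng.normal(0, 2, idx.size)
series = pd.Series(values, index=idx)

model = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=24).fit()
print(model.forecast(24).round(2))      # day-ahead (24 h) forecast
```

Hybrid schemes in the reviewed literature typically use forecasts like this one as a baseline or as an additional input feature for LSTM-based models.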

AI and ML are being widely used in the electricity forecasting and PV generation domains. The primary objective of the analysed studies is to address PV forecasting using MLP-based methods, ANNs and statistical approaches. Global economic development continuously increases the market demand for electricity, resulting in a serious environmental impact. A Physics-Constrained LSTM framework has been proposed to address the shortcoming of ML algorithms that can only be applied to huge volumes of information, forecasting solar energy production using a deep learning approach. The surveyed literature shows a strong interest in data-driven approaches in the technological and digitalised era. In this context, different AI and ML methods have been used [ 64 ]. Classification, optimisation, regression and data structure exploration have been used as AI methods to reveal the impact of AI on the development of power electronic systems. XAI has been integrated into various studies over recent years. The analysis proposed in this study has presented different trends in the current state of the art, showing the main implementation conditions, challenges and constraints of the different XAI techniques employed. It is demonstrated that XAI offers potential improvements for the practical use of ML techniques, but obstacles such as standardisation, security concerns and unreliable confidence levels need careful consideration. Additionally, the analysed papers explore potential applications and future research directions concerning XAI and energy, including smart grid applications, optimal energy management, energy consumer applications and power system monitoring. The results indicate that SHAP and LIME represent the most widely utilised XAI techniques. Moreover, traditional ML algorithms are prevalent in XAI applications, whereas DL models are seldom integrated. Another relevant technique is the Physics-Constrained LSTM model, which provides superior prediction performance for solar PV cells in terms of temperature forecasting accuracy. It is demonstrated that LIME and SHAP tools are widely used to gain insight into solar PV power generation forecasting through explainable AI.

4 Conclusions and Future Research

Solar photovoltaic energy emerges as an alternative capable of meeting a greater percentage of global energy needs due to novel technical advances, reduced costs and high accuracy. The photovoltaic system is an electric power system that supplies solar power through the grid, requiring novel techniques for data analytics, forecasting and control. This paper presented a systematic review of several artificial intelligence and machine learning algorithms to present the main challenges and limitations of the current state of the art. Several researchers are still working in this domain to improve the accuracy and precision of the forecasting models and to enhance the competitiveness of photovoltaic solar energy. This study presents a review of recent advancements in the technologies, techniques and methods widely implemented, following three different methodologies: review-based, method-based and result-based.

The comparison of power production using different PV cells is understood with the help of the review of the research papers considered in this work. The Physics-Constrained Long Short-Term Memory model provides superior prediction performance for solar PV cells in terms of temperature forecasting accuracy. It is also seen that extreme learning machines and k-means have shown accuracies of around 90%. Artificial intelligence methods are demonstrating their high strength and reliability compared to conventional modelling approaches, with reduced computational costs and reliable solutions. In most of the research studies, multi-objective optimisation methods, multiple linear regressions and artificial neural network methods have been used to observe the rate of output and verify the computational efficiency of the proposed method. Several methods have been used to model and operate a photovoltaic system, e.g., Maximum Power Point Tracking, the Artificial Neural Network model, the Extreme Learning Machine and the Support Vector Machine, among other models. Different basic approaches, e.g., Particle Swarm Optimization and several optimisation techniques, presented low accuracy compared with other Machine Learning techniques, being less implemented in the current state of the art and requiring combination with other advanced techniques. It is important to highlight that the Support Vector Machine is one of the most applied techniques, providing high reliability and suitability in different real case studies. It is also concluded that Machine Learning-based methods have recently become relevant in the analysed literature because of the increased complexity of operational scenarios for solar plants, together with high computational power and data availability. For future research, the analysis of novel hybrid methods compared with basic Machine Learning techniques is proposed, together with novel working scenarios involving large solar plants.

Abbreviations

Adaptive Neuro-Fuzzy Inference System

Artificial Intelligence

Artificial Neural Network

Auto Regressive Moving Average

Autoencoders

Autoregressive Integrated Moving Average

Cuckoo Optimisation Algorithm

Deep Convolutional Neural Networks

Deep Neural Network

Economic Load dispatching

Empirical Mode Decomposition

Event Detection

Explainable Artificial Intelligence

Extreme Learning Machine

Feed Forward Neural Network

Floating Solar PV

Fuzzy Logic

Fuzzy Logic Controller

Gaussian Process Regression

Generalised Regression Neural Network

Genetic Algorithm

Local Interpretable Model-Agnostic Explanations

Long Short-Term Memory

Mean Absolute Error

Machine Learning

Markov chain Monte Carlo

Maximum Power Point

Maximum Power Point Tracking

Multi-Layer Feedforward Neural Network

Multi-Layer Perceptron

Multiple Linear Regression

Organic Photovoltaic

Partial Shading Conditions

Particle Swarm Optimization

Photovoltaic

Perturb and Observe

Recurrent Neural Network

Root Mean Squared Error

Seawater Desalination Plant

Shapley Additive exPlanations

Support Vector Machines

Support Vector Regression

Thermoelectric Module

Unit Commitment

Balachandran GB, David PW, Alexander AB, Athikesavan MM, Chellam PVP, Kumar KKS, Palanichamy V, Kabeel AE, Sathyamurthy R, Marquez FPG (2021) A relative study on energy and exergy analysis between conventional single slope and novel stepped absorbable plate solar stills. Environ Sci Pollut Res 28:57602–57618

Chandrika VS, Attia MEH, Manokar AM, Marquez FPG, Driss Z, Sathyamurthy R (2021) Performance enhancements of conventional solar still using reflective aluminium foil sheet and reflective glass mirrors: energy and exergy analysis. Environ Sci Pollut Res 28:32508–32516

Lee W, Kim K, Park J, Kim J, Kim Y (2018) Forecasting solar power using long-short term memory and convolutional neural networks. IEEE Access 6:73068–73080

Pedregal DJ, García FP, Roberts C (2009) An algorithmic approach for maintenance management based on advanced state space systems and harmonic regressions. Ann Oper Res 166:109–124

Salam RA, Amber KP, Ratyal NI, Alam M, Akram N, Gómez Muñoz CQ, García Márquez FP (2020) An overview on energy and development of energy integration in major south Asian countries: the building sector. Energies 13:5776

Navid Q, Hassan A, Fardoun AA, Ramzan R, Alraeesi A (2021) Fault diagnostic methodologies for utility-scale photovoltaic power plants: a state of the art review. Sustainability 13:1629

Muñoz CQG, Marquez FPG, Lev B, Arcos A (2017) New pipe notch detection and location method for short distances employing ultrasonic guided waves. Acta Acustica United Acustica 103:772–781

de la Hermosa González RR, Márquez FPG, Dimlaye V, Ruiz-Hernández V (2014) Pattern recognition by wavelet transforms using macro fibre composites transducers. Mech Syst Signal Process 48:339–350

Zazoum B (2022) Solar photovoltaic power prediction using different machine learning methods. Energy Rep 8:19–25

Chankaya M, Hussain I, Ahmad A, Malik H, García Márquez FP (2021) Generalized normal distribution algorithm-based control of 3-phase 4-wire grid-tied pv-hybrid energy storage system. Energies 14:4355

Acaroğlu H, Márquez FPG (2022) A life-cycle cost analysis of high voltage direct current utilization for solar energy systems: the case study in Turkey. J Clean Prod 360:132128

Ram Babu N, Bhagat SK, Saikia LC, Chiranjeevi T, Devarapalli R, García Márquez FP (2022) A comprehensive review of recent strategies on automatic generation control/load frequency control in power systems. Arch Comput Methods Eng 1–30

Tived A (2020) Artificial intelligence in the solar pv value chain: Current applications and future prospects

Garcia Marquez FP, Gomez Munoz CQ (2020) A new approach for fault detection, location and diagnosis by ultrasonic testing. Energies 13:1192

García Márquez FP, Segovia Ramírez I, Pliego Marugán A (2019) Decision making using logical decision tree and binary decision diagrams: a real case study of wind turbine manufacturing. Energies 12:1753

Trappey AJ, Chen PP, Trappey CV, Ma L (2019) A machine learning approach for solar power technology review and patent evolution analysis. Appl Sci 9:1478

Segovia Ramirez I, Das B, Garcia Marquez FP (2022) Fault detection and diagnosis in photovoltaic panels by radiometric sensors embedded in unmanned aerial vehicles. Prog Photovoltaics Res Appl 30:240–256

de la Hermosa González RR, Márquez FPG, Dimlaye V (2015) Maintenance management of wind turbines structures via mfcs and wavelet transforms. Renew Sustain Energy Rev 48:472–482

Devarapalli R, Sinha NK, García Márquez FP (2022) A review on the computational methods of power system stabilizer for damping power network oscillations. Arch Comput Methods Eng 1–27

Sukno FM, Waddington JL, Whelan PF (2014) 3-d facial landmark localization with asymmetry patterns and shape regression from incomplete local features. IEEE Trans Cybernetics 45:1717–1730

Anandaraj S, Ayyasamy M, Marquez FPG, Athikesavan MM (2023) Experimental studies of different operating parameters on the photovoltaic thermal system using a flattened geometrical structure. Environ Sci Pollut Res 30:1116–1132

Ramírez IS, Chaparro JRP, Márquez FPG (2022) Unmanned aerial vehicle integrated real time kinematic in infrared inspection of photovoltaic panels. Measurement 188:110536

Rasachak S, Khan RSU, Kumar L, Zahid T, Ghafoor U, Selvaraj J, Nasrin R, Ahmad MS (2022) Effect of tin oxide/black paint coating on absorber plate temperature for improved solar still production: A controlled indoor and outdoor investigation. International Journal of Photoenergy 2022

Maria M, Yassine C (2020) Machine learning based approaches for modeling the output power of photovoltaic array in real outdoor conditions. Electronics 9:315

Li B, Delpha C, Diallo D, Migan-Dubois A (2021) Application of artificial neural networks to photovoltaic fault detection and diagnosis: a review. Renew Sustain Energy Rev 138:110512

Chen Q, Zhang Y, Liu S, Han T, Chen X, Xu Y, Meng Z, Zhang G, Zheng X, Zhao J (2020) Switchable perovskite photovoltaic sensors for bioinspired adaptive machine vision. Adv Intell Syst 2:2000122

Dass PMA, Fathima AP (2020) In Grid integration of photovoltaic system interfaced with artificial intelligence based modified universal power quality conditioning system, Journal of Physics: Conference Series,; IOP Publishing: p 012010

Ahmed R, Sreeram V, Mishra Y, Arif M (2020) A review and evaluation of the state-of-the-art in pv solar power forecasting: techniques and optimization. Renew Sustain Energy Rev 124:109792

Chenini L, Aissat A (2020) Theoretical study of quantum well gaasp (n)/gap structures for solar cells. In A practical guide for advanced methods in solar photovoltaic systems, Springer: pp 67–80

Gailhofer P, Herold A, Schemmel JP, Scherf C-S, de Stebelski CU, Köhler AR, Braungardt S (2021) The role of artificial intelligence in the European green deal. European Parliament Luxembourg, Belgium

Rodríguez F, Martín F, Fontán L, Galarza A (2021) Ensemble of machine learning and spatiotemporal parameters to forecast very short-term solar irradiation to compute photovoltaic generators’ output power. Energy 229:120647

Faiz Minai A, Khan AA, Pachauri RK, Malik H, García Márquez FP, Arcos Jiménez A (2022) Performance evaluation of solar pv-based z-source cascaded multilevel inverter with optimized switching scheme. Electronics 11:3706

Mohamad Radzi PNL, Akhter MN, Mekhilef S, Mohamed Shah N (2023) Review on the application of photovoltaic forecasting using machine learning for very short-to long-term forecasting. Sustainability 15:2942

Khan MJ, Kumar D, Narayan Y, Malik H, García Márquez FP, Gómez Muñoz CQ (2022) A novel artificial intelligence maximum power point tracking technique for integrated pv-wt-fc frameworks. Energies 15:3352

García Márquez FP, Segovia Ramírez I, Mohammadi-Ivatloo B, Marugán AP (2020) Reliability dynamic analysis by fault trees and binary decision diagrams. Information 11:324

Chankaya M, Hussain I, Malik H, Ahmad A, Alotaibi MA, Márquez FPG (2022) Seamless capable pv power generation system without battery storage for rural residential load. Electronics 11:2413

Camargo LR, Schmidt J (2020) Simulation of multi-annual time series of solar photovoltaic power: is the era5-land reanalysis the next big step? Sustain Energy Technol Assess 42:100829

Guo X, Gao Y, Zheng D, Ning Y, Zhao Q (2020) Study on short-term photovoltaic power prediction model based on the stacking ensemble learning. Energy Rep 6:1424–1431

Ogawa S, Mori H (2020) Integration of deep boltzmann machine and generalized radial basis function network for photovoltaic generation output forecasting. IFAC-PapersOnLine 53:12163–12168

Dorokhova M, Martinson Y, Ballif C, Wyrsch N (2021) Deep reinforcement learning control of electric vehicle charging in the presence of photovoltaic generation. Appl Energy 301:117504

Beltrán S, Castro A, Irizar I, Naveran G, Yeregui I (2022) Framework for collaborative intelligence in forecasting day-ahead electricity price. Appl Energy 306:118049

Luo X, Zhang D, Zhu X (2021) Deep learning based forecasting of photovoltaic power generation by incorporating domain knowledge. Energy 225:120240

Sharma AK, Pachauri RK, Choudhury S, Minai AF, Alotaibi MA, Malik H, Márquez FPG (2023) Role of metaheuristic approaches for implementation of integrated mppt-pv systems: a comprehensive study. Mathematics 11:269

Yin W, Ming Z, Wen T, Zhang C, Retracted (2022) Photovoltaic curve management using demand response with long and short-term memory. Elsevier

Visser L, AlSkaif T, van Sark W (2022) Operational day-ahead solar power forecasting for aggregated pv systems with a varying spatial distribution. Renewable Energy 183:267–282

Gowid S, Massoud A (2020) A robust experimental-based artificial neural network approach for photovoltaic maximum power point identification considering electrical, thermal and meteorological impact. Alexandria Eng J 59:3699–3707

Tian L, Huang Y, Liu S, Sun S, Deng J, Zhao H (2021) Application of photovoltaic power generation in rail transit power supply system under the background of energy low carbon transformation. Alexandria Eng J 60:5167–5174

Sun L, You F (2021) Machine learning and data-driven techniques for the control of smart power generation systems: an uncertainty handling perspective. Engineering 7:1239–1247

Behera MK, Nayak N (2020) A comparative study on short-term pv power forecasting using decomposition based optimized extreme learning machine algorithm. Eng Sci Technol Int J 23:156–167

Khodayar M, Khodayar ME, Jalali SMJ (2021) Deep learning for pattern recognition of photovoltaic energy generation. Electricity J 34:106882

Goswami A, Sadhu PK (2022) Nature inspired evolutionary algorithm integrated performance assessment of floating solar photovoltaic module for low-carbon clean energy generation. Sustainable Oper Computers 3:67–82

Feng H-J, Wu K, Deng Z-Y (2020) Predicting inorganic photovoltaic materials with efficiencies > 26% via structure-relevant machine learning and density functional calculations. Cell Rep Phys Sci 1:100179

Liang L, Duan Z, Li G, Zhu H, Shi Y, Cui Q, Chen B, Hu W (2021) Status evaluation method for arrays in large-scale photovoltaic power stations based on extreme learning machine and k-means. Energy Rep 7:2484–2492

Kalogirou S, Sencan A (2010) Artificial intelligence techniques in solar energy applications. Solar Collectors Panels Theory Appl 15:315–340

Maycock PD (1994) International photovoltaic markets, developments and trends forecast to 2010. Renewable Energy 5:154–161

Hossain MS, Mahmood H (2020) Short-term photovoltaic power forecasting using an lstm neural network and synthetic weather forecast. Ieee Access 8:172524–172533

Almomani M, Al-Dmour AS, Algharaibeh S Application of artificial intelligence techniques for modeling and simulation of photovoltaic arrays

Das UK, Tey KS, Seyedmahmoudian M, Mekhilef S, Idris MYI, Van Deventer W, Horan B, Stojcevski A (2018) Forecasting of photovoltaic power generation and model optimization: a review. Renew Sustain Energy Rev 81:912–928

Kaliappan S, Saravanakumar R, Karthick A, Kumar PM, Venkatesh V, Mohanavel V, Rajkumar S (2021) Hourly and day ahead power prediction of building integrated semitransparent photovoltaic system. International Journal of Photoenergy 2021

Yap KY, Sarimuthu CR, Lim JM-Y (2020) Artificial intelligence based mppt techniques for solar power system: a review. J Mod Power Syst Clean Energy 8:1043–1059

Chankaya M, Hussain I, Ahmad A, Malik H, García Márquez FP (2021) Multi-objective grasshopper optimization based mppt and vsc control of grid-tied pv-battery system. Electronics 10:2770

Zhang R, Feng M, Zhang W, Lu S, Wang F (2018) In Forecast of solar energy production-a deep learning approach, IEEE International Conference on Big Knowledge (ICBK), 2018; IEEE: pp 73–82

Al-Habahbeh O, Al-Hrout B, Al-Hiary E, Al-Fraihat S (2013) In Reliability investigation of photovoltaic cell using finite element modeling, 2013 9th International Symposium on Mechatronics and its Applications (ISMA), IEEE: pp 1–5

Rathore MM, Shah SA, Shukla D, Bentafat E, Bakiras S (2021) The role of Ai, machine learning, and big data in digital twinning: a systematic literature review, challenges, and opportunities. IEEE Access 9:32030–32052

Hussin NSM, Amin NAM, Safar MJA, Zulkafli RS, Majid MSA, Rojan MA, Zaman I (2018) In Performance factors of the photovoltaic system: A review, MATEC Web of Conferences,; EDP Sciences: p 03020

Gligor A, Dumitru C-D, Grif H-S (2018) Artificial intelligence solution for managing a photovoltaic energy production unit. Procedia Manuf 22:626–633

Kihlström V, Elbe J (2021) Constructing markets for solar energy—a review of literature about market barriers and government responses. Sustainability 13:3273

Kalaiarasi N, Subhranshu SD, Paramasivam S, Bharatiraja C (2021) Investigation on anfis aided mppt technique for pv fed zsi topologies in standalone applications. J Appl Sci Eng 24:261–269

Ghannam R, Klaine PV, Imran M (2019) Artificial intelligence for photovoltaic systems. In Solar photovoltaic power plants, Springer:; pp 121–142

Ben Othman A, Ouni A, Besbes M (2020) Deep learning-based estimation of pv power plant potential under climate change: a case study of El Akarit, Tunisia. Energy Sustain Soc 10:1–11

Ebrahim M, Ramadan S, Attia H, Saied E, Lehtonen M, Abdelhadi H (2021) Improving the performance of photovoltaic by using artificial intelligence optimization techniques. Int J Renew ENERGY Res 11:46–53

Khan ZA, Ahmad W, Khan UH, Alam Z, Rehman AU, Khan R (2020) In Artificial intelligence based nonlinear integral back-stepping control approach for mppt of photovoltaic system, 2020 International Conference on Emerging Trends in Smart Technologies (ICETST),; IEEE: pp 1–8

Khadka N, Bista A, Adhikari B, Shrestha A, Bista D, Adhikary B (2020) Current practices of solar photovoltaic panel cleaning system and future prospects of machine learning implementation. IEEE Access 8:135948–135962

Kuzlu M, Cali U, Sharma V, Güler Ö (2020) Gaining insight into solar photovoltaic power generation forecasting utilizing explainable artificial intelligence tools. IEEE Access 8:187814–187823

Sarp S, Kuzlu M, Cali U, Elma O, Guler O (2021) In An interpretable solar photovoltaic power generation forecasting approach using an explainable artificial intelligence tool, IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), 2021; IEEE: pp 1–5

Acknowledgements

The work reported herein was supported financially by the Ministerio de Ciencia e Innovación (Spain) and the European Regional Development Fund, under the Research Grant RA4PV project (Ref.: RTC2019-007364-3).

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Author information

Authors and Affiliations

Ingenium Research Group, Universidad de Castilla-La Mancha, Ciudad Real, Ciudad Real, Spain

Abhishek Kumar, Ashutosh Kumar Dubey, Isaac Segovia Ramírez, Alba Muñoz del Río & Fausto Pedro García Márquez

Dept. of CSE, Chandigarh University, Punjab, India

Abhishek Kumar

Chitkara University School of Engineering and Technology, Chitkara University, Himachal Pradesh, India

Ashutosh Kumar Dubey

Corresponding author

Correspondence to Fausto Pedro García Márquez.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic Supplementary Material

Supplementary Material 1 accompanies the online version of this article.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Kumar, A., Dubey, A.K., Segovia Ramírez, I. et al. Artificial Intelligence Techniques for the Photovoltaic System: A Systematic Review and Analysis for Evaluation and Benchmarking. Arch Computat Methods Eng (2024). https://doi.org/10.1007/s11831-024-10125-3

Received: 15 September 2023

Accepted: 25 March 2024

Published: 08 May 2024

DOI: https://doi.org/10.1007/s11831-024-10125-3
