Open access | Published: 16 January 2024

A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions

  • Bharti Khemani 1,
  • Shruti Patil 2,
  • Ketan Kotecha 2 &
  • Sudeep Tanwar 3

Journal of Big Data, volume 11, Article number: 18 (2024)


Deep learning has seen significant growth recently and is now applied to a wide range of use cases, including graphs. Graph data provides relational information between elements and is a standard data format for various machine learning and deep learning tasks. Models that can learn from such inputs are essential for working with graph data effectively. This paper discusses how nodes and edges are identified within specific applications, such as text, entities, and relations, to create graph structures; different applications may require different graph neural network (GNN) models. GNNs facilitate the exchange of information between nodes in a graph, enabling them to capture dependencies among nodes and edges. The paper delves into specific GNN models such as graph convolution networks (GCNs), GraphSAGE, and graph attention networks (GATs), which are widely used in various applications today. It also discusses the message-passing mechanism employed by GNN models and examines the strengths and limitations of these models in different domains. Furthermore, the paper explores the diverse applications of GNNs, the datasets commonly used with them, and the Python libraries that support GNN models. It offers an extensive overview of the landscape of GNN research and its practical implementations.

Introduction

Graph Neural Networks (GNNs) have emerged as a transformative paradigm in machine learning and artificial intelligence. The ubiquitous presence of interconnected data in various domains, from social networks and biology to recommendation systems and cybersecurity, has fueled the rapid evolution of GNNs. These networks have displayed remarkable capabilities in modeling and understanding complex relationships, making them pivotal in solving real-world problems that traditional machine-learning models struggle to address. GNNs’ unique ability to capture intricate structural information inherent in graph-structured data is significant. This information often manifests as dependencies, connections, and contextual relationships essential for making informed predictions and decisions. Consequently, GNNs have been adopted and extended across various applications, redefining what is possible in machine learning.

In this comprehensive review, we embark on a journey through the multifaceted landscape of Graph Neural Networks, encompassing an array of critical aspects. Our study is motivated by the ever-increasing literature and diverse perspectives within the field. We aim to provide researchers, practitioners, and students with a holistic understanding of GNNs, serving as an invaluable resource to navigate the intricacies of this dynamic field. The scope of this review is extensive, covering fundamental concepts that underlie GNNs, various architectural designs, techniques for training and inference, prevalent challenges and limitations, the diversity of datasets utilized, and practical applications spanning a myriad of domains. Furthermore, we delve into the intriguing future directions that GNN research will likely explore, shedding light on the exciting possibilities.

In recent years, deep learning (DL) has been called the gold standard in machine learning (ML). It has also steadily evolved into the most widely used computational technique in ML, producing excellent results on various challenging cognitive tasks, sometimes even matching or outperforming human ability. One benefit of DL is its capacity to learn from enormous amounts of data [1]. GNN variants such as graph convolutional networks (GCNs), graph attention networks (GATs), and GraphSAGE have shown groundbreaking performance on various deep learning tasks in recent years [2].

A graph is a data structure that consists of nodes (also called vertices) and edges. Mathematically, it is defined as G = (V, E), where V denotes the set of nodes and E denotes the edges. Edges in a graph can be directed or undirected based on whether directional dependencies exist between nodes. A graph can represent various data structures, such as social networks, knowledge graphs, and protein–protein interaction networks. Graphs are non-Euclidean spaces: nodes have no fixed coordinate grid, and the distance between two nodes is defined by paths through the graph rather than by coordinates in a Euclidean space. This makes applying traditional neural networks, which are typically designed for Euclidean data, to graph data difficult.
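As a concrete illustration of G = (V, E), a minimal sketch in Python; the node names and edges here are illustrative placeholders, not from the paper:

```python
# A toy undirected graph G = (V, E) stored as an adjacency list.
V = ["A", "B", "C", "D"]
E = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

adjacency = {v: [] for v in V}
for u, w in E:
    adjacency[u].append(w)
    adjacency[w].append(u)  # undirected: add both directions

print(adjacency)  # {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
```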

Graph neural networks (GNNs) are a type of deep learning model that can be used to learn from graph data. GNNs use a message-passing mechanism to aggregate information from neighboring nodes, allowing them to capture the complex relationships in graphs. GNNs are effective for various tasks, including node classification, link prediction, and clustering.

Organization of paper

The paper is organized as follows:

The primary focus of this research is to comprehensively examine Concepts, Architectures, Techniques, Challenges, Datasets, Applications, and Future Directions within the realm of Graph Neural Networks.

The paper delves into the Evolution and Motivation behind the development of Graph Neural Networks, including an analysis of the growth of publication counts over the years.

It provides an in-depth exploration of the Message Passing Mechanism used in Graph Neural Networks.

The study presents a concise summary of GNN learning styles and GNN models, complemented by an extensive literature review.

The paper thoroughly analyzes the Advantages and Limitations of GNN models when applied to various domains.

It offers a comprehensive overview of GNN applications, the datasets commonly used with GNNs, and the array of Python libraries that support GNN models.

In addition, the research identifies and addresses specific research gaps, outlining potential future directions in the field.

" Introduction " section describes the Introduction to GNN. " Background study " section provides background details in terms of the Evolution of GNN. " Research motivation " section describes the research motivation behind GNN. Section IV describes the GNN message-passing mechanism and the detailed description of GNN with its Structure, Learning Styles, and Types of tasks. " GNN Models and Comparative Analysis of GNN Models " section describes the GNN models with their literature review details and comparative study of different GNN models. " Graph Neural Network Applications " section describes the application of GNN. And finally, future direction and conclusions are defined in " Future Directions of Graph Neural Network " and " Conclusions " sections, respectively. Figure  1 gives the overall structure of the paper.

Figure 1: The overall structure of the paper

Background study

As shown in Fig. 2 below, the evolution of GNNs started in 2005. Over the past five years, research in this area has grown rapidly in depth and breadth. Graph neural networks are now used by researchers across fields such as NLP, computer vision, and healthcare.

Figure 2: Year-wise publication count of GNN (2005–2022)

Graph neural network research evolution

Graph neural networks (GNNs) were first proposed in 2005, but only recently have they begun to gain traction. GNNs were first introduced by Gori et al. [2005] and Scarselli et al. [2004, 2009]. A node is naturally defined by its attributes and the nodes connected to it in the graph. A GNN aims to learn a state embedding \({h}_{v}\in {\mathbb{R}}^{s}\) that encapsulates each node's neighborhood information. The state embedding \({h}_{v}\), an s-dimensional vector of node v, can be used to generate an output \({o}_{v}\), such as the predicted distribution of the node label [30]. Thomas Kipf and Max Welling introduced the graph convolutional network (GCN) in 2017. A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be thought of as convolutional neural networks expanded to handle graph-structured data.

Graph neural network evolution

As shown in Fig. 3 below, research on graph neural networks (GNNs) began in 2005 and is still ongoing. GNNs can handle a broad class of graphs and can be used for node-focused tasks, edge-focused tasks, graph-focused tasks, and many other applications. In 2005, Marco Gori introduced the concept of GNNs, defining them as an extension of recursive neural networks [4]. Franco Scarselli also applied GNNs to ranking web pages in 2005 [5]. In 2006, Pucci and Gori applied GNN concepts to large-scale recommender systems [1]. Later, Swapnil Gandhi and Anand Padmanabha Iyer of Microsoft Research introduced P3, a system for distributed deep graph learning at scale that supports deep GNN models such as GCN and GAT [6].

Figure 3: Graph neural network evolution

In 2007, Chun Guang Li, Jun Guo, and Hong-gang Zhang used a semi-supervised learning concept with GNNs [7]; they proposed a pruning method to enhance the basic GNN and resolve the problem of choosing the neighborhood scale parameter. Ziwei Zhang introduced Eigen-GNN [8], a graph-structure-preserving plug-in that works well with several GNN models. In 2009, Abhijeet V. Nandedkar and P.K. Biswas brought GNN concepts to fuzzy networks [9], proposing a granular reflex fuzzy min–max neural network for classification. In 2010, DK Chaturvedi explained the concept of GNNs for soft computing techniques [10], and GNNs came to be used widely in many applications; the same year, Tanzima Hashem discussed privacy-preserving group nearest neighbor queries [11]. The first initiative to use GNNs for knowledge graph embedding was R-GCN, which introduced relation-specific transformations in the message-passing phase to deal with various relations.

From 2011 to 2017, research on GNNs continued to grow steadily, and the number of publications has increased sharply since 2018. Our paper shows that GNN models such as GCN, GAT, and R-GCN are helpful across many applications [12].

Literature review

Table 1 describes the literature survey on graph neural networks, including the application area, the dataset used, the model applied, and the performance evaluation. The literature covers the years 2018 to 2023.

Research motivation

For image inputs we employ grid data structures, typically applying an n×n filter and computing the result with an aggregation or maximum function. This works effectively because of the inherent fixed structure of images: we position the grid over the image, slide the filter across it, and derive the output vector, as depicted on the left side of Fig. 4. In contrast, this approach is unsuitable for graphs. Graphs lack a predefined structure for data storage, and there is no inherent knowledge of node-to-neighbor relationships, as illustrated on the right side of Fig. 4. To overcome this limitation, we turn to graph convolution.

Figure 4: CNN in Euclidean space (left); GNN in non-Euclidean space (right)

In the context of GCNs, convolutional operations are adapted to handle graphs' irregular, non-grid-like structures. These operations aggregate information from neighboring nodes to update the features of a central node. CNNs, by contrast, are designed for grid-like data structures such as images and are well-suited to tasks where spatial relationships between neighboring elements are crucial, as in image processing; they use convolutional layers to scan small local receptive fields and learn hierarchical representations. GNNs are designed for graph-structured data, where edges connect entities (nodes), and graphs can represent various relationships, such as social networks, citation networks, or molecular structures. In short, CNNs excel at processing grid-like data with spatial dependencies, while GNNs handle graph-structured data with complex relationships and dependencies between entities.

Limitation of CNN over GNN

Graph Neural Networks (GNNs) draw inspiration from Convolutional Neural Networks (CNNs). Before delving into the intricacies of GNNs, it is essential to understand why Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) may not suffice for effectively handling data structured as graphs. As illustrated in Fig.  5 , Convolutional Neural Networks (CNNs) are designed for data that exhibits a grid structure, such as images. Conversely, Recurrent Neural Networks (RNNs) are tailored to sequences, like text.

Figure 5: Convolution can be performed when the input is an image, using an n×n mask (left); convolution cannot be applied in this way when the input is a graph (right)

Typically, we use arrays for storage when working with text data. Likewise, for image data, matrices are the preferred choice. However, as depicted in Fig.  5 , arrays and matrices fall short when dealing with graph data. In the case of graphs, we require a specialized technique known as Graph Convolution. This approach enables deep neural networks to handle graph-structured data directly, leading to a graph neural network.

Fig. 5 illustrates that we can employ masking techniques and apply filtering operations to transform the data into vector form when we have images. Conversely, traditional masking methods are not applicable when dealing with graph data as input, as shown in the right image.

Graph neural network

Graph Neural Networks, or GNNs, are a class of neural networks tailored for handling data organized in graph structures. Graphs are mathematical representations of nodes connected by edges, making them ideal for modeling relationships and dependencies in complex systems. GNNs have the inherent ability to learn and reason about graph-structured data, enabling diverse applications. In this section, we first explain the message-passing mechanism of GNNs (" Message passing mechanism in graph neural network " section) and then describe the structure of graphs, graph types, and graph learning styles (" Description of GNN taxonomy " section).

Message passing mechanism in graph neural network

A GNN is an optimizable transformation on all graph attributes (nodes, edges, and global context) that preserves graph symmetries (permutation invariances). Because a GNN does not alter the connectivity of the input graph, the output can be described with the same adjacency list and feature vector count as the input graph. However, the output graph has updated embeddings, because the GNN has transformed each node, edge, and global-context representation.

In Fig. 6 , circles are nodes, and empty boxes show aggregation of neighbor/adjacent nodes. The model aggregates messages from A's local graph neighbors (i.e., B, C, and D). In turn, the messages coming from neighbors are based on information aggregated from their respective neighborhoods, and so on. This visualization shows a two-layer version of a message-passing model. Notice that the computation graph of the GNN forms a tree structure by unfolding the neighborhood around the target node [ 17 ]. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs [ 30 ].

Figure 6: How a single node aggregates messages from its adjacent neighbor nodes

The message-passing mechanism of graph neural networks is shown in Fig. 7. We take an input graph with a set of node features \(X\in {\mathbb{R}}^{d\times \left|V\right|}\) and use this information to produce node embeddings \({z}_{u}\). We will also review how the GNN framework can embed subgraphs and whole graphs.

Figure 7: Message passing mechanism in GNN

At each iteration, every node collects information from its neighborhood. As these iterations progress, each node embedding incorporates information from more distant reaches of the graph. After the first iteration (k = 1), each node embedding retains information from its 1-hop neighborhood, i.e., the nodes reachable by a path of length 1 in the graph [31]. After the second iteration (k = 2), each node embedding contains data from its 2-hop neighborhood; generally, after k iterations, each node embedding includes data from its k-hop neighborhood. The "information" passed in these messages has two main parts: structural information about the graph (e.g., node degrees) and feature-based information.

In the message-passing mechanism of a graph neural network, each node stores its message as a feature vector, and at each step the neighbors update this information in feature-vector form [1]. This process aggregates information: for example, when a grey node is connected to a blue node, both features are aggregated into a new feature vector whose values are updated to include the new message.

Equations 4.1 and 4.2 show the update, where h denotes the message (embedding), u the node, and k the iteration number; AGGREGATE and UPDATE are arbitrary differentiable functions (i.e., neural networks), and \({m}_{N(u)}^{k}\) is the "message" aggregated from u's graph neighborhood N(u). We use superscripts to distinguish the embeddings and functions at different message-passing iterations. At each iteration k, the AGGREGATE function receives as input the set of embeddings of the nodes in u's graph neighborhood N(u) and generates a message \({m}_{N(u)}^{k}\) based on this aggregated neighborhood information. The UPDATE function then combines the message \({m}_{N(u)}^{k}\) with the previous embedding \({h}_{u}^{(k-1)}\) of node u to generate the updated embedding \({h}_{u}^{k}\).
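Based on the definitions in this paragraph, the two equations follow the standard neural message-passing form:

\({m}_{N(u)}^{k}=\mathrm{AGGREGATE}^{k}\left(\left\{{h}_{v}^{(k-1)}, \forall v\in N(u)\right\}\right)\)   (4.1)

\({h}_{u}^{k}=\mathrm{UPDATE}^{k}\left({h}_{u}^{(k-1)}, {m}_{N(u)}^{k}\right)\)   (4.2)

A minimal sketch of one such iteration in PyTorch, assuming mean aggregation and a single linear-plus-ReLU update (illustrative choices, not the paper's specification):

```python
import torch
import torch.nn as nn

def message_passing_step(h, adjacency, update):
    """One message-passing iteration: AGGREGATE neighbor embeddings, then UPDATE.

    h         : (num_nodes, dim) tensor of current node embeddings h^(k-1)
    adjacency : dict mapping node index -> list of neighbor indices N(u)
    update    : module combining a node's previous embedding with its message
    """
    messages = []
    for u in range(h.size(0)):
        neighbors = adjacency[u]
        # AGGREGATE: mean of neighbor embeddings (illustrative choice)
        messages.append(h[neighbors].mean(dim=0))
    m = torch.stack(messages)
    # UPDATE: combine h^(k-1) with the aggregated message m_N(u)^k
    return torch.relu(update(torch.cat([h, m], dim=-1)))

dim = 8
h = torch.randn(4, dim)  # 4 nodes with random initial features
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
update = nn.Linear(2 * dim, dim)
h_next = message_passing_step(h, adjacency, update)
print(h_next.shape)  # torch.Size([4, 8])
```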

Description of GNN taxonomy

As Fig. 8 below shows, we have divided our GNN taxonomy into three parts [30].

Figure 8: Graph neural network taxonomy

1. Graph structures

2. Graph types

3. Graph learning tasks

Graph structure

Two scenarios, shown in Fig. 9, typically arise: structural and non-structural. In structural scenarios, the graph structure is stated explicitly, as in applications involving molecular and physical systems, knowledge graphs, and other such objects.

Figure 9: Graph structure

Graphs are implicit in non-structural situations, so we must first construct the graph from the task at hand: for text, we build a fully connected word graph; for images, a scene graph.

Graph types

There may be more information about nodes and links in complex graph types. Graphs are typically divided into 5 categories, as shown in Fig.  10 .

Figure 10: Types of graphs

Directed/undirected graphs

A directed graph is characterized by edges with a specific direction, indicating the flow from one node to another. Conversely, in an undirected graph, the edges lack a designated direction, allowing nodes to interact bidirectionally. As illustrated in Fig. 11 (left side), the directed graph exhibits directed edges, while in Fig. 11 (right side), the undirected graph conspicuously lacks directional edges. In undirected graphs, it's important to note that each edge can be considered to comprise two directed edges, allowing for mutual interaction between connected nodes.

Figure 11: Directed/undirected graph

Static/dynamic graphs

The term “dynamic graph” pertains to a graph in which the properties or structure of the graph change with time. In dynamic graphs shown in Fig. 12 , it is essential to account for the temporal dimension appropriately. These dynamic graphs represent time-dependent events, such as the addition and removal of nodes and edges, typically presented as an ordered sequence or an asynchronous stream.

A noteworthy example of a dynamic graph can be observed in social networks like Twitter. In such networks, a new node is created each time a new user joins, and when a user follows another individual, a following edge is established. Furthermore, when users update their profiles, the respective nodes are also modified, reflecting the evolving nature of the graph. It's worth noting that different deep-learning libraries handle graph dynamics differently. TensorFlow, for instance, employs a static graph, while PyTorch utilizes a dynamic graph.

Figure 12: Static/dynamic graph

Homogeneous/heterogeneous graphs

Homogeneous graphs, shown in Fig. 13 (left), have only one type of node and one type of edge: for example, an online social network whose nodes represent people and whose edges represent friendship. In homogeneous networks, all nodes and edges share the same types.

Heterogeneous graphs, shown in Fig. 13 (right), have two or more kinds of nodes or edges: for example, an online social network with various edge types between nodes of the 'person' type, such as 'friendship' and 'co-worker.' Node and edge types play critical roles in heterogeneous networks and require further consideration.

Figure 13: Homogeneous (left) and heterogeneous (right) graphs

Knowledge graphs

A Knowledge Graph (KG) is a network of entity nodes and relationship edges that can be represented as a set of triples of the form (h, r, t) or (s, r, o), where each triple denotes a relation r between a head entity h and a tail entity t. From this perspective, a knowledge graph can be considered a heterogeneous graph. A knowledge graph visually depicts real-world objects and their relationships [32] and supports many applications, including information retrieval, knowledge-guided innovation, and question answering [30]. Entities are objects or things that exist in the real world, including individuals, organizations, places, music tracks, and movies; each relation type describes a particular relationship between such elements. Figure 14 shows a knowledge graph for Mr. Sundar Pichai.
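As a small illustration, a knowledge graph can be stored directly as a list of such triples; the entities and relations below are hypothetical placeholders loosely echoing Fig. 14:

```python
# A knowledge graph as (head, relation, tail) triples; each triple is one
# labeled edge in a heterogeneous graph. Placeholder values only.
triples = [
    ("Sundar Pichai", "is_CEO_of", "Google"),
    ("Google", "headquartered_in", "Mountain View"),
    ("Sundar Pichai", "born_in", "India"),
]

for h, r, t in triples:
    print(f"{h} --{r}--> {t}")
```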

Figure 14: Knowledge graph

Transductive/inductive graphs

In a transductive scenario, shown in Fig. 15 (top), the entire graph is input, the labels of the validation nodes are hidden, and the model predicts those labels. With an inductive graph, shown in Fig. 15 (bottom), we also input the entire graph (but only a sample per batch), mask the validation labels, and forecast them. In the transductive context, the model must forecast the labels of the given unlabeled nodes; in the inductive situation, it must generalize to new unlabeled nodes from the same distribution.

Figure 15: Transductive/inductive graphs

Transductive graph:

In the transductive approach, the entire graph is provided as input.

This method involves concealing the labels of the validation nodes.

The primary objective is to predict the labels of those hidden nodes.

Inductive graph:

The inductive approach still uses the complete graph, but only a sample within a batch is considered during training.

A crucial step in this process is masking the labels of the validation nodes.

The key aim is to predict labels for nodes not seen during training (a minimal sketch of this masking follows below).
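A minimal sketch of the transductive masking described above, in PyTorch; the node labels and the train/test split are illustrative placeholders:

```python
import torch

labels = torch.tensor([0, 1, 0, 1, 1, 0])  # placeholder node labels

# Transductive setting: the full graph (all nodes and edges) is visible
# during training, but only the training nodes' labels are revealed.
train_mask = torch.tensor([True, True, True, False, False, False])
test_mask = ~train_mask

# During training, the loss is computed only on unmasked nodes, e.g.:
#   loss = criterion(logits[train_mask], labels[train_mask])
# At evaluation time, the model predicts labels[test_mask], which were
# hidden (masked) during training.
print(labels[train_mask], labels[test_mask])
```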

Graph learning tasks

We perform three tasks with graphs: node classification, link prediction, and graph classification, as shown in Fig. 16.

Figure 16: Node-level prediction (e.g., social network) (left); edge-level prediction (e.g., next YouTube video) (middle); graph-level prediction (e.g., molecule) (right)

Node-level task

Node-level tasks are primarily concerned with determining the identity or function of each node within a graph. The core objective of a node-level task is to predict specific properties associated with individual nodes. For example, a node-level task in social networks could involve predicting which social group a new member is likely to join based on their connections and the characteristics of their friends' memberships. Node-level tasks are typically used when some nodes are unlabeled, such as identifying whether a particular individual in a network is a smoker.

Edge-level task (link prediction)

Edge-level tasks revolve around analyzing relationships between pairs of nodes in a graph. An illustrative application of an edge-level task is assessing the compatibility or likelihood of a connection between two entities, as seen in matchmaking or dating apps. Another instance of an edge-level task is evident on platforms like Netflix, where the task involves predicting the next video to be recommended based on viewing history and user preferences.

Graph-level

In graph-level tasks, the objective is to make predictions about a characteristic or property that encompasses the entire graph. For example, using a graph-based representation, one might aim to predict attributes like the olfactory quality of a molecule or its potential to bind with a disease-associated receptor. The essence of a graph-level task is to provide predictions that pertain to the graph as a whole. For instance, when assessing a newly synthesized chemical compound, a graph-level task might seek to determine whether the molecule has the potential to be an effective drug. The summary of all three learning tasks is shown in Fig. 17.

Figure 17: Graph learning tasks summary

GNN models and comparative analysis of GNN models

Graph Neural Network (GNN) models represent a category of neural networks specially crafted to process data organized in graph structures. They have garnered substantial acclaim across various domains, primarily due to their exceptional capability to grasp intricate relationships and patterns within graph data. As illustrated in Fig. 18, we outline three distinct GNN models. A comprehensive description of these GNN models, specifically Graph Convolutional Networks (GCN), Graph Attention Networks (GAT/GAN), and GraphSAGE, can be found in [33]. In " GNN models " section, we delve into these GNN models' intricacies; in " Comparative study of GNN models " section, we provide an in-depth analysis that explores their theoretical and practical aspects.

Figure 18: GNN models (GCN, GAT/GAN, GraphSAGE)

Graph convolution neural network (GCN)

GCN is one of the basic graph neural network variants. Thomas Kipf and Max Welling developed GCN networks. 'Convolution' in GCNs is essentially the same operation as in the convolution layers of convolutional neural networks: the input neurons are multiplied by weights called filters or kernels. The filters act as a sliding window across the image, allowing CNNs to learn features from nearby cells. Weight sharing means the same filter is used within the same layer throughout the image; for example, when a CNN is used to identify photos of cats vs. non-cats, the same filter is employed in the same layer to detect the cat's nose and ears, and throughout the image the same weight (kernel or filter) is applied [33]. GCNs were first introduced in "Spectral Networks and Deep Locally Connected Networks on Graphs" [34].

GCNs, which learn features by analyzing neighboring nodes, carry out similar behaviors. The primary difference between CNNs and GNNs is that CNNs are made to operate on regular (Euclidean) ordered data. GNNs, on the other hand, are a generalized version of CNNs that handle varying numbers of node connections and unordered nodes (irregular, non-Euclidean structured data). GCNs have been applied to many problems, for example image classification [35], traffic forecasting [36], recommendation systems [17], scene graph generation [37], and visual question answering [38].

GCNs are particularly well-suited for tasks that involve data represented as graphs, such as social networks, citation networks, recommendation systems, and more. These networks are an extension of traditional CNNs, widely used for tasks involving grid-like data, such as images. The key idea behind GCNs is to perform convolution operations on the graph data. This enables them to capture and propagate information through the nodes in a graph by considering both a node’s features and those of its neighboring nodes. GCNs typically consist of several layers, each performing convolution and aggregation steps to refine the node representations in the graph. By applying these layers iteratively, GCNs can capture complex patterns and dependencies within the graph data.

Working of graph convolutional network

A Graph Convolutional Network (GCN) is a type of neural network architecture designed for processing and analyzing graph-structured data. GCNs work by aggregating and propagating information through the nodes in a graph. GCN works with the following steps shown in Fig.  19 :

Figure 19: Working of GCN

Initialization:

Each node in the graph is associated with a feature vector. Depending on the application, these feature vectors can represent various attributes or characteristics of the nodes. For example, in a social network, each node might represent a user, and the features could include user profile information.

Convolution Operation:

The core of a GCN is the convolution operation, which is adapted from convolutional neural networks (CNNs). It aims to aggregate information from neighboring nodes. This is done by taking a weighted sum of the feature vectors of neighboring nodes. The graph's adjacency matrix determines the weights. The resulting aggregated information is a new feature vector for each node.

Weighted Aggregation:

The graph's adjacency matrix, typically after normalization, provides weights for the aggregation process. In this context, for a given node, the features of its neighboring nodes are scaled by the corresponding values within the adjacency matrix, and the outcomes are then accumulated. A precise mathematical elucidation of this aggregation step is described in " Equation of GCN " section.

Activation function and learning weights:

The aggregated features are typically passed through an activation function (e.g., ReLU) to introduce non-linearity. The weight matrix W used in the aggregation step is learned during training. This learning process allows the GCN to adapt to the specific graph and task it is designed for.

Stacking Layers:

GCNs are often used in multiple layers. This allows the network to capture more complex relationships and higher-level features in the graph. The output of one GCN layer becomes the input for the next, and this process is repeated for a predefined number of layers.

Task-Specific Output:

The final output of the GCN can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the specific application.

Equation of GCN

The Graph Convolutional Network (GCN) is based on a message-passing mechanism that can be described mathematically. The core equation of a shallow, first-order GCN layer can be expressed as follows for a graph with N nodes:
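Based on the definitions that follow, Equation 5.1 takes the standard first-order form:

\(Z=\sigma \left(A{\prime}FW+b\right)\)   (5.1)

where \(\sigma\) is an element-wise non-linear function such as ReLU.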

Equation 5.1 depicts a GCN layer's design. The normalized graph adjacency matrix A' and the node feature matrix F serve as the layer's inputs, while the weight matrix W and bias vector b are the layer's trainable parameters.

When multiplied with the feature matrix, the normalized adjacency matrix effectively smooths each node's feature vector based on the feature vectors of its close graph neighbors. This matrix captures the graph structure. A' is normalized so that each neighboring node's contribution is proportional to the network's connectivity.

The layer definition is completed by applying an element-wise non-linear function, such as ReLU, to A'FW + b. The output matrix Z of this layer can be routed into another GCN layer or any other neural network layer, allowing the deep architectures needed by downstream node classification tasks to learn a complicated hierarchy of node attributes.
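A minimal PyTorch sketch of this layer; the symmetric normalization \(D^{-1/2}(A+I)D^{-1/2}\) follows Kipf and Welling [33] and is an assumption here, since the text above states only that A' is normalized:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: Z = ReLU(A' F W + b)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # weight matrix W
        self.b = nn.Parameter(torch.zeros(out_dim))      # bias vector b

    def forward(self, A_norm, F):
        return torch.relu(A_norm @ self.W(F) + self.b)

def normalize_adjacency(A):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + torch.eye(A.size(0))
    d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

A = torch.tensor([[0., 1., 1.],   # toy 3-node undirected graph
                  [1., 0., 0.],
                  [1., 0., 0.]])
F = torch.randn(3, 4)             # 3 nodes, 4 input features each
layer = GCNLayer(4, 2)
Z = layer(normalize_adjacency(A), F)
print(Z.shape)  # torch.Size([3, 2])
```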

A summary of the graph convolution neural network (GCN) is shown in Table 2.

Graph attention network (GAT/GAN)

Graph Attention Network (GAT/GAN) is a neural network architecture that works with graph-structured data. It uses masked self-attentional layers to address the shortcomings of earlier methods that depended on graph convolutions or their approximations. By stacking such layers, nodes can (implicitly) be assigned different weights within a neighborhood, allowing them to weigh the characteristics of their neighbors without an expensive matrix operation (like inversion) or prior knowledge of the graph's structure. GAT thereby tackles several significant limitations of spectral-based graph neural networks at once, making the model suitable for both inductive and transductive applications.

Working of GAT

The Graph Attention Network (GAT) is a neural network architecture designed for processing and analyzing graph-structured data shown in Fig. 20 . GATs are a variation of Graph Convolutional Networks (GCNs) that incorporate the concept of attention mechanisms. GAT/GAN works with the following steps shown in Fig.  21 .

Figure 20: How attention coefficients update

As with other graph-based models, GAT starts with nodes in the graph, each associated with a feature vector. These features can represent various characteristics of the nodes.

Self-Attention Mechanism and Attention Computation:

GAT introduces an attention mechanism similar to those used in sequence-to-sequence models in natural language processing. The attention mechanism allows each node to focus on different neighbors when aggregating information, assigning different attention coefficients to the neighboring nodes and making the process more flexible. For each node in the graph, GAT computes attention scores for its neighboring nodes; these scores are computed from the features of the central node and those of its neighbors.

The attention scores determine how much each neighbor’s feature contributes to the aggregation for the central node. This weighted aggregation is carried out for all neighboring nodes, resulting in a new feature vector for the central node.

Multiple Attention Heads and Output Combination:

GAT often employs multiple attention heads in parallel. Each attention head computes its attention scores and aggregation results. These multiple attention heads capture different aspects of the relationships in the graph. The outputs from the multiple attention heads are combined, typically by concatenation or averaging, to create a final feature vector for each node.

Learning Weights and Stacking Layers:

Similar to GCNs, GATs learn weight parameters during training. These weights are learned to optimize the attention mechanisms and adapt to the specific graph and task. GATs can be used in multiple layers to capture higher-level features and complex relationships in the graph. The output of one GAT layer becomes the input for the next layer.

The learning weights capture the importance of node relationships and contribute to information aggregation during the neighborhood aggregation process. The learning process in GNNs also relies on backpropagation and optimization algorithms. The stacking of GNN layers enables the model to capture higher-level abstractions and dependencies in the graph. Each layer refines the node representations based on information from the previous layer.

The final output of the GAT can be used for various graph-based tasks, such as node classification, link prediction, or graph classification, depending on the application.

Equation for GAT

GAT's main distinctive feature is gathering data from the one-hop neighborhood [30]; whereas a graph convolution operation in GCN produces the normalized sum of the node properties of neighbors, GAT weighs neighbors by attention. Equation 5.2 shows the graph attention update, where \({h}_{i}^{(l+1)}\) is the current node output, \(\sigma\) denotes the ReLU non-linearity, \(j\in N\left(i\right)\) ranges over the one-hop neighbors, \({c}_{ij}\) is the normalized attention coefficient, \({W}^{\left(l\right)}\) is the weight matrix, and \({h}_{j}^{(l)}\) denotes the previous-layer embedding of node j.
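Based on the symbols just defined, Equation 5.2 takes the form:

\({h}_{i}^{(l+1)}=\sigma \left(\sum_{j\in N(i)}{c}_{ij}{W}^{(l)}{h}_{j}^{(l)}\right)\)   (5.2)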

Why is GAT better than GCN?

We learned from the Graph Convolutional Network (GCN) that integrating local graph structure and node-level features results in good node classification performance. The way GCN aggregates messages, on the other hand, is structure-dependent, which may limit its use.

How attention coefficients update: the attention layer has four parts [47]:

A linear transformation: a shared linear transformation \({z}_{i}^{(l)}={W}^{(l)}{h}_{i}^{(l)}\) is applied to each node, where \(h\) is the set of node features, \(W\) is the learnable weight matrix, and \(z\) is the transformed node embedding.

Attention coefficients: in the GAT paradigm this step is crucial, because every node can now attend to every other node, discarding any structural information. The pair-wise un-normalized attention score between two neighbors is computed next: the z embeddings of the two nodes are concatenated (|| stands for concatenation), a dot product with a learnable weight vector a(l) is taken, and a LeakyReLU is applied [1], giving \({e}_{ij}^{(l)}=\mathrm{LeakyReLU}\left({a}^{(l)T}\left({z}_{i}^{(l)} || {z}_{j}^{(l)}\right)\right)\). Contrary to the dot-product attention utilized in the Transformer model, this kind of attention is called additive attention. The nodes are subsequently subjected to self-attention.

Softmax: we utilize the softmax function to normalize the coefficients over all values of j, \({\alpha }_{ij}^{(l)}={\mathrm{softmax}}_{j}\left({e}_{ij}^{(l)}\right)\), improving their comparability across nodes.

Aggregation: this process is comparable to GCN. The neighborhood embeddings are combined, scaled by the attention scores: \({h}_{i}^{(l+1)}=\sigma \left(\sum_{j\in N(i)}{\alpha }_{ij}^{(l)}{z}_{j}^{(l)}\right)\).
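A minimal single-head sketch of these four parts in PyTorch; the dimensions, the dense adjacency representation, and the default LeakyReLU slope are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer following the four parts above."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared linear transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector a

    def forward(self, h, adj):
        # 1. Linear transformation: z_i = W h_i
        z = self.W(h)
        n = z.size(0)
        # 2. Un-normalized attention: e_ij = LeakyReLU(a^T [z_i || z_j])
        pairs = torch.cat([z.repeat_interleave(n, 0), z.repeat(n, 1)], dim=-1)
        e = F.leaky_relu(self.a(pairs)).view(n, n)
        # 3. Softmax over each node's neighbors (non-edges are masked out)
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)
        # 4. Aggregation: attention-weighted sum of neighbor embeddings
        return torch.relu(alpha @ z)

adj = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])  # with self-loops
h = torch.randn(3, 4)
print(GATLayer(4, 2)(h, adj).shape)  # torch.Size([3, 2])
```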

A summary of the graph attention network (GAT) is shown in Table 3.

GraphSAGE

GraphSAGE represents a tangible realization of an inductive learning framework, shown in Fig. 22. It considers only training samples linked to the training set's edges during training. The process consists of two main steps, "sampling" and "aggregation." Subsequently, the node representation vector is paired with the vector from the aggregated model and passed through a fully connected layer with a non-linear activation function. It is important to note that each network layer shares a common aggregator and weight matrix, so the consideration should be on the number of layers or weight matrices rather than the number of aggregators. Finally, a normalization step is applied to the layer's output.

Two major steps:

Sampling: describes how the neighbors of each node are sampled.

Aggregation: describes how the neighbor node embeddings are obtained, combined, and used to update the node's own embedding.

Figure 22: Working of the GraphSAGE method

Working of graphSAGE model:

First, initialize the feature vectors of all nodes in the input graph.

For each node, obtain its sampled neighbor nodes.

Use the aggregation function to aggregate the information of the neighbor nodes.

Combine the aggregated information with the node's current embedding, and update the embedding through a non-linear transformation.

Types of aggregators

In the GraphSAGE method, four types of aggregators are used:

Simple neighborhood aggregator

Mean aggregator

LSTM aggregator: applies an LSTM to a random permutation of the neighbors.

Pooling aggregator: applies a symmetric vector function after transforming the adjacent vectors.

Equation of graphSAGE

\({W}_{k}, {B}_{k}\): learnable weight matrices.

\({h}_{v}^{0}={x}_{v}\): the initial (layer-0) embeddings are equal to the node features.

\({h}_{u}^{k-1}\): embedding of neighbor u from the previous layer, used in the generalized aggregation.

\({z}_{v}={h}_{v}^{K}\): embedding after K layers of neighborhood aggregation.

\(\sigma\): non-linearity (ReLU).
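Putting these symbols together, the GraphSAGE update (as introduced by Hamilton et al., with a generic aggregator AGG) can be written as:

\({h}_{v}^{k}=\sigma \left({W}_{k}\cdot \mathrm{AGG}\left(\left\{{h}_{u}^{k-1}, \forall u\in N(v)\right\}\right)+{B}_{k}{h}_{v}^{k-1}\right)\)

A compact PyTorch sketch of the sampling and aggregation steps; the mean aggregator, the fixed sample size, and the tensor shapes are illustrative choices rather than the paper's specification:

```python
import torch
import torch.nn as nn

class SAGELayer(nn.Module):
    """One GraphSAGE layer with a mean aggregator (illustrative sketch)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # neighbor transform W_k
        self.B = nn.Linear(in_dim, out_dim, bias=False)  # self transform B_k

    def forward(self, h, adjacency, num_samples=2):
        out = []
        for v in range(h.size(0)):
            # Sampling: keep at most num_samples neighbors of v
            neighbors = adjacency[v][:num_samples]
            # Aggregation: mean of the sampled neighbor embeddings
            agg = h[neighbors].mean(dim=0)
            out.append(torch.relu(self.W(agg) + self.B(h[v])))
        z = torch.stack(out)
        return z / (z.norm(dim=1, keepdim=True) + 1e-8)  # final normalization

h0 = torch.randn(4, 8)                                  # h_v^0 = x_v
adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(SAGELayer(8, 8)(h0, adjacency).shape)  # torch.Size([4, 8])
```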

A summary of GraphSAGE is shown in Table 4.

Comparative study of GNN models

Comparison based on practical implementation of GNN models

Table 5 describes the dataset statistics for the different datasets used in the literature for graph-type input. The datasets are CORA, Citeseer, and Pubmed. These statistics provide information about the kind of dataset, the number of nodes and edges, the number of classes, the number of features, and the label rate for each dataset. These details are essential for understanding the characteristics and scale of the datasets used in the context of citation networks. A comparison of the GNN models with their equations is shown in Fig. 23.

Figure 23: Equations of GNN models

Table 6 shows the performance of different Graph Neural Network (GNN) models on various datasets, providing accuracy scores for each model; additionally, the time taken by some models to compute results is indicated in seconds. This information is crucial for evaluating the performance of these models on specific datasets.

A comparison based on the theoretical concepts of the GNN models is described in Table 7.

Graph neural network applications

Graph construction

Graph Neural Networks (GNNs) have a wide range of applications spanning diverse domains, which encompass modern recommender systems, computer vision, natural language processing, program analysis, software mining, bioinformatics, anomaly detection, and urban intelligence, among others. The fundamental prerequisite for GNN utilization is the transformation or representation of input data into a graph-like structure. In the realm of graph representation learning, GNNs excel in acquiring essential node or graph embeddings that serve as a crucial foundation for subsequent tasks [ 61 ].

The construction of a graph involves a two-fold process:

Graph creation and

Learning about graph representations

Graph Creation: The generation of graphs is essential for depicting the intricate relationships embedded within diverse incoming data. With the varied nature of input data, various applications adopt techniques to create meaningful graphs. This process is indispensable for effectively communicating the structural nuances of the data, ensuring the nodes and edges convey their semantic significance, particularly tailored to the specific task at hand.

Learning about graph representations: The subsequent phase involves utilizing the graph expression acquired from the input data. In GNN-based Learning for graph representations, some studies employ well-established GNN models like GraphSAGE, GCN, GAT, and GGNN, which offer versatility for various application tasks. However, when faced with specific tasks, it may be necessary to customize the GNN architecture to address particular challenges more effectively.
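As a small illustration of the graph-creation step, using the PyTorch Geometric library (one of several Python GNN libraries; the features and edges below are placeholders):

```python
import torch
from torch_geometric.data import Data

# Three nodes with 2-dimensional features (placeholder values).
x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Edges stored as a 2 x num_edges index tensor; each column is one directed
# edge, so an undirected edge appears twice.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])

graph = Data(x=x, edge_index=edge_index)
print(graph)  # Data(x=[3, 2], edge_index=[2, 4])

# This Data object can then be fed to GNN layers such as GCNConv, GATConv,
# or SAGEConv from torch_geometric.nn.
```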

Different applications that can be modeled as graphs

Molecular graphs: Atoms and electrons serve as the basic building blocks of matter and molecules, organized in three-dimensional structures. While all particles interact, we primarily acknowledge a covalent connection between two atoms when they sit at a stable distance from each other. Various atom-to-atom bond configurations exist, including single and double bonds. This three-dimensional arrangement is conveniently and commonly represented as a graph, with atoms representing nodes and covalent bonds representing edges [62].

Graphs of social networks: These networks are helpful research tools for identifying trends in the collective behavior of individuals, groups, and organizations. We may create a graph that represents groupings of people by visualizing individuals as nodes and their connections as edges [ 63 ].

Citation networks as graphs: When they publish papers, scientists regularly reference the work of other scientists. Each manuscript can be visualized as a node in a graph of these citation networks, with each directed edge denoting a citation from one publication to another. Additionally, we can include details about each document in each node, such as an abstract's word embedding [ 64 ].

Within computer vision: We may want to tag certain objects in visual scenes. We can then construct graphs by treating these objects as nodes and their relationships as edges.

GNNs are used to model data as graphs, allowing for the capture of complex relationships and dependencies that traditional machine learning models may struggle to represent. This makes GNNs a valuable tool for tasks where data has an inherent graph structure or where modeling relationships is crucial for accurate predictions and analysis.

Graph neural network (GNN) applications in different fields

NLP (natural language processing)

Document Classification: GNNs can be used to model the relationships between words or sentences in documents, allowing for improved document classification and information retrieval.

Text Generation: GNNs can assist in generating coherent and contextually relevant text by capturing dependencies between words or phrases.

Question Answering: GNNs can help in question-answering tasks by representing the relationships between question words and candidate answers within a knowledge graph.

Sentiment Analysis: GNNs can capture contextual information and sentiment dependencies in text, improving sentiment analysis tasks.

Computer vision

Image Segmentation: GNNs can be employed for pixel-level image segmentation tasks by modeling relationships between adjacent pixels as a graph.

Object Detection: GNNs can assist in object detection by capturing contextual information and relationships between objects in images.

Scene Understanding: GNNs are used for understanding complex scenes and modeling spatial relationships between objects in an image.

Bioinformatics

Protein-Protein Interaction Prediction: GNNs can be applied to predict interactions between proteins in biological networks, aiding in drug discovery and understanding disease mechanisms.

Genomic Sequence Analysis: GNNs can model relationships between genes or genetic sequences, helping in gene expression prediction and sequence classification tasks.

Drug Discovery: GNNs can be used for drug-target interaction prediction and molecular property prediction, which is vital in pharmaceutical research.

Table 8 offers a concise overview of various research papers that utilize Graph Neural Networks (GNNs) in diverse domains, showcasing the applications and contributions of GNNs in each study.

Table 9 highlights various applications of GNNs in Natural Language Processing, Computer Vision, and Bioinformatics domains, showcasing how GNN models are adapted and used for specific tasks within each field.

Future directions of graph neural network

This survey has emphasized the contributions of the existing literature to GNN principles, models, datasets, and applications. In this section, several potential future study directions are suggested. Significant challenges have been noted, including imbalanced datasets, the effectiveness of current methods, and text classification, and we have looked at remedies to address these problems. We suggest future directions to address these difficulties in terms of domain adaptation, data augmentation, and improved classification. Table 10 displays these future directions.

Imbalanced datasets: limited labeled data, domain-dependent data, and imbalanced data are currently issues with available datasets. Transfer learning and domain adaptation are solutions to these issues.

Accuracy of existing systems/models: deep learning models such as GCN, GAT, and GraphSAGE can be utilized to increase the efficiency and precision of current systems. Additionally, training models on sizable, domain-specific datasets can enhance performance.

Enhancing text classification: text classification poses another significant challenge, which can be addressed by leveraging advanced deep learning methodologies such as graph neural networks, contributing to improved text classification accuracy and performance.

Table 10 describes the research gaps and future directions presented in the literature above. These research gaps and future directions highlight the challenges and proposed solutions in the field of text classification and structural analysis.

Table 11 provides an overview of different research papers, their publication years, the applications they address, the graph structures they use, the graph types, the graph tasks, and the specific Graph Neural Network (GNN) models utilized in each study.

Conclusions

Graph Neural Networks (GNNs) have witnessed rapid advancements in addressing the unique challenges presented by data structured as graphs, a domain where conventional deep learning techniques, originally designed for images and text, often struggle to provide meaningful insights. GNNs offer a powerful and intuitive approach that finds broad utility in applications relying on graph structures. This comprehensive survey on GNNs offers an in-depth analysis covering critical aspects such as GNN fundamentals, the interplay with convolutional neural networks, GNN message-passing mechanisms, diverse GNN models, practical use cases, and a forward-looking perspective. Our central focus is on elucidating the foundational characteristics of GNNs, a field teeming with contemporary applications that continually enhance our comprehension and utilization of this technology.

The continuous evolution of GNN-based research has underscored the growing need to address issues related to graph analysis, which we aptly refer to as the frontiers of GNNs. In our exploration, we delve into several crucial recent research domains within the realm of GNNs, encompassing areas like link prediction, graph generation, and graph categorization, among others.

Availability of data and materials

Not applicable.

Abbreviations

GNN: Graph Neural Network

GCN: Graph Convolution Network

GAT: Graph Attention Networks

NLP: Natural Language Processing

CNN: Convolution Neural Networks

RNN: Recurrent Neural Networks

ML: Machine Learning

DL: Deep Learning

KG: Knowledge Graph

Pucci A, Gori M, Hagenbuchner M, Scarselli F, Tsoi AC. Investigation into the application of graph neural networks to large-scale recommender systems, infona.pl, no. 32, no 4, pp. 17–26, 2006.

Mahmud FB, Rayhan MM, Shuvo MH, Sadia I, Morol MK. A comparative analysis of Graph Neural Networks and commonly used machine learning algorithms on fake news detection, Proc. - 2022 7th Int. Conf. Data Sci. Mach. Learn. Appl. CDMA 2022, pp. 97–102, 2022.

Cui L, Seo H, Tabar M, Ma F, Wang S, Lee D, Deterrent: Knowledge Guided Graph Attention Network for Detecting Healthcare Misinformation, Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 492–502, 2020.

Gori M, Monfardini G, Scarselli F, A new model for earning in raph domains, Proc. Int. Jt. Conf. Neural Networks, vol. 2, no. January 2005, pp. 729–734, 2005, https://doi.org/10.1109/IJCNN.2005.1555942 .

Scarselli F, Yong SL, Gori M, Hagenbuchner M, Tsoi AC, Maggini M. Graph neural networks for ranking web pages, Proc.—2005 IEEE/WIC/ACM Int. Web Intell. WI 2005, vol. 2005, no. January, pp. 666–672, 2005, doi: https://doi.org/10.1109/WI.2005.67 .

Gandhi S, Zyer AP, P3: Distributed deep graph learning at scale, Proc. 15th USENIX Symp. Oper. Syst. Des. Implementation, OSDI 2021, pp. 551–568, 2021.

Li C, Guo J, Zhang H. Pruning neighborhood graph for geodesic distance based semi-supervised classification, in 2007 International Conference on Computational Intelligence and Security (CIS 2007), 2007, pp. 428–432.

Zhang Z, Cui P, Pei J, Wang X, Zhu W, Eigen-gnn: A graph structure preserving plug-in for gnns, IEEE Trans. Knowl. Data Eng., 2021.

Nandedkar AV, Biswas PK. A granular reflex fuzzy min–max neural network for classification. IEEE Trans Neural Netw. 2009;20(7):1117–34.

Article   Google Scholar  

Chaturvedi DK, Premdayal SA, Chandiok A. Short-term load forecasting using soft computing techniques. Int’l J Commun Netw Syst Sci. 2010;3(03):273.

Google Scholar  

Hashem T, Kulik L, Zhang R. Privacy preserving group nearest neighbor queries, in Proceedings of the 13th International Conference on Extending Database Technology, 2010, pp. 489–500.

Sun Z et al. Knowledge graph alignment network with gated multi-hop neighborhood aggregation, in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 01, pp. 222–229.

Zhang M, Chen Y. Link prediction based on graph neural networks. Adv Neural Inf Process Syst. 31, 2018.

Stanimirović PS, Katsikis VN, Li S. Hybrid GNN-ZNN models for solving linear matrix equations. Neurocomputing. 2018;316:124–34.

Stanimirović PS, Petković MD. Gradient neural dynamics for solving matrix equations and their applications. Neurocomputing. 2018;306:200–12.

Zhang C, Song D, Huang C, Swami A, Chawla NV. Heterogeneous graph neural network, in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 793–803.

Fan W et al. Graph neural networks for social recommendation," in The world wide web conference, 2019, pp. 417–426.

Gui T et al. A lexicon-based graph neural network for Chinese NER," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 1040–1050.

Qasim SR, Mahmood H, Shafait F. Rethinking table recognition using graph neural networks, in 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 142–147

You J, Ying R, Leskovec J. Position-aware graph neural networks, in International conference on machine learning, 2019, pp. 7134–7143.

Cao D, et al. Spectral temporal graph neural network for multivariate time-series forecasting. Adv Neural Inf Process Syst. 2020;33:17766–78.

Xhonneux LP, Qu M, Tang J. Continuous graph neural networks. In International Conference on Machine Learning, 2020, pp. 10432–10441.

Zhou K, Huang X, Li Y, Zha D, Chen R, Hu X. Towards deeper graph neural networks with differentiable group normalization. Adv Neural Inf Process Syst. 2020;33:4917–28.

Gu F, Chang H, Zhu W, Sojoudi S, El Ghaoui L. Implicit graph neural networks. Adv Neural Inf Process Syst. 2020;33:11984–95.

Liu Y, Guan R, Giunchiglia F, Liang Y, Feng X. Deep attention diffusion graph neural networks for text classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 8142–8152.

Gasteiger J, Becker F, Günnemann S. Gemnet: universal directional graph neural networks for molecules. Adv Neural Inf Process Syst. 2021;34:6790–802.

Yao D et al. Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification. Def. Technol. 2022.

Li Y, et al. Research on multi-port ship traffic prediction method based on spatiotemporal graph neural networks. J Mar Sci Eng. 2023;11(7):1379.

Djenouri Y, Belhadi A, Srivastava G, Lin JC-W. Hybrid graph convolution neural network and branch-and-bound optimization for traffic flow forecasting. Futur Gener Comput Syst. 2023;139:100–8.

Zhou J, et al. Graph neural networks: a review of methods and applications. AI Open. 2020;1(January):57–81. https://doi.org/10.1016/j.aiopen.2021.01.001 .

Rong Y, Huang W, Xu T, Huang J. Dropedge: Towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903. 2019.

Abu-Salih B, Al-Qurishi M, Alweshah M, Al-Smadi M, Alfayez R, Saadeh H. Healthcare knowledge graph construction: a systematic review of the state-of-the-art, open issues, and opportunities. J Big Data. 2023;10(1):81.

Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv Prepr. arXiv1609.02907, 2016.

Berg RV, Kipf TN, Welling M. Graph Convolutional Matrix Completion. 2017, http://arxiv.org/abs/1706.02263

Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM. Geometric deep learning on graphs and manifolds using mixture model cnns. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 5115-5124).

Cui Z, Henrickson K, Ke R, Wang Y. Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. IEEE Trans Intell Transp Syst. 2020;21(11):4883–94. https://doi.org/10.1109/TITS.2019.2950416 .

Yang J, Lu J, Lee S, Batra D, Parikh D. Graph r-cnn for scene graph generation. InProceedings of the European conference on computer vision (ECCV) 2018 (pp. 670-685). https://doi.org/10.1007/978-3-030-01246-5_41 .

Teney D, Liu L, van Den Hengel A. Graph-structured representations for visual question answering. InProceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 1-9). https://doi.org/10.1109/CVPR.2017.344 .

Yao L, Mao C, Luo Y. Graph convolutional networks for text classification. Proc AAAI Conf Artif Intell. 2019;33(01):7370–7.

De Cao N, Aziz W, Titov I. Question answering by reasoning across documents with graph convolutional networks. arXiv preprint arXiv:1808.09920, 2018.

Gao H, Wang Z, Ji S. Large-scale learnable graph convolutional networks. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1416–1424.

Hu F, Zhu Y, Wu S, Wang L, Tan T. Hierarchical graph convolutional networks for semi-supervised node classification. arXiv preprint arXiv:1902.06667, 2019.

Lange O, Perez L. Traffic prediction with advanced graph neural networks. DeepMind Research Blog, 2020. https://deepmind.google/discover/blog/traffic-prediction-with-advanced-graph-neural-networks/.

Duan C, Hu B, Liu W, Song J. Motion capture for sporting events based on graph convolutional neural networks and single target pose estimation algorithms. Appl Sci. 2023;13(13):7611.

Balcıoğlu YS, Sezen B, Çerasi CC, Huang SH. Machine design automation model for metal production defect recognition with deep graph convolutional neural network. Electronics. 2023;12(4):825.

Baghbani A, Bouguila N, Patterson Z. Short-term passenger flow prediction using a bus network graph convolutional long short-term memory neural network model. Transp Res Rec. 2023;2677(2):1331–40.

Velickovic P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.

Hamilton W, Ying Z, Leskovec J. Inductive representation learning on large graphs. Adv Neural Inf Process Syst. 2017;30.

Ye Y, Ji S. Sparse graph attention networks. IEEE Trans Knowl Data Eng. 2021;35(1):905–16.

Chen Z, et al. Graph neural network-based fault diagnosis: a review. arXiv preprint arXiv:2111.08185, 2021.

Brody S, Alon U, Yahav E. How attentive are graph attention networks? arXiv preprint arXiv:2105.14491, 2021.

Huang J, Shen H, Hou L, Cheng X. Signed graph attention networks. In: International Conference on Artificial Neural Networks, 2019, pp. 566–577.

Seraj E, Wang Z, Paleja R, Sklar M, Patel A, Gombolay M. Heterogeneous graph attention networks for learning diverse communication. arXiv preprint arXiv:2108.09568, 2021.

Zhang Y, Wang X, Shi C, Jiang X, Ye Y. Hyperbolic graph attention network. IEEE Transactions on Big Data. 2021;8(6):1690–701.

Yang X, Ma H, Wang M. Research on rumor detection based on a graph attention network with temporal features. Int J Data Warehous Min. 2023;19(2):1–17.

Lan W, et al. KGANCDA: predicting circRNA-disease associations based on knowledge graph attention network. Brief Bioinform. 2022;23(1):bbab494.

Xiao L, Wu X, Wang G. Social network analysis based on GraphSAGE. In: 2019 12th International Symposium on Computational Intelligence and Design (ISCID), vol. 2, 2019, pp. 196–199.

Chang L, Branco P. Graph-based solutions with residuals for intrusion detection: the modified E-GraphSAGE and E-ResGAT algorithms. arXiv preprint arXiv:2111.13597, 2021.

Oh J, Cho K, Bruna J. Advancing GraphSAGE with a data-driven node sampling. arXiv preprint arXiv:1904.12935, 2019.

Kapoor M, Patra S, Subudhi BN, Jakhetiya V, Bansal A. Underwater moving object detection using an end-to-end encoder–decoder architecture and GraphSAGE with aggregator and refactoring. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5635–5644.

Bhatti UA, Tang H, Wu G, Marjan S, Hussain A. Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence. Int J Intell Syst. 2023;2023:1–28.

David L, Thakkar A, Mercado R, Engkvist O. Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform. 2020;12(1):1–22.

Davies A, Ajmeri N. Realistic synthetic social networks with graph neural networks. arXiv preprint arXiv:2212.07843, 2022.

Frank MR, Wang D, Cebrian M, Rahwan I. The evolution of citation graphs in artificial intelligence research. Nat Mach Intell. 2019;1(2):79–85.

Gao C, Wang X, He X, Li Y. Graph neural networks for recommender system. In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022, pp. 1623–1625.

Wu S, Sun F, Zhang W, Xie X, Cui B. Graph neural networks in recommender systems: a survey. ACM Comput Surv. 2022;55(5):1–37.

Wu L, Chen Y, Shen K, Guo X, Gao H, Li S, Pei J, Long B. Graph neural networks for natural language processing: a survey. Found Trends Mach Learn. 2023;16(2):119–328.

Wu L, Chen Y, Ji H, Liu B. Deep learning on graphs for natural language processing. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 2651–2653.

Liu X, Su Y, Xu B. The application of graph neural network in natural language processing and computer vision. In: 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), 2021, pp. 708–714.

Harmon SHE, Faour DE, MacDonald NE. Mandatory immunization and vaccine injury support programs: a survey of 28 GNN countries. Vaccine. 2021;39(49):7153–7.

Yan W, Zhang Z, Zhang Q, Zhang G, Hua Q, Li Q. Deep data analysis-based agricultural products management for smart public healthcare. Front Public Health. 2022;10:847252.

Hamaguchi T, Oiwa H, Shimbo M, Matsumoto Y. Knowledge transfer for out-of-knowledge-base entities: a graph neural network approach. arXiv preprint arXiv:1706.05674, 2017.

Dai D, Zheng H, Luo F, Yang P, Chang B, Sui Z. Inductively representing out-of-knowledge-graph entities by optimal estimation under translational assumptions. arXiv preprint arXiv:2009.12765, 2020.

Pradhyumna P, Shreya GP. Graph neural network (GNN) in image and video understanding using deep learning for computer vision applications. In: 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), 2021, pp. 1183–1189.

Shi W, Rajkumar R. Point-GNN: graph neural network for 3D object detection in a point cloud. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1711–1719.

Wu Y, Dai HN, Tang H. Graph neural networks for anomaly detection in industrial internet of things. IEEE Int Things J. 2021;9(12):9214–31.

Pitsik EN, et al. The topology of fMRI-based networks defines the performance of a graph neural network for the classification of patients with major depressive disorder. Chaos Solitons Fractals. 2023;167: 113041.

Liao W, Zeng B, Liu J, Wei P, Cheng X, Zhang W. Multi-level graph neural network for text sentiment analysis. Comput Electr Eng. 2021;92: 107096.

Kumar VS, Alemran A, Karras DA, Gupta SK, Dixit CK, Haralayya B. Natural language processing using graph neural network for text classification. In: 2022 International Conference on Knowledge Engineering and Communication Systems (ICKES), 2022, pp. 1–5.

Dara S, Srinivasulu CH, Babu CM, Ravuri A, Paruchuri T, Kilak AS, Vidyarthi A. Context-aware auto-encoded graph neural model for dynamic question generation using NLP. ACM Transactions on Asian and Low-Resource Language Information Processing. 2023.

Wu L, Cui P, Pei J, Zhao L, Guo X. Graph neural networks: foundation, frontiers and applications. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 4840–4841.

Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans Neural Netw. 2008;20(1):61–80.

Cao P, Zhu Z, Wang Z, Zhu Y, Niu Q. Applications of graph convolutional networks in computer vision. Neural Comput Appl. 2022;34(16):13387–405.

You R, Yao S, Mamitsuka H, Zhu S. DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction. Bioinformatics. 2021;37(Supplement_1):i262-71.

Long Y, et al. Pre-training graph neural networks for link prediction in biomedical networks. Bioinformatics. 2022;38(8):2254–62.

Wu Y, Gao M, Zeng M, Zhang J, Li M. BridgeDPI: a novel graph neural network for predicting drug–protein interactions. Bioinformatics. 2022;38(9):2571–8.

Kang C, Zhang H, Liu Z, Huang S, Yin Y. LR-GNN: a graph neural network based on link representation for predicting molecular associations. Briefings Bioinf. 2022;23(1):bbab513.

Wei X, Huang H, Ma L, Yang Z, Xu L. Recurrent graph neural networks for text classification. In: 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS), 2020, pp. 91–97.

Schlichtkrull MS, De Cao N, Titov I. Interpreting graph neural networks for NLP with differentiable edge masking. arXiv preprint arXiv:2010.00577, 2020.

Tu M, Huang J, He X, Zhou B. Graph sequential network for reasoning over sequences. arXiv preprint arXiv:2004.02001, 2020.


Acknowledgements

I am grateful to all of those with whom I have had the pleasure to work during this research. Each member has provided me with extensive personal and professional guidance and taught me a great deal about scientific research and life in general.

This work was supported by the Research Support Fund (RSF) of Symbiosis International (Deemed University), Pune, India.

Author information

Authors and Affiliations

Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India

Bharti Khemani

Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology Pune Campus, Symbiosis International (Deemed University) (SIU), Lavale, Pune, 412115, India

Shruti Patil & Ketan Kotecha

IEEE, Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, India

Sudeep Tanwar


Contributions

Conceptualization, BK and SP; methodology, BK and SP; software, BK; validation, BK, SP, KK; formal analysis, BK; investigation, BK; resources, BK; data curation, BK and SP; writing—original draft preparation, BK; writing—review and editing, SP, KK, and ST; visualization, BK; supervision, SP; project administration, SP, ST; funding acquisition, KK.

Corresponding author

Correspondence to Shruti Patil.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 12 and 13.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Khemani, B., Patil, S., Kotecha, K. et al. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. J Big Data 11, 18 (2024). https://doi.org/10.1186/s40537-023-00876-4


Received: 28 June 2023

Accepted: 27 December 2023

Published: 16 January 2024

DOI: https://doi.org/10.1186/s40537-023-00876-4


Keywords

  • Graph Neural Network (GNN)
  • Graph Convolution Network (GCN)
  • Graph Attention Networks (GAT)
  • Message Passing Mechanism
  • Natural Language Processing (NLP)


Transformer: A Novel Neural Network Architecture for Language Understanding

August 31, 2017

Posted by Jakob Uszkoreit, Software Engineer, Natural Language Understanding

Neural networks, in particular recurrent neural networks (RNNs), are now at the core of the leading approaches to language understanding tasks such as language modeling, machine translation and question answering. In “Attention Is All You Need”, we introduce the Transformer, a novel neural network architecture based on a self-attention mechanism that we believe to be particularly well suited for language understanding.

In our paper, we show that the Transformer outperforms both recurrent and convolutional models on academic English to German and English to French translation benchmarks. On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.

Accuracy and Efficiency in Language Understanding

Neural networks usually process language by generating fixed- or variable-length vector-space representations. After starting with representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context. For example, deciding on the most likely meaning and appropriate representation of the word “bank” in the sentence “I arrived at the bank after crossing the…” requires knowing if the sentence ends in “... road.” or “... river.”

RNNs have in recent years become the typical network architecture for translation, processing language sequentially in a left-to-right or right-to-left fashion. Reading one word at a time, this forces RNNs to perform multiple steps to make decisions that depend on words far away from each other. Processing the example above, an RNN could only determine that “bank” is likely to refer to the bank of a river after reading each word between “bank” and “river” step by step. Prior research has shown that, roughly speaking, the more such steps decisions require, the harder it is for a recurrent network to learn how to make those decisions.

The sequential nature of RNNs also makes it more difficult to fully take advantage of modern fast computing devices such as TPUs and GPUs, which excel at parallel and not sequential processing. Convolutional neural networks (CNNs) are much less sequential than RNNs, but in CNN architectures like ByteNet or ConvS2S the number of steps required to combine information from distant parts of the input still grows with increasing distance.

The Transformer

In contrast, the Transformer only performs a small, constant number of steps (chosen empirically). In each step, it applies a self-attention mechanism which directly models relationships between all words in a sentence, regardless of their respective position. In the earlier example “I arrived at the bank after crossing the river”, to determine that the word “bank” refers to the shore of a river and not a financial institution, the Transformer can learn to immediately attend to the word “river” and make this decision in a single step. In fact, in our English-French translation model we observe exactly this behavior.

More specifically, to compute the next representation for a given word - “bank” for example - the Transformer compares it to every other word in the sentence. The result of these comparisons is an attention score for every other word in the sentence. These attention scores determine how much each of the other words should contribute to the next representation of “bank”. In the example, the disambiguating “river” could receive a high attention score when computing a new representation for “bank”. The attention scores are then used as weights for a weighted average of all words’ representations which is fed into a fully-connected network to generate a new representation for “bank”, reflecting that the sentence is talking about a river bank.
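A minimal NumPy sketch of this comparison-and-weighting step may help. It is a toy, single-head illustration with random (untrained) projections; every vector, dimension and name below is invented for the example rather than taken from the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy embeddings for "I arrived at the bank after crossing the river"
# (d_model = 4; values are random, not trained).
rng = np.random.default_rng(0)
words = ["I", "arrived", "at", "the", "bank", "after", "crossing", "the", "river"]
X = rng.normal(size=(len(words), 4))

# Project every word into query/key/value spaces (random projections here).
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Compare "bank" (index 4) with every word: one attention score per word.
bank = 4
scores = softmax(Q[bank] @ K.T / np.sqrt(K.shape[1]))

# The scores weight an average over all words' value vectors; this weighted
# average is what feeds the next representation of "bank".
new_bank_repr = scores @ V
print(list(zip(words, np.round(scores, 3))))
```

In a trained model, the learned projections would push a high score onto the disambiguating word ("river"), so its value vector dominates the new representation of "bank".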

The animation below illustrates how we apply the Transformer to machine translation. Neural networks for machine translation typically contain an encoder reading the input sentence and generating a representation of it. A decoder then generates the output sentence word by word while consulting the representation generated by the encoder. The Transformer starts by generating initial representations, or embeddings, for each word. These are represented by the unfilled circles. Then, using self-attention, it aggregates information from all of the other words, generating a new representation per word informed by the entire context, represented by the filled balls. This step is then repeated multiple times in parallel for all words, successively generating new representations.

The decoder operates similarly, but generates one word at a time, from left to right. It attends not only to the other previously generated words, but also to the final representations generated by the encoder.

Flow of Information

Beyond computational performance and higher accuracy, another intriguing aspect of the Transformer is that we can visualize what other parts of a sentence the network attends to when processing or translating a given word, thus gaining insights into how information travels through the network.

To illustrate this, we chose an example involving a phenomenon that is notoriously challenging for machine translation systems: coreference resolution. Consider the following sentences and their French translations:

[Image: the sentence pairs “The animal didn’t cross the street because it was too tired.” and “The animal didn’t cross the street because it was too wide.”, shown with their French translations.]

It is obvious to most that in the first sentence pair “it” refers to the animal, and in the second to the street. When translating these sentences to French or German, the translation for “it” depends on the gender of the noun it refers to - and in French “animal” and “street” have different genders. In contrast to the current Google Translate model, the Transformer translates both of these sentences to French correctly. Visualizing what words the encoder attended to when computing the final representation for the word “it” sheds some light on how the network made the decision. In one of its steps, the Transformer clearly identified the two nouns “it” could refer to and the respective amount of attention reflects its choice in the different contexts.

Given this insight, it might not be that surprising that the Transformer also performs very well on the classic language analysis task of syntactic constituency parsing, a task the natural language processing community has attacked with highly specialized systems for decades.

In fact, with little adaptation, the same network we used for English to German translation outperformed all but one of the previously proposed approaches to constituency parsing.

We are very excited about the future potential of the Transformer and have already started applying it to other problems involving not only natural language but also very different inputs and outputs, such as images and video. Our ongoing experiments are accelerated immensely by the Tensor2Tensor library, which we recently open-sourced. In fact, after downloading the library you can train your own Transformer networks for translation and parsing by invoking just a few commands. We hope you’ll give it a try, and look forward to seeing what the community can do with the Transformer.

Acknowledgements

This research was conducted by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin. Additional thanks go to David Chenell for creating the animation above.

  • Machine Intelligence
  • Machine Translation
  • Natural Language Processing



These neural networks know what they’re doing


Neural networks can learn to solve all sorts of problems, from identifying cats in photographs to steering a self-driving car. But whether these powerful, pattern-recognizing algorithms actually understand the tasks they are performing remains an open question.

For example, a neural network tasked with keeping a self-driving car in its lane might learn to do so by watching the bushes at the side of the road, rather than learning to detect the lanes and focus on the road’s horizon.

Researchers at MIT have now shown that a certain type of neural network is able to learn the true cause-and-effect structure of the navigation task it is being trained to perform. Because these networks can understand the task directly from visual data, they should be more effective than other neural networks when navigating in a complex environment, like a location with dense trees or rapidly changing weather conditions.

In the future, this work could improve the reliability and trustworthiness of machine learning agents that are performing high-stakes tasks, like driving an autonomous vehicle on a busy highway.

“Because these machine-learning systems are able to perform reasoning in a causal way, we can know and point out how they function and make decisions. This is essential for safety-critical applications,” says co-lead author Ramin Hasani, a postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Co-authors include electrical engineering and computer science graduate student and co-lead author Charles Vorbach; CSAIL PhD student Alexander Amini; Institute of Science and Technology Austria graduate student Mathias Lechner; and senior author Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science and director of CSAIL. The research will be presented at the 2021 Conference on Neural Information Processing Systems (NeurIPS) in December.

An attention-grabbing result

Neural networks are a method for doing machine learning in which the computer learns to complete a task through trial and error by analyzing many training examples. And “liquid” neural networks change their underlying equations to continuously adapt to new inputs.

The new research draws on previous work in which Hasani and others showed how a brain-inspired type of deep learning system called a Neural Circuit Policy (NCP), built by liquid neural network cells, is able to autonomously control a self-driving vehicle, with a network of only 19 control neurons.  

The researchers observed that the NCPs performing a lane-keeping task kept their attention on the road’s horizon and borders when making a driving decision, the same way a human would (or should) while driving a car. Other neural networks they studied didn’t always focus on the road.

“That was a cool observation, but we didn’t quantify it. So, we wanted to find the mathematical principles of why and how these networks are able to capture the true causation of the data,” he says.

They found that, when an NCP is being trained to complete a task, the network learns to interact with the environment and account for interventions. In essence, the network recognizes if its output is being changed by a certain intervention, and then relates the cause and effect together.  

During training, the network is run forward to generate an output, and then backward to correct for errors. The researchers observed that NCPs relate cause-and-effect during forward-mode and backward-mode, which enables the network to place very focused attention on the true causal structure of a task.

Hasani and his colleagues didn’t need to impose any additional constraints on the system or perform any special set up for the NCP to learn this causality.

“Causality is especially important to characterize for safety-critical applications such as flight,” says Rus. “Our work demonstrates the causality properties of Neural Circuit Policies for decision-making in flight, including flying in environments with dense obstacles such as forests and flying in formation.”

Weathering environmental changes

They tested NCPs through a series of simulations in which autonomous drones performed navigation tasks. Each drone used inputs from a single camera to navigate.

The drones were tasked with traveling to a target object, chasing a moving target, or following a series of markers in varied environments, including a redwood forest and a neighborhood. They also traveled under different weather conditions, like clear skies, heavy rain, and fog.

The researchers found that the NCPs performed as well as the other networks on simpler tasks in good weather, but outperformed them all on the more challenging tasks, such as chasing a moving object through a rainstorm.

“We observed that NCPs are the only networks that pay attention to the object of interest in different environments while completing the navigation task, wherever you test it, and in different lighting or environmental conditions. This is the only system that can do this causally and actually learn the behavior we intend the system to learn,” he says.

Their results show that the use of NCPs could also enable autonomous drones to navigate successfully in environments with changing conditions, like a sunny landscape that suddenly becomes foggy.

“Once the system learns what it is actually supposed to do, it can perform well in novel scenarios and environmental conditions it has never experienced. This is a big challenge of current machine learning systems that are not causal. We believe these results are very exciting, as they show how causality can emerge from the choice of a neural network,” he says.

In the future, the researchers want to explore the use of NCPs to build larger systems. Putting thousands or millions of networks together could enable them to tackle even more complicated tasks.

This research was supported by the United States Air Force Research Laboratory, the United States Air Force Artificial Intelligence Accelerator, and the Boeing Company.


  • Review Article
  • Published: 20 May 2024

Neural architecture search for in-memory computing-based deep learning accelerators

  • Olga Krestinskaya 1 ,
  • Mohammed E. Fouda 2 ,
  • Hadjer Benmeziane 3 ,
  • Kaoutar El Maghraoui 4 ,
  • Abu Sebastian 3 ,
  • Wei D. Lu 5 ,
  • Mario Lanza 6 ,
  • Hai Li 7 ,
  • Fadi Kurdahi 8 ,
  • Suhaib A. Fahmy 1 ,
  • Ahmed Eltawil 1 &
  • Khaled N. Salama 1

Nature Reviews Electrical Engineering ( 2024 ) Cite this article

171 Accesses

2 Altmetric

Metrics details

  • Electrical and electronic engineering
  • Materials for devices

The rapid growth of artificial intelligence and the increasing complexity of neural network models are driving demand for efficient hardware architectures that can address power-constrained and resource-constrained deployments. In this context, the emergence of in-memory computing (IMC) stands out as a promising technology. For this purpose, several IMC devices, circuits and architectures have been developed. However, the intricate nature of designing, implementing and deploying such architectures necessitates a well-orchestrated toolchain for hardware–software co-design. This toolchain must allow IMC-aware optimizations across the entire stack, encompassing devices, circuits, chips, compilers, software and neural network design. The complexity and sheer size of the design space involved renders manual optimizations impractical. To mitigate these challenges, hardware-aware neural architecture search (HW-NAS) has emerged as a promising approach to accelerate the design of streamlined neural networks tailored for efficient deployment on IMC hardware. This Review illustrates the application of HW-NAS to the specific features of IMC hardware and compares existing optimization frameworks. Ongoing research and unresolved issues are discussed. A roadmap for the evolution of HW-NAS for IMC architectures is proposed.

Hardware-aware neural architecture search (HW-NAS) is an efficient tool in hardware–software co-design, and it can be combined with other architecture-level and system-level optimization techniques to design efficient in-memory computing (IMC) hardware for deep learning accelerators.

HW-NAS for IMC can be used for optimizing deep learning models for a specific IMC hardware, and co-optimizing a model and hardware design searching for the most efficient implementation.

In HW-NAS, it is important to define a search space, select an appropriate problem formulation technique, and consider the trade-off between performance, search speed, computation demands and scalability when selecting a search strategy and a hardware evaluation technique.

In addition to neural network model hyperparameters and quantization and pruning policies, HW-NAS for IMC can include the circuit-level and architecture-level hardware parameters in the search.

The main challenges in HW-NAS for IMC include a lack of unified framework to support different types of neural network models and different IMC hardware architectures, HW-NAS benchmarks and efficient software–hardware co-design techniques and tools.

Fully automated NAS methods capable of constructing new deep learning operations and algorithms suitable for IMC with minimal human design are needed.



Introduction

The proliferation of the Internet of Things is fuelling an unprecedented surge in data generation and in the advanced processing capabilities needed to cater to the intricate demands of applications, leading to rapidly developing artificial intelligence (AI) systems. The design and implementation of efficient hardware solutions for AI applications are critical, as modern AI models are trained using machine learning (ML) and deep learning algorithms processed by graphics processing unit (GPU) accelerators on the cloud 1 . However, high energy consumption, latency and data privacy issues associated with cloud-based processing have increased the demand for developing efficient hardware for deep learning accelerators, especially for on-edge processing 2 . One of the most promising hardware architectures executing deep learning algorithms and neural networks at the edge is in-memory computing (IMC) 3 . This entails carrying out data processing directly within or in close proximity to the memory. The cost related to data movement is thus reduced, and notable enhancements in both latency and energy efficiency for computation and data transfer operations are obtained 4 , 5 , 6 .

Efficient and functional IMC systems require the optimization of the design parameters across devices, circuits, architectures and algorithms. Effective hardware–software co-design toolchains are needed to connect the software implementation of neural networks with IMC hardware design 7 . Hardware–software co-design requires optimizations at each level of the process, which involves hundreds of parameters and is difficult to perform manually. Hardware-aware neural architecture search (HW-NAS) is a hardware–software co-design method to optimize a neural network while considering the characteristics and constraints of the hardware on which this neural network is deployed. Depending on the goal and problem setting, HW-NAS can also be used to optimize the hardware parameters themselves, such as the design features contributing to hardware efficiency. The parameter space for state-of-the-art neural network models can easily reach the order of 10³⁵ (ref. 8), making it impossible to search for the optimal parameters manually. Traditional NAS frameworks are efficient at automating the search for optimum network parameters (for example network layers, blocks and kernel sizes) 9 . However, they do not consider hardware constraints, nor can they optimize the parameters of the hardware itself. HW-NAS extends conventional NAS by seamlessly integrating hardware parameters and characteristics, streamlining the efforts of hardware designers and software programmers alike 10 . HW-NAS can automate the optimization of neural network models given hardware constraints such as energy, latency, silicon area and memory size. Moreover, certain HW-NAS frameworks have the capability to optimize the parameters of the hardware architecture that is best suited for deploying a given neural network. HW-NAS can also help to identify trade-offs between performance and other hardware parameters 11 . Moreover, conventional NAS frameworks often incorporate operations that are not suited to IMC hardware and disregard inherent IMC hardware non-idealities such as noise or temporal drift within the search framework 12 . The absence of established IMC NAS benchmarks compounds these challenges. To close these gaps, since the beginning of the 2020s, several HW-NAS frameworks for IMC architectures have been proposed 13 , 14 , 15 , 16 , 17 , some of which support a joint search of neural network parameters and IMC hardware parameters. These IMC hardware parameters include crossbar size, the resolution of the analog-to-digital converter or digital-to-analog converter (ADC/DAC), buffer size and device precision.

Existing NAS surveys focus on the software and algorithmic perspectives, discussing search approaches, optimization strategies and search-space configurations 18 , 19 , 20 . Reviews related to HW-NAS discuss a taxonomy of HW-NAS methods, search strategies, optimization techniques, and hardware parameters relating to different types of hardware, such as central processing units (CPUs), GPUs, field-programmable gate arrays (FPGAs) and traditional application-specific integrated circuits (ASICs) 9 , 10 , 11 , 20 , 21 . However, an in-depth review of HW-NAS specifically for IMC, with consideration for its unique properties and the available hardware frameworks, is not available.

In this Review, we discuss HW-NAS methods and frameworks focusing on IMC architecture search space. We compare existing frameworks and identify research challenges and open problems in this area. A roadmap for HW-NAS for IMC is outlined. Moreover, we provide recommendations and best practices for effective implementation of HW-NAS frameworks for IMC architectures. Finally, we show where HW-NAS stands in the context of IMC hardware–software co-design, highlighting the importance of incorporating IMC design optimizations into HW-NAS frameworks.

In-memory computing background

In traditional von Neumann architectures (Supplementary Fig.  1a ), the energy cost of moving data between the memory and the computing units is high. Processor parallelism can alleviate this problem to some extent by performing more operations per memory transfer, but the cost of data movement remains an issue 4 . Moreover, von Neumann architectures suffer from the memory wall problem — that is, the speed of processing has improved at a much faster rate than that of traditional memories (such as dynamic random access memory (DRAM)), resulting in overall application performance being limited by memory bandwidth 4 , 22 . Power efficiency has also ceased scaling with technology node advances, resulting in stalled gains in performance computational density 23 .

In in-memory computing (IMC), one approach proposed to overcome the von Neumann bottleneck, data processing is performed within memory, by incorporating processing elements (Supplementary Fig.  1b ). This alleviates the cost of data movement and improves the latency and energy required for both computation and data transfer 4 , 24 , 25 , 26 . Tiled architectures for IMC 27 are based on a crossbar array of memory devices which can perform multiply–accumulate (MAC) or matrix–vector multiplication (MVM) operations efficiently. Because MVMs constitute the vast majority of their associated operations, the tiled architecture is ideal for hardware realization of deep neural networks 28 . To implement an efficient IMC system, device-level 4 , 29 , 30 , 31 , 32 , circuit-level 33 , 34 , 35 and architecture-level 36 design aspects should be considered 37 .
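As a rough illustration of why a crossbar computes an MVM in a single analog step, the sketch below models an idealized resistive crossbar: weights are stored as conductances, inputs arrive as row voltages, and each column current is a dot product by Ohm's and Kirchhoff's laws. ADCs, negative-weight handling and device non-idealities are deliberately omitted, and the value ranges are invented for the example:

```python
import numpy as np

def crossbar_mvm(G, v):
    """Ideal crossbar: current at column j = sum_i G[i, j] * v[i]
    (Ohm's law per cell, Kirchhoff's current law per column)."""
    return v @ G

# Map a small non-negative weight matrix onto conductances in [g_min, g_max].
rng = np.random.default_rng(1)
W = rng.uniform(0.0, 1.0, size=(4, 3))      # software weights
g_min, g_max = 1e-6, 1e-4                   # siemens; illustrative range
G = g_min + W * (g_max - g_min)             # linear weight-to-conductance map

v = np.array([0.2, 0.5, 0.1, 0.3])          # input voltages on the rows
print(crossbar_mvm(G, v))                   # the whole MVM in one analog step
```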

In-memory computing devices and technologies

IMC can be built using charge-based memories, such as static random access memory (SRAM) and flash memory, or using resistance-based memories, such as resistive random access memory (RRAM), phase-change memory (PCM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FeRAM) and ferroelectric field-effect transistors (FeFETs) 4 (Supplementary Fig.  1c ). SRAM is a well-developed volatile memory used for IMC applications, whereas the other IMC devices mentioned are non-volatile. Overall, non-volatile memories are less mature than traditional SRAM-based memories but have promising potential for IMC-based neural network hardware due to high storage density, scalability and non-volatility properties. The configuration and operating principle of volatile and non-volatile memories are described in Supplementary Note  1 .

The important criteria used to select devices for IMC architectures are access speed for read and write operations, write energy, endurance, scalability, cell area, and cost (Supplementary Fig.  1d ). To implement state-of-the-art neural networks, both read and write operation speeds of IMC devices are important for both training and inference, especially considering dynamic operations in some models, such as transformer-based neural networks 38 . SRAM has the lowest write latency (<1 ns) and highest endurance (>10 16 cycles), compared with non-volatile IMC memory devices (Supplementary Fig.  1d ). SRAM and some non-volatile memories, such as MRAM, FeRAM and FeFET, have low write energy (<0.1 nJ for SRAM and <1 nJ for non-volatile memories), which can contribute to faster neural network training and dynamic operations. Non-volatile memories, such as RRAM, PCM and Flash memories, have the smallest cell area (approximately 10–16 F 2 , where F is the minimum lithography feature size) and higher scalability and storage density due to the possibility of multilevel storage. In most IMC architectures, memories are organized in a 2D array, but 3D integration and 3D stacking can offer higher storage density. Overall, the most common and scalable memory devices for IMC architectures are SRAMs, RRAMs and PCMs 39 . Flash-array-based IMC accelerators also show promising results for neural networks and ML applications 40 . Other possible IMC devices include spin-torque MRAM, electrochemical RAM and memtransistors 29 . However, they are less common and are at an early stage of development.

Conventional IMC architectures for neural networks

A typical IMC architecture has several layers of hierarchy 33 , 41 , 42 , 43 , 44 (Supplementary Fig.  1e ). The highest layer is constituted by tiles typically connected through a network-on-chip that includes the routers to transmit the signal between the tiles. The weight matrix of a neural network can be stored inside a single tile or shared between several tiles. This layer also includes peripheral and interface circuits, such as the global accumulation unit, pooling unit and input–output interface circuits. A tile consists of several computing elements 42 , also called MAC units or MVM units 41 , 43 , and peripheral circuits, including accumulation and activation units. Each computing element contains several crossbar arrays (processing elements) and processing circuits, including multiplexers shared by several crossbar columns, shift-and-add circuits, ADC converters, local registers and control circuits. A crossbar array contains memory cells with one or more devices depending on IMC memory type used in the design.
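The hierarchy just described can be pictured as nested configuration objects. The sketch below is schematic; the class names and default sizes are invented for illustration and do not describe any published accelerator:

```python
from dataclasses import dataclass, field

@dataclass
class Crossbar:                 # processing element: array of memory cells
    rows: int = 128
    cols: int = 128
    bits_per_cell: int = 2

@dataclass
class ComputeElement:           # MAC/MVM unit: crossbars + shared mux/ADC/shift-and-add
    num_crossbars: int = 8
    crossbar: Crossbar = field(default_factory=Crossbar)

@dataclass
class Tile:                     # tiles communicate via a network-on-chip
    num_compute_elements: int = 12
    ce: ComputeElement = field(default_factory=ComputeElement)

    def weight_capacity_bits(self) -> int:
        xb = self.ce.crossbar
        return (self.num_compute_elements * self.ce.num_crossbars
                * xb.rows * xb.cols * xb.bits_per_cell)

print(Tile().weight_capacity_bits())  # weight storage available in one tile, in bits
```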

Some device technologies, such as RRAM and PCM, use a device in series with a switching element to mitigate sneak path currents from cell to cell (which would result in false cell programming or reading) and limit the current in the low-resistance state (to avoid damage and improve variability and endurance). This device can be a two-terminal threshold-switching selector located above or below the resistance-based memory (namely 1S1R); or a complementary metal–oxide–semiconductor transistor (1T1R). This choice typically increases the size of each cell (the resistance-based memory is integrated on the via that comes from the drain and source contacts of the transistor) 30 . State-of-the-art IMC architectures include ISAAC 41 , PUMA/PANTHER 43 , 45 , TIMELY 46 , RIMAC 47 , PIMCA 48 , HERMES 39 and SAMBA 49 . ISAAC 41 is a pipelined IMC accelerator with embedded DRAM buffer banks to store intermediate outputs of the pipeline stages. PUMA 45 is a programmable eight-core accelerator with a specialized instruction set architecture and compiler supporting complex workloads. PANTHER 43 is an extension of PUMA architecture supporting efficient training for RRAM-based IMC architectures. TIMELY 46 adopts analog local buffers inside the crossbar arrays to improve data locality, and time-domain interfaces to improve energy efficiency. RIMAC 47 is an ADC/DAC-free IMC accelerator with analog cache and computation modules. PIMCA 48 is a capacitive-coupling-based SRAM IMC accelerator with a flexible single-instruction, multiple-data processor for non-MVM operations. SAMBA 49 is a sparsity-aware RRAM-based IMC accelerator with load balancing and optimized scheduling.

An alternative to hierarchical architectures is one that combines spatially distributed analog IMC tiles and heterogeneous digital computing cores 50 . Such an architecture, based on 2D-mesh interconnect 51 , is highly programmable and supports a wide range of workloads (mapping and pipelining). TAICHI 51 is another example of a tiled RRAM-based accelerator with mesh interconnect, local arithmetic units and global co-processor targeting reconfigurability and efficiency.

Since the beginning of the 2020s, several fabricated IMC macros have been demonstrated: a reconfigurable 48-core RRAM-based IMC chip (NeuRRAM) suitable for various applications, such as image classification, speech recognition and image reconstruction 52 ; eight-core RRAM-based IMC macros 53 , 54 ; a PCM-based fabricated eight-core chip 55 ; a flash-memory-based pipelined 76-core chip with analog computing engine tiles 40 ; a SRAM-based mixed-signal 16-core IMC accelerator with configurable on-chip network, flexible dataflow and scalable cores 56 ; and a MRAM-based IMC core with readout circuits 57 . In 2023, a mixed-signal IMC chip called the IBM HERMES Project Chip, comprising 64 cores of PCM-based crossbar arrays with integrated data converters, on-chip digital processing and a digital communication fabric, was presented 39 .

Weight mapping, computing precision and non-idealities

Several software and hardware parameters need to be considered to map a software-based neural network model to IMC hardware. For example, when mapping neural network weight matrices and inputs or activations to IMC crossbars, important parameters are the matrix size, the crossbar size, the precision of weights and inputs, the precision of IMC devices, the resolution of the converters (ADCs and DACs) and the peripheral circuits. In this case, the concept of partial sums should be considered. Partial sums in IMC architectures are applied in three different cases (Supplementary Fig.  1f ): (1) when a large weight matrix does not fit into a single crossbar array; (2) when high-precision weights are mapped to low-precision crossbar cells; and (3) when high-precision inputs are streamed to the crossbar sequentially 58 (discussed in detail in Supplementary Note  2 ). Partial sums require specific ADC resolution to maintain the desirable computing precision (Supplementary Note  2 ), which contributes to on-chip area and energy consumption overhead of peripheral circuits 43 .
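Case (1) can be made concrete with a short sketch: a weight matrix larger than a single crossbar is partitioned into crossbar-sized blocks, and each block contributes a partial sum that is accumulated after readout. This is a simplified digital-domain illustration; bit slicing, ADC quantization and analog noise are ignored, and the 128×128 crossbar size is an assumption:

```python
import numpy as np

def tiled_mvm(W, x, xbar_rows=128, xbar_cols=128):
    """A weight matrix larger than one crossbar is split into crossbar-sized
    blocks; each block yields a partial sum that is accumulated after readout."""
    out = np.zeros(W.shape[1])
    for r in range(0, W.shape[0], xbar_rows):
        for c in range(0, W.shape[1], xbar_cols):
            block = W[r:r + xbar_rows, c:c + xbar_cols]          # one crossbar
            out[c:c + xbar_cols] += x[r:r + xbar_rows] @ block   # partial sum
    return out

rng = np.random.default_rng(2)
W = rng.normal(size=(512, 256))   # too large for a single 128x128 array
x = rng.normal(size=512)
assert np.allclose(tiled_mvm(W, x), x @ W)   # partial sums recover the full MVM
```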

In IMC architectures with non-volatile memory devices, computing precision is also affected by non-idealities, including device-to-device variability, circuit nonlinearity, and conductance variation with time 38 , 59 . Such degradation of computing precision can be predicted and mitigated using hardware-aware training methods leading to robust IMC hardware designs 38 , 60 . The accuracy degradation caused by non-idealities of the devices can also be improved by periodically calibrating batch normalization parameters during the inference 59 . Overall, it is necessary to conduct a comprehensive analysis of noise effects when designing IMC hardware and include mitigation and compensation techniques for non-idealities as part of the design.

Model compression for IMC architectures

Model compression techniques used for neural network optimization, such as quantization and pruning 61 , can be applied in implementations on IMC architectures to reduce hardware costs. It is too expensive to deploy full-precision neural network weights to IMC devices 62 , so quantization is often used, reducing occupied memory, data transmission and computation latency 63 . Network pruning, which removes unnecessary neural network weights or groups of weights, can reduce energy and latency 64 .

Quantization for IMC architectures

Quantization methods are divided into uniform quantization and non-uniform quantization 65 . In uniform quantization, the quantization intervals are equally distributed across the quantized range 58 . An example of non-uniform quantization is logarithmic quantization, that is, the power-of-two method 66 , commonly used for SRAM-based IMC hardware or RRAM-based binarized IMC architectures. More complex quantized weight representations use bit-level sparsity to increase the number of zeros in the bit representation of weights and activations to improve energy efficiency in MAC operations during quantization 67 , 68 . Most quantization-related RRAM-based architectures focus on fixed-precision quantization with a uniform quantizer 62 .
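For concreteness, here is a minimal sketch of the two quantizer families. These are illustrative implementations (real IMC toolchains add calibration, clipping and hardware-aware rounding), and the bit widths and exponent range are assumptions:

```python
import numpy as np

def uniform_quantize(w, n_bits):
    """Uniform quantizer: equally spaced levels across [-max|w|, +max|w|]."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

def power_of_two_quantize(w, n_bits):
    """Logarithmic (power-of-two) quantizer: magnitudes snap to powers of two;
    zero stays zero via the sign factor."""
    sign = np.sign(w)
    k = np.round(np.log2(np.maximum(np.abs(w), 1e-12)))
    k = np.clip(k, -(2 ** (n_bits - 1)), 0)   # exponent range is an assumption
    return sign * 2.0 ** k

w = np.random.default_rng(3).normal(scale=0.2, size=8)
print(uniform_quantize(w, 4))
print(power_of_two_quantize(w, 4))
```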

An alternative approach is based on mixed precision quantization, where different quantization configurations are chosen for different layers 63 , 69 . This method is effective because different neural network layers and convolution filters have different sensitivity to quantization 62 and contribute differently to overall accuracy 69 . Flexible word length quantization improves compression and reduces accuracy loss compared with uniform quantization 70 .

In IMC architectures, quantization is performed by either changing the number of crossbar cells per weight or the number of bits per crossbar cell. Analog weight storage (≥2 bits per cell) allows for higher effective cell density (bits mm⁻²), leading to higher storage density 39 . However, increasing the number of bits per cell increases the effect of RRAM non-idealities and variabilities. ADC precision, and hence overhead, also increases with the precision of crossbar weights 63 .

Pruning in IMC architectures

Pruning is divided into unstructured and structured pruning. In unstructured pruning, the individual connections (weights) are removed at a fine granularity. Structured pruning implies coarse-grained pruning of groups of weights (kernel or channel pruning) 64 , 71 . In IMC hardware, weight pruning (usually unstructured pruning of individual weights) disconnects unnecessary crossbar cells, leading to sparse neural network hardware implementation. Structured pruning is implemented by removing or disconnecting whole rows or columns of the crossbar array and corresponding peripheral circuits.

Sparsity due to unstructured pruning in IMC architectures can improve energy efficiency and throughput, but it can also lead to unnecessary computation overhead, difficulty in parallelizing processing, and low hardware utilization. Nevertheless, mapping structured row-wise or column-wise pruning to IMC architecture leads to higher crossbar utilization than unstructured pruning 64 . One of the ways to reach a desired compression ratio via structured pruning is to incorporate several rounds of weight grouping, determining the importance of these groups, followed by fine-grained row-wise pruning 64 .
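A minimal sketch of the two granularities, using the common magnitude-based heuristic (the sparsity targets and the row-norm criterion are illustrative choices, not a specific published method):

```python
import numpy as np

def unstructured_prune(W, sparsity):
    """Zero out the smallest-magnitude individual weights (fine-grained);
    on a crossbar this leaves scattered disconnected cells."""
    k = int(sparsity * W.size)
    thresh = np.sort(np.abs(W), axis=None)[k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def structured_prune_rows(W, sparsity):
    """Remove whole rows by L2 norm (coarse-grained); entire crossbar rows
    and their peripheral circuits can then be disconnected."""
    norms = np.linalg.norm(W, axis=1)
    k = int(sparsity * W.shape[0])
    keep = norms >= np.sort(norms)[k]
    return W * keep[:, None]

W = np.random.default_rng(4).normal(size=(8, 6))
print(unstructured_prune(W, 0.5))    # scattered zeros
print(structured_prune_rows(W, 0.5)) # whole rows zeroed
```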

Most work on RRAM-based neural network pruning uses heuristics or rules to prune network weights. This can sometimes prune non-trivial weights and preserve trivial weights, leading to sub-optimal solutions. Also, hardware constraints and hardware feedback are not always considered in RRAM-based network pruning 72 .

Hardware-aware neural architecture search

In the hardware-aware neural architecture search (HW-NAS) process, the inputs are neural network parameters, model compression parameters and hardware parameters (Fig.  1a ). In some cases, the search space includes hardware search space and model compression options to be optimized. Optimal neural network designs and optimal hardware parameters are searched for within this space using a search strategy (algorithm or heuristic). The main difference between HW-NAS and traditional NAS is the consideration of hardware limitations and constraints in the search. Problem formulation methods are used to define an objective function for the search and a method to incorporate optimization constraints. Some frameworks search only for optimum neural network designs considering hardware constraints, whereas others can incorporate the search of optimal hardware parameters to find the most efficient hardware design. Performance is evaluated using performance metrics and hardware metrics. In this Review, performance metrics refer to a neural network performance characteristic such as accuracy or performance error, while hardware metrics refer to the metrics describing hardware efficiency, such as energy, latency and on-chip area. To evaluate hardware performance, various hardware cost estimation methods can be used.

Figure 1

a , Overview of hardware-aware neural architecture search (HW-NAS). b , Efficient deep learning methods for the design space exploration involved in HW-NAS for in-memory computing (IMC). ADC, analog-to-digital converter; DAC, digital-to-analog converter; S&H, sample and hold.

HW-NAS basics

HW-NAS for IMC incorporates four efficient deep learning methods for design space exploration (Fig.  1b ), which allow the optimal design of neural network model and hardware to be found: model compression, neural network model search, hyperparameter search and hardware optimization. Model compression techniques, such as quantization and pruning, can be viewed as HW-NAS problems and are often included in HW-NAS flows 9 . Neural network model search implies searching for neural network layers and corresponding operations, as well as the connections between them 73 , 74 . Hyperparameter search includes searching for the optimized parameters for a fixed network — that is, the number of filters in a convolution layer or the kernel size 9 . Hardware optimization is the optimization of hardware components such as tiling parameters, buffer sizes and other parameters included in the hardware search space. For IMC architectures, hardware optimization may include crossbar-related parameters (such as ADC/DAC precision and crossbar size) that can have an effect on the performance of the architecture, in terms of energy consumption, processing speed, on-chip area and performance accuracy 15 , 75 .

The search space in HW-NAS refers to the set of network operations and hyperparameters searched to optimize the network and hardware architecture. The search space can be fixed or hardware-aware 11 . In a fixed search space, neural network operations are designed manually without considering the hardware. In hardware-aware search, the interlayer and intralayer operations in the network are adapted depending on the hardware.

From the perspective of the search parameters, the search space can be divided into the neural network search space and hardware architecture search space 9 . The first covers the selection of neural network operations, blocks, layers and the connections between them. The second considers the hardware design, for example IP core reuse on FPGA, quantization schemes, tiling, or selection of buffer sizes. The hardware architecture search space depends either on hardware platform configurations or on predefined templates for different operations to optimize certain hardware parameters 9 . For IMC architectures, the search space should be extended to include specific hardware-related details, such as crossbar size and precision of the converters (discussed later).
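
As a concrete illustration, a joint search space of this kind can be declared as a set of discrete options per parameter. The following Python sketch uses hypothetical names and value ranges rather than those of any published framework.

```python
# Illustrative joint search space for HW-NAS targeting IMC hardware.
# All parameter names and value ranges are hypothetical examples.

neural_network_space = {
    "num_layers":   [8, 14, 20],          # candidate network depths
    "kernel_size":  [3, 5, 7],            # convolution kernel options
    "num_filters":  [16, 32, 64, 128],    # filters per convolution layer
    "skip_connect": [True, False],        # whether a block uses a skip connection
}

compression_space = {
    "weight_bits":   [2, 4, 8],           # quantization bit-widths
    "pruning_ratio": [0.0, 0.3, 0.5],     # fraction of weights pruned
}

imc_hardware_space = {
    "crossbar_size":  [64, 128, 256],     # rows/columns of an IMC crossbar tile
    "adc_precision":  [4, 6, 8],          # ADC resolution in bits
    "dac_precision":  [1, 2, 4],          # DAC resolution in bits
    "buffer_size_kb": [32, 64, 128],      # on-chip buffer size
}

# A single candidate design is one choice from every dimension.
search_space = {**neural_network_space, **compression_space, **imc_hardware_space}
print(len(search_space), "search dimensions")
```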

Depending on the HW-NAS objectives, frameworks can optimize a specific neural network or a set of network models for one or multiple hardware architectures 11 . Hardware constraints vary with the target hardware and can be categorized into implicit constraints and explicit constraints. Implicit constraints do not describe desired hardware metrics directly but affect them implicitly, such as bits per operation. Explicit constraints are the evaluated metrics related to hardware deployment, including energy consumption, latency, on-chip area, memory and available hardware blocks. Typical constraints for IMC architectures include energy consumption, latency and the number of available resources (for example, crossbar tiles on a chip).
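
As a minimal sketch, filtering candidate designs against explicit constraints might look as follows; the metric names and thresholds are illustrative assumptions, not values from any specific framework.

```python
# Hypothetical explicit hardware constraints for an IMC target, and a filter
# that discards candidate designs violating any of them.

CONSTRAINTS = {
    "energy_mj":  5.0,     # maximum energy per inference (mJ)
    "latency_ms": 10.0,    # maximum latency per inference (ms)
    "area_mm2":   50.0,    # maximum on-chip area (mm^2)
    "xbar_tiles": 256,     # number of available crossbar tiles
}

def satisfies_constraints(metrics: dict) -> bool:
    """Return True if every evaluated hardware metric is within its threshold."""
    return all(metrics[name] <= limit for name, limit in CONSTRAINTS.items())

# Example: a candidate whose estimated metrics come from a hardware cost model.
candidate_metrics = {"energy_mj": 3.2, "latency_ms": 8.1,
                     "area_mm2": 42.0, "xbar_tiles": 200}
print(satisfies_constraints(candidate_metrics))  # True
```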

Problem formulation in HW-NAS

A problem formulation in HW-NAS defines the objective function, the optimization constraints and the way they are combined into a search problem. The problem formulation method is selected according to the target output and the available information about hardware resources. For example, the HW-NAS target can be either a single architecture with optimized performance and hardware parameters or a set of architectures optimizing hardware parameters with a certain priority. HW-NAS problem formulation is divided into single-objective optimization and multi-objective optimization methods (Fig. 2 ). The selection of a HW-NAS problem formulation method depends on the objectives of the search. Two-stage methods are suitable for deploying well-performing models to a specific hardware platform or for deriving a sub-model from an existing model followed by deployment to specific hardware. Constrained optimization is useful when hardware constraints are specified or when designing a neural network model for a specific hardware platform. Scalarization methods can be used to prioritize certain objective functions, whereas Pareto-based optimization finds the trade-off between the performance metrics and hardware metrics.

Fig. 2 | HW-NAS problem formulation methods. The methods search for a neural network (NN) model α from the search space A that maximizes a performance metric f or several performance metrics f_n for a dataset δ. Constrained optimization uses a hardware (HW) constraint g_i with a threshold T_i from a set of hardware constraints I. h( ) is a combination of several performance metrics that can represent a weighted summation, weighted exponential sum or weighted product. Data derived from ref. 9 .

Single-objective optimization

These methods are categorized into two-stage methods and constrained optimization 9 . In two-stage optimization, HW-NAS first selects a set of well-performing high-accuracy network models and then performs hardware optimization to select the most hardware-efficient design. This approach is useful for adapting well-performing neural networks for implementation on different hardware platforms, or for optimizing networks for a specific hardware platform. Hardware constraints are included in the second stage of HW-NAS. The drawback of such methods is that the network models selected in the first stage tend to be large to maximize accuracy and may not always fit the hardware constraints of a specific platform. In constrained optimization, hardware parameters are considered while searching for a neural network model. This allows network models that do not fit within hardware constraints to be filtered out during the search process, thus speeding up HW-NAS. The challenge of constrained optimization is the difficulty of including hardware constraints directly in the search algorithms, especially in gradient-based methods or reinforcement learning. Therefore, the problem is often transformed into an unconstrained optimization that includes the hardware constraints in the objective function 9 , 76 .
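
Using the notation of Fig. 2, the constrained formulation and one common penalty-based transformation into an unconstrained problem can be written as below; the penalty coefficients λ_i are an assumed addition, and the exact transformation varies between frameworks 9 , 76 .

```latex
% Constrained single-objective HW-NAS (notation of Fig. 2)
\max_{\alpha \in A} \; f(\alpha, \delta)
\quad \text{s.t.} \quad g_i(\alpha) \le T_i, \quad i \in I

% One possible unconstrained transformation with penalty coefficients \lambda_i
\max_{\alpha \in A} \; f(\alpha, \delta)
  - \sum_{i \in I} \lambda_i \, \max\bigl(0, \; g_i(\alpha) - T_i\bigr)
```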

Multi-objective optimization

These methods are categorized into scalarization methods and Pareto optimization, typically using the NSGA-II algorithm 9 . The first approach combines several objective functions via weighted summation, weighted exponential sum or weighted product to set the significance of each objective term. This approach is useful to prioritize certain objective terms while not ignoring the others, and to modify this weighting based on requirements. During a search, the weights are usually fixed, and multiple runs are required to find the Pareto optimal set or to obtain a 'truly' optimum network model. Therefore, the search is slow, and its speed depends on the number of search iterations with different weights. In the second method, a set of Pareto optimal neural network models is searched for. The search can be implemented with the evolutionary algorithm NSGA-II 9 , where the problem is formulated as a linear combination of different objective functions. This method is useful to find trade-offs between performance, accuracy and hardware metrics, especially when searching for the optimal network model for different hardware platforms. Search speed is slow compared with the previous methods, as the whole set of network models on the Pareto curve is searched.
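
Written out in the notation of Fig. 2 with weights w_k over n objectives, the three scalarization combinations take forms such as the following; exact definitions differ between frameworks, so this is a representative rendering rather than a canonical one.

```latex
% Weighted summation
h_{\text{sum}}(\alpha) = \sum_{k=1}^{n} w_k \, f_k(\alpha, \delta)

% Weighted exponential sum
h_{\text{exp}}(\alpha) = \sum_{k=1}^{n} w_k \, e^{\, f_k(\alpha, \delta)}

% Weighted product
h_{\text{prod}}(\alpha) = \prod_{k=1}^{n} f_k(\alpha, \delta)^{\, w_k}
```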

Search strategies: algorithms for HW-NAS

After defining a problem formulation method, a search strategy should be selected (Fig.  3 ). The search algorithm is a core component of NAS and defines the flow of the parameter search. There are three main optimization algorithms used for HW-NAS: reinforcement learning, evolutionary algorithms and gradient-based methods, such as differentiable search. Less common algorithms include random search and Bayesian optimization. The search algorithm is independent of the problem formulation methods shown in Fig.  2 . For example, two different search algorithms can be used in the different optimization stages of a two-stage problem formulation method, in a hybrid approach 9 . Constrained optimization and scalarization-based problem formulations can be combined with most search strategies, for example reinforcement learning, evolutionary algorithms and Bayesian optimization. Differentiable search is easier to apply to differentiable parameter search, for example in the first stage of two-stage methods when optimizing the neural network model parameters, while any other algorithm (reinforcement learning or an evolutionary algorithm) is used to optimize the hardware or to fine-tune the model to fit hardware constraints 8 . Pareto optimization problems are mainly addressed in the literature by evolutionary algorithm approaches, such as NSGA-II 77 ; however, the Pareto optimal set can also be found using other methods 78 .

Fig. 3 | Summary and comparison of search strategies and their performance parameters, computation and resource requirements. Search strategies in hardware-aware neural architecture search (HW-NAS) refer to the algorithms and search techniques used in the search for the optimum parameters. Scalability refers to how scalable the algorithm is when the search space grows. Even though evolutionary algorithms are not scalable when the search space increases, they are often used as a second step in a two-stage optimization, where the first step is a scalable gradient-based method with a super-network search space. Gradient-based methods are scalable, as the search time does not increase exponentially with search space size. Computational demand and memory requirements refer to the resources required to run the algorithm. The scalability of each algorithm in terms of memory consumption depends on the specific algorithm and may vary within the same subset of methods.

In reinforcement-learning-based NAS, an agent interacts with the environment and learns the best policy for taking an action using a trial-and-error approach. The environment is modelled as a Markov decision process 63 . The algorithm aims to maximize a reward function. The main drawbacks of reinforcement learning are slow search speed and high computational cost. Reinforcement learning can be applied to RRAM-based architectures 63 or used to find the best-performing network models (considering the optimization of hardware resource utilization in an RRAM-based architecture) 69 . Reinforcement-learning-based automated weight pruning can also be applied to RRAM-based crossbars 72 , 79 .
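
The following numpy sketch illustrates the core of such a controller with a REINFORCE-style policy-gradient update over a single architectural decision; the reward function is a stub standing in for accuracy and hardware feedback, and all names and constants are illustrative.

```python
import numpy as np

# Minimal REINFORCE-style sketch of RL-based architecture search. In HW-NAS
# the reward would combine validation accuracy with hardware feedback such
# as crossbar utilization; here it is a toy stub.
rng = np.random.default_rng(0)

n_choices = 4                 # e.g. four candidate kernel sizes
logits = np.zeros(n_choices)  # policy parameters for one decision
lr = 0.1

def reward(action: int) -> float:
    """Stub reward: pretend choice 2 is the accuracy/hardware optimum."""
    return 1.0 - 0.3 * abs(action - 2) + 0.05 * rng.standard_normal()

baseline = 0.0
for step in range(500):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    action = rng.choice(n_choices, p=probs)   # sample an architecture choice
    r = reward(action)
    baseline = 0.9 * baseline + 0.1 * r       # moving-average reward baseline
    grad = -probs
    grad[action] += 1.0                       # d log pi(action) / d logits
    logits += lr * (r - baseline) * grad      # policy-gradient ascent

probs = np.exp(logits - logits.max())
probs /= probs.sum()
print("learned choice:", int(probs.argmax()))  # expected to be 2
```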

One of the best-known evolutionary algorithms is the genetic algorithm, which is used in several HW-NAS frameworks 9 . The first step of an evolutionary algorithm is the initialization of a population of networks with random combinations of parameters to start the search. The performance of the networks is then evaluated and scored based on an objective function (also called the fitness function). The best-performing networks are used in a mutation and crossover process, where the parameters of these networks are mixed and some of the parameters are randomly changed to new ones to create a new population. The process is repeated from the evaluation step for several iterations, called generations, until convergence to a set of well-performing network models 70 , 80 .
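
A minimal sketch of this loop, with a stub fitness function standing in for the accuracy and hardware evaluation, could look as follows; the search space and all constants are hypothetical.

```python
import random

# Minimal genetic-algorithm sketch following the steps described above.
random.seed(0)

SPACE = {"kernel": [3, 5, 7], "filters": [16, 32, 64], "bits": [2, 4, 8]}

def random_network():
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(net):
    # Stub objective: prefer small kernels, many filters and 4-bit weights.
    return net["filters"] / 64 - 0.05 * net["kernel"] - 0.1 * abs(net["bits"] - 4)

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(net, rate=0.2):
    return {k: (random.choice(SPACE[k]) if random.random() < rate else v)
            for k, v in net.items()}

population = [random_network() for _ in range(20)]        # initialization
for generation in range(30):
    scored = sorted(population, key=fitness, reverse=True)  # evaluation
    parents = scored[:5]                                    # selection
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(15)                                  # crossover + mutation
    ]

print(max(population, key=fitness))
```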

Gradient-based and differentiable search methods use the one-shot NAS approach and weight sharing in a super-network (an over-parametrized network) 81 . This approach combines all possible neural network models with different parameters in a single super-network using a weight-sharing technique. Compared with reinforcement learning and evolutionary algorithm methods, where the evaluated networks are randomly sampled at the beginning of a search, the weights in a super-network in gradient-based methods are trained at the same time as the search is performed. After training, the super-network is pruned to eliminate inefficient connections and blocks, creating a single optimum network. Differentiable search is the only scalable approach with a differentiable search space, in which the search time does not grow exponentially with the number of searched hyperparameters. This approach is faster than other methods, as it does not require the training of every single neural network model and therefore has low computational demand. However, differentiable search has high memory requirements for storing an over-parameterized network. An example of differentiable NAS applied to RRAM-based architecture is CMQ 62 . The super-network search can also be performed by evolutionary algorithm or Bayesian optimization methods, which are suitable for a discrete search space 8 .

Bayesian optimization 82 excels in managing complex search spaces by constructing and iteratively refining a probabilistic model of the objective function. It balances exploration of new architectures and exploitation of known effective configurations. This makes Bayesian optimization a strong contender for optimizing hyperparameters within a set macro-architecture of a neural network. However, it is important to note that the search speed of Bayesian optimization is relatively slow, comparable to that of reinforcement learning methods. Complementing Bayesian optimization, other strategies, such as random search 83 , the multi-armed bandit approach 84 and simulated annealing 85 , also contribute to the field. Random search, with its simplicity and unpredictability, offers a baseline performance metric and can be effective in certain high-dimensional search spaces. The multi-armed bandit approach, adept at efficiently navigating decisions under uncertainty, and simulated annealing, inspired by metallurgical cooling processes, both provide unique mechanisms for exploring the search space. These methods are valuable for their distinctive ways of handling search space complexities and often find use in scenarios where more advanced techniques may not be as suitable or necessary.

Overall, reinforcement learning 16 and evolutionary algorithms 12 can produce good results, but they are slow and not scalable. The search space and search time required for reinforcement learning and evolutionary algorithms increase exponentially with the number of searched hyperparameters. This problem is addressed by differentiable search 14 , which is scalable and faster than reinforcement learning and evolutionary algorithms. Therefore, differentiable search is useful for a large search space with many parameters. However, it is important to consider the gradient estimation techniques for non-differentiable parameters. Bayesian optimization is also a promising search strategy. Nevertheless, the application of Bayesian optimization for IMC hardware has not been explored yet.

Hardware cost estimation methods

A major factor in HW-NAS is the method used to estimate hardware performance. There are four different methods for evaluating hardware cost (Fig.  4 ): real-time estimation, lookup table (LUT)-based methods, analytical estimation and prediction models.

Fig. 4 | Hardware cost estimation methods. Model scalability refers to the scalability across different neural network models. In analytical estimation, model scalability depends on how similar a new model is to one with already estimated hardware cost metrics. Transferability refers to transferability across different hardware platforms. Transferability of lookup table (LUT) models and prediction models is moderate, as it requires regeneration of LUTs and machine learning (ML) models for prediction, respectively. Transferability of analytical estimation depends on how similar the new hardware is to the already estimated one. Search scalability refers to the scalability with increasing size of the search space, when the number of search hyperparameters grows. ADC, analog-to-digital converter; NAS, neural architecture search; Xbar, crossbar. Data derived from ref. 9 .

In real-time measurement-based methods, the hardware evaluation is performed directly on the target hardware in real time. In FPGA-based or microcontroller-based designs, this implies the deployment of the network model to real or simulated hardware 9 . For IMC architectures, this can be performed directly on an IMC chip or using circuit-level simulations such as SPICE, which is difficult to automate. This method ensures highly accurate performance estimation, and it is also scalable across different models and hardware platforms; however, it is slow, inefficient and often impractical.

LUT-based methods involve the separate evaluation of hardware metrics for every hardware parameter and their storage in a large LUT 8 , 86 . During hardware evaluation in HW-NAS, LUTs are used to calculate the total hardware metrics, such as the total energy, latency or on-chip area, from the stored results. LUT-based methods are less accurate than real-time measurements and prediction methods, especially when the communication between crossbar tiles or other hardware blocks in an IMC architecture is not considered. LUT-based methods are also less scalable than other methods, as the number of required measurements grows combinatorially with the number of parameters when the search space increases. In addition, LUT-based methods are only moderately scalable across different neural network models and require regeneration when transferred to other hardware platforms.
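
A minimal sketch of LUT-based aggregation is shown below; the table keys, metric values and layer descriptions are invented for illustration, and, as noted above, inter-tile communication is deliberately ignored.

```python
# Sketch of LUT-based hardware cost estimation: per-block metrics are
# measured once, stored, and summed during the search. Numbers are invented.

LUT = {
    # (block type, crossbar size, ADC bits) -> (energy in nJ, latency in us)
    ("conv3x3", 128, 6): (120.0, 4.0),
    ("conv3x3", 256, 6): (180.0, 3.0),
    ("fc",      128, 6): (40.0,  1.5),
}

def estimate_cost(layers):
    """Sum stored per-layer metrics to estimate total energy and latency.
    Inter-tile communication is ignored, which is exactly the source of
    inaccuracy discussed above."""
    energy = sum(LUT[layer][0] for layer in layers)
    latency = sum(LUT[layer][1] for layer in layers)
    return energy, latency

model = [("conv3x3", 128, 6), ("conv3x3", 256, 6), ("fc", 128, 6)]
print(estimate_cost(model))  # (340.0, 8.5)
```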

Analytical estimation methods involve computing rough estimates of hardware metrics using mathematical equations (as in DNN+NeuroSim 87 , MNSIM 88 , 89 , AIHWKit 90 and PUMAsim 45 for IMC-based hardware). Such methods are fast and highly scalable when the search space is increased. The scalability of this method across different neural network models depends on how similar a new model is to the already estimated one. The transferability across hardware platforms also depends on the similarity of the hardware architectures. However, analytical estimation is still not as accurate as real-time measurements, and it requires the initial estimation of hardware metrics from real hardware or circuit-level simulations.

Prediction-based methods train ML models, such as linear regression or neural networks, to predict hardware metrics 91 , 92 , 93 . These methods require an initial set of hardware parameters and hardware metrics stored in LUTs to train the ML models, and they are fast and highly scalable when new hyperparameters are added to the search space. The scalability across neural network models depends on the similarity of the models. Prediction-based methods support differentiable NAS methods and are more accurate than analytical estimation and LUT-based methods.
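
As an illustration of the simplest variant, the sketch below fits a linear model on a hypothetical table of hardware parameters and measured latencies and then predicts the latency of an unseen configuration; all numbers are invented.

```python
import numpy as np

# Sketch of a prediction-based cost model: fit a linear regressor on a small
# table of (hardware parameters -> measured latency) pairs, then predict the
# latency of unseen configurations.

# Features: [crossbar size, ADC bits, number of tiles]
X = np.array([[64, 4, 16], [128, 6, 32], [256, 8, 64], [128, 4, 64]], dtype=float)
y = np.array([2.1, 4.0, 9.5, 6.2])   # measured latency (ms), invented values

# Least-squares fit with a bias term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_latency(xbar, adc_bits, tiles):
    """Predict latency (ms) for an unseen hardware configuration."""
    return np.array([xbar, adc_bits, tiles, 1.0]) @ coef

print(round(float(predict_latency(256, 6, 32)), 2))
```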

Other HW-NAS considerations

Each search strategy and algorithm contains a sampling part, where the neural network models are sampled. In evolutionary algorithm and reinforcement-learning-based optimization frameworks, the network models are sampled before the search. In contrast, in differentiable NAS, the best-performing models are sampled after training a super-network. The most common sampling methods are uniform and random-uniform sampling, which are simple and effective 94 . Other methods include Monte Carlo sampling, which guarantees good performance and diversity owing to the randomness of the sampling process, and the Stein variational gradient descent algorithm with regularized diversity, which provides a controllable trade-off between single-model performance and predictive diversity 95 . More advanced sampling methods include attentive sampling aiming to produce a better Pareto front, such as the AttentiveNAS framework 96 , and dynamic adaptive sampling 97 , where the sampling probability is adjusted according to the hardware constraints.

As the search parameters of a neural network model can be non-differentiable, one of the main issues in differentiable NAS is the relaxation of non-differentiable parameters when applying differentiable search methods. As these methods require gradient calculation, the search space should be differentiable. In HW-NAS, when searching for hardware parameters, this becomes an even more critical issue, as most hardware parameters and hardware metrics are non-differentiable 98 . Relaxation methods allow gradient computation over discrete variables. The most common methods include estimated continuous functions, the REINFORCE algorithm, and application of the Gumbel softmax function using random Gumbel noise in a computation 9 .
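
The Gumbel softmax trick can be sketched in a few lines: adding Gumbel noise to the selection logits and applying a temperature-controlled softmax yields a differentiable approximation of a discrete choice. The code below is a minimal numpy illustration, not tied to any particular framework.

```python
import numpy as np

# Gumbel softmax relaxation: a discrete architecture choice (here, one of
# four candidate operations) is replaced by a differentiable soft mixture,
# so gradients can flow through the selection.
rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature=1.0):
    """Return a soft, approximately one-hot selection vector."""
    gumbel_noise = -np.log(-np.log(rng.uniform(size=logits.shape)))
    scaled = (logits + gumbel_noise) / temperature
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()

logits = np.array([0.5, 1.2, -0.3, 0.1])        # learnable selection parameters
print(gumbel_softmax(logits, temperature=5.0))  # near-uniform soft selection
print(gumbel_softmax(logits, temperature=0.1))  # close to a one-hot choice
```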

The main challenge in HW-NAS is search speed and runtime performance. To improve the search speed of HW-NAS, several techniques are used, including early stopping, hot start, proxy datasets and accurate prediction models 9 . In early stopping, the change in a loss function is monitored for the first few training iterations instead of training the neural network models completely. In the hot start technique, the search starts from well-performing efficient network models rather than random ones. When a proxy dataset is applied, small simple datasets are used in the search first, and then the search results are fine-tuned for more complex datasets. To speed up the search, accuracy prediction methods can also be used for accuracy estimation instead of training every sampled network 9 .
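
As a minimal illustration of the early-stopping idea, a candidate can be abandoned when its loss over the first few epochs improves less than some threshold; the threshold below is a hypothetical tuning parameter.

```python
# Sketch of the early-stopping speed-up: instead of training every sampled
# network to convergence, monitor the loss over the first few epochs and
# abandon candidates whose loss is not improving fast enough.

def worth_continuing(loss_history, min_improvement=0.05):
    """Keep training only if the relative loss drop over the observed
    epochs exceeds the threshold."""
    if len(loss_history) < 2:
        return True
    drop = (loss_history[0] - loss_history[-1]) / loss_history[0]
    return drop >= min_improvement

print(worth_continuing([2.3, 2.25, 2.24]))  # False: barely improving, stop early
print(worth_continuing([2.3, 1.9, 1.6]))    # True: promising candidate
```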

HW-NAS for IMC architectures

State-of-the-art HW-NAS frameworks for IMC

Manually searching for both the optimum design and the processing flow of an in-memory architecture is unrealistic, as the search space becomes huge when the architecture parameters are added 16 (Supplementary Fig.  2a ). Besides neural network blocks, hyperparameter search and optimized compression techniques, the search space for IMC architectures can be expanded to search for IMC crossbar-related hardware components 15 , 16 , 17 , 63 . The IMC hardware search space considered in these frameworks includes IMC crossbar size, ADC/DAC precision, device precision and buffer size.

Between 2020 and 2023, several HW-NAS frameworks for IMC-based neural network architectures were introduced (Supplementary Fig.  2b–d and Table  1 ). Based on which parameters are searched (Supplementary Fig.  2b ), the HW-NAS methods for IMC architectures can be divided into three main categories: (1) frameworks containing a 'true' NAS searching for the neural network components and hyperparameters 12 , 13 , 14 , 15 , 16 , 17 , 80 , 99 , 100 , 101 ; (2) frameworks in which quantization is posed as an HW-NAS problem and the optimum bit-width is searched for considering the hardware feedback 62 , 69 , 70 ; and (3) frameworks searching for optimum pruning, formulating the problem as HW-NAS 72 , 79 . Compared with 'true' NAS approaches, frameworks focused only on quantization or pruning search for optimized model compression techniques while using HW-NAS problem formulation techniques.

Based on the consideration of hardware parameters in a search (Supplementary Fig.  2c ), HW-NAS frameworks for IMC can be divided into three categories: (1) frameworks for a fixed IMC architecture, optimizing a neural network model for fixed hardware 12 , 13 , 14 , 62 , 69 , 70 , 72 , 79 , 80 , 99 , 100 , 101 ; (2) frameworks with hardware parameter search for a fixed model, optimizing IMC hardware for a certain application 102 ; and (3) frameworks for optimum model and architecture search, optimizing both neural network model parameters and hardware parameters 15 , 16 , 17 , 63 . The frameworks for a fixed architecture are designed to optimize the neural network model for specific IMC hardware. This is the most widespread approach, employed for adjusting a neural network model to a specific existing IMC architecture considering hardware constraints 12 , 13 , 14 , 62 , 69 , 70 , 72 , 79 , 80 , 99 , 100 , 101 . Frameworks searching IMC hardware parameters for a fixed neural network model formulate hardware optimization as a single- or multi-objective optimization problem, rather than optimizing the design manually or using brute-force approaches 102 . The HW-NAS frameworks searching both neural network model and IMC hardware parameters perform co-optimization of software and hardware parameters. This approach is useful to obtain optimum hardware solutions for a particular application, especially at the initial design stages.

In addition, state-of-the-art HW-NAS frameworks can be categorized based on the algorithm used in a search (Supplementary Fig.  2d ). The detailed description of each framework and mathematical representation of the problem formulation can be found in Supplementary Note  3 .

An HW-NAS framework for IMC that can simultaneously prune, quantize and perform NAS in one flow has not been reported yet. The baseline and optimization functions for the state-of-the-art HW-NAS frameworks for IMC are different, and these frameworks focus on different neural network models and different search strategies (Table  1 ). Therefore, comparing the performance and search speed of these frameworks is difficult. It is important to note that the state-of-the-art frameworks for HW-NAS for IMC are mostly designed for different types of convolutional neural networks, and it is still an open problem to apply HW-NAS techniques to other types of neural network architectures implemented on IMC hardware.

Two-stage optimization versus joint optimization

HW-NAS can be divided into the search for an optimized model for a specific hardware architecture taking hardware constraints into account, and the co-optimization of a neural network model and hardware parameters (Supplementary Fig.  2c ). The second is useful when designing IMC hardware for a specific application, especially when hardware efficiency is critical. As illustrated in Table  1 , only a few HW-NAS frameworks for IMC include hardware parameters in the search and perform hardware–software co-optimization. The hardware parameter search can help to design a more efficient hardware implementation of an IMC architecture. To include IMC hardware parameters in the search, there are two possible scenarios for HW-NAS frameworks: two-stage optimization and joint optimization. In terms of the problem formulation techniques of HW-NAS shown in Fig.  2 , two-stage optimization falls into the category of two-stage methods, whereas joint optimization corresponds to the rest of the HW-NAS problem formulation methods.

In two-stage optimization, the neural network model search space and the hardware search space are separated. After defining the neural network model search space, a set of networks is sampled, and HW-NAS selects a set of models with high performance accuracy using a certain search algorithm. When the best-performing networks are selected, they are passed to the second stage of optimization, in which the optimum hardware parameters are searched for from the set of sampled hardware parameters. Finally, the second search stage outputs the optimum neural network model(s) and the optimum hardware parameters.

In joint optimization, a large joint search space consisting of neural network models and hardware parameters is sampled to create a set of random neural network models. Then, HW-NAS is performed, searching for the optimum neural network model and hardware parameters simultaneously. Both performance accuracy and hardware metrics are used to evaluate the performance of the sampled networks and find the optimum design.
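
The sketch below contrasts the two scenarios with stub evaluators standing in for accuracy estimation and a hardware cost model; every function, parameter and constant here is a hypothetical placeholder.

```python
import random

# Contrasting sketch of two-stage versus joint optimization with random
# sampling as the (deliberately simple) search algorithm.
random.seed(0)

def sample_model():     return {"depth": random.choice([8, 14, 20])}
def sample_hardware():  return {"xbar": random.choice([64, 128, 256])}
def accuracy(m):        return 1.0 - 0.01 * m["depth"] + random.random() * 0.05
def hw_cost(m, h):      return m["depth"] * 10 / h["xbar"]

def two_stage(n=50, top_k=5):
    # Stage 1: select the most accurate models, ignoring hardware.
    models = sorted((sample_model() for _ in range(n)),
                    key=accuracy, reverse=True)[:top_k]
    # Stage 2: pick the cheapest (model, hardware) pair among the survivors.
    pairs = [(m, sample_hardware()) for m in models for _ in range(10)]
    return min(pairs, key=lambda p: hw_cost(*p))

def joint(n=500):
    # Sample the joint space and trade accuracy against hardware cost directly.
    pairs = [(sample_model(), sample_hardware()) for _ in range(n)]
    return max(pairs, key=lambda p: accuracy(p[0]) - 0.1 * hw_cost(*p))

print("two-stage:", two_stage())
print("joint:    ", joint())
```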

Two-stage optimization can simplify the search, as the best-performing models are selected in the first stage based only on performance accuracy. This makes the search space smaller in the second stage, where hardware parameters are selected. However, this approach can lead to local optima and might not explore the search space fully. In joint optimization, the search space is large, which can make the search slower and more complex. However, it also allows the selection of the best-performing models considering design parameters and has a higher probability of reaching the global solution. Moreover, as shown in ref.  15 , there is a correlation between the hardware parameters and performance accuracy. In addition, the problem formulation methods and the end goal of HW-NAS should be considered when selecting how to add the hardware parameters to the search.

Outlook and recommendations

Even though methods and frameworks for hardware–software co-design for IMC, and HW-NAS in particular, have already been developed, several open challenges in HW-NAS for IMC remain to be addressed. This section covers the open issues and future research directions for HW-NAS for IMC and provides recommendations for hardware evaluation techniques, mapping neural network models to hardware, and IMC system co-optimization.

Open problems and challenges in HW-NAS for IMC

A roadmap for HW-NAS for IMC architectures, including state-of-the-art frameworks, open problems and future development, is illustrated in Fig.  5 . One of the main challenges is the lack of a unified framework searching for both neural network design parameters and hardware parameters. Moreover, none of the reported HW-NAS frameworks for IMC can prune, quantize and perform NAS in one flow. Combining these three optimizations in a single framework and optimizing the search time for such a large search space is an open challenge for IMC architectures. One example of a similar existing framework is APQ, which targets a constrained NAS problem but for a digital neural network accelerator 8 .

Fig. 5 | Summary of what has been accomplished by state-of-the-art hardware-aware neural architecture search (HW-NAS) frameworks for in-memory computing (IMC), and main perspectives and directions for future development. AI, artificial intelligence; ASIC, application-specific integrated circuit; CPU, central processing unit; FPGA, field-programmable gate array; GPU, graphical processing unit; GA, genetic algorithm.

Different frameworks focus on different hardware implementations and parameters, and on different neural network designs (Table  1 ). Most of the frameworks address only specific issues, such as the correlation between crossbar sizes and convolution kernel sizes studied in the search engine NAX 15 , without considering the various other hardware aspects. Therefore, a fair comparison between methods for HW-NAS for IMC is not possible, which leads to a lack of benchmarking of the various HW-NAS techniques and search strategies. For the end user, it is still challenging to understand which search algorithm will perform better, what speed-up a certain algorithm could provide, and which HW-NAS techniques are the most efficient for IMC applications. There is a lack of quantitative comparison of HW-NAS methods, especially those considering various hardware parameters in the search.

Moreover, state-of-the-art HW-NAS frameworks for IMC architectures focus mostly on different types of convolutional neural networks for computer vision applications, such as ResNet or VGG. However, there are many other neural network types to which the HW-NAS approach for IMC architectures has not yet been applied, such as transformer-based networks 103 or graph networks 104 . There are open challenges related to hardware–software co-design of such models and IMC architectures for various applications, for example biomedical tasks 105 , language processing 106 or recommender systems 81 .

In addition to the lack of HW-NAS frameworks for IMC focusing on diverse neural network models, the same applies to HW-NAS benchmarks. In NAS, benchmarks are datasets describing the accuracy of all the sampled architectures in a certain search space 107 . NAS benchmarks are important to ensure the reproducibility of experiments and the comparison of different methods and algorithms, avoiding extensive computations when generating a search space. These include NAS-Bench-101 (ref.  108 ), NAS-Bench-201 (ref.  109 ) and NAS-Bench-360 (ref.  110 ). It is also important to extend such benchmarks to the hardware domain. For example, HW-NAS benchmarks, including energy and latency, for different types of edge devices, such as mobile devices, ASICs and FPGAs, have been demonstrated 86 . An HW-NAS benchmark for IMC architectures is still an open problem, which must be addressed for the further development of HW-NAS algorithms for IMC applications.

From the hardware perspective, most existing frameworks for HW-NAS for IMC focus on the standard mixed-signal IMC architecture 14 , 17 , 70 , 99 , because of the availability of open-source frameworks for hardware evaluation, such as DNN+NeuroSim 87 . However, there are many other IMC architectures that use different design parameters, devices and technologies. The main issue is the adaptation of state-of-the-art HW-NAS frameworks to these other IMC hardware architectures without reimplementing the frameworks from scratch.

Further development is also required in hardware–software co-design techniques to transfer the neural network model from software to hardware, and techniques to speed up the hardware simulation are needed. Even though most HW-NAS frameworks for IMC use software-level simulations (in Python or C++, for example) to approximate and speed up the simulation of circuits and architectures compared with SPICE-level simulations, further development and improvements in hardware–software co-design frameworks are required 7 . Moreover, hardware–software co-design includes IMC-related compiler optimizations, which can translate deep learning operations to IMC-friendly hardware implementations.

The complexity and runtime issues of the search algorithms should also be addressed. Although there are various methods to speed up HW-NAS, the architecture search is still complex and slow, especially when many search parameters are considered. Moreover, the complexity of HW-NAS increases even further when the IMC hardware search space is considered. In most of the search strategies, except differentiable search, the search time and search space increase exponentially with the number of search parameters.

Finally, a step further is to create fully automated NAS methods capable of constructing new deep learning operations and algorithms suitable for IMC with minimal human design effort. One such approach for general ML algorithms and neural networks is illustrated by AutoML-Zero 111 , which automatically searches for the whole ML algorithm, including the model, optimization procedures and initialization techniques, with minimal restrictions on the form and using only simple mathematical operations. Such algorithms aim to reduce human intervention in the design and allow the construction of an algorithm and neural network model without predefined complex building blocks, which can adapt to different tasks. Adding IMC-awareness to such methods is a step towards fully automating IMC-based neural network hardware design.

Hardware evaluation frameworks

The frameworks for HW-NAS require the estimation of hardware metrics, even when the hardware parameters are not searched. Several open-source hardware estimation frameworks can be used to estimate hardware metrics for a standard IMC architecture, such as DNN+NeuroSim 87 and PUMAsim 45 . Both frameworks allow several computation cores (crossbars) and architecture-related parameters to be set up. NeuroSim also includes several hardware non-idealities and precision parameters. In addition, adding hardware non-idealities and noise effects to the search framework is an essential step towards developing highly effective surrogate models, as these effects can degrade performance accuracy and hardware metrics, and can also be compensated for by the selected neural network model parameters 15 , 80 , 100 . One open-source framework that can be used to simulate IMC hardware non-idealities is GENIEx 112 . For non-standard designs, there is still a lack of open-source frameworks, so designers are still required to create custom hardware evaluation frameworks for non-trivial IMC architectures or to customize existing ones. This challenge could be addressed by developing a framework that generates an IMC hardware description automatically from a neural network model.

Mapping deep neural network models to IMC hardware

Another drawback of the state-of-the-art HW-NAS models for IMC is the lack of consideration of dataflow optimization methods and techniques for mapping the ML model to the IMC hardware. Dataflow optimization involves consideration of data movement, including the transmission of inputs, outputs, intermediate values and control instructions within the IMC architecture and to the external system components, while mapping covers the split of the neural network model across the available hardware resources. In addition to the several layers of the hierarchy of the IMC architecture, including processing elements, computing elements and tiles (Supplementary Fig.  1 ), which should be considered for efficient mapping, the IMC accelerator is also connected to the external memory and cache system (Supplementary Fig.  3 ). Global registers, an SRAM global buffer, cache and DRAM are used for storing and fetching inputs, outputs and neural network weights (in the case of larger deep neural networks, when all the layers cannot fit into the accelerator processing elements). Some IMC systems can also have a local cache, such as the RIMAC IMC accelerator, which contains a local analog cache to reduce the overhead of ADCs and DACs 47 . The benefits of IMC, namely energy efficiency and low latency, can be fully exploited only if the data path and data mapping to the external memory are optimized 113 .

Depending on the workloads and types of layers, the optimum mapping varies. For example, there are several ways to map a convolution layer to the crossbar, including dense and sparse mapping 75 . However, depth-wise separable convolution layers need to be converted to a dense layer form, which yields highly inefficient performance 114 . Mapping of neural network layers to processing elements, computing elements and tiles is considered in the existing IMC hardware simulators. A design space exploration framework, for example, supports different types of mapping and investigates their effects on the performance of an RRAM-based IMC architecture 75 . HW-NAS frameworks based on NeuroSim 87 and AIHWKit 90 consider the mapping to real hardware models, and NeuroSim performs the optimization of hardware utilization. Nevertheless, the types of supported layers are still limited, and the hardware architecture is fixed.

In contrast, the optimization of data movement to and from the external memory, fetching and storing inputs and outputs while considering the different levels of the external memory hierarchy, is rarely done in the existing frameworks 113 . The trade-off between the global buffer or cache sizes and the data movement time affects IMC hardware efficiency. Spatial and temporal data reuse can also reduce latency and improve energy efficiency. Therefore, mapping ML and neural network models to IMC hardware considering the data path and the external memory based on the workloads is a separate optimization problem. One of the existing frameworks that can be used for these purposes is ZigZag 115 , which aims to optimize even and uneven mapping to a large balanced and unbalanced memory hierarchy design space and is suitable for IMC hardware accelerators 113 .

HW-NAS and IMC co-optimization

As of the beginning of 2024, HW-NAS frameworks for IMC applications mostly cover device-related, circuit-related and algorithm-related optimizations (Fig.  6 ). However, to design the optimized IMC hardware, the architecture- and system-level design should also be optimized. Therefore, it is important to combine HW-NAS with other optimization techniques.

Fig. 6 | Hardware–software co-design flow covering the optimization of devices, circuits, architectures, systems and algorithms. Hardware-aware neural architecture search (HW-NAS) can be involved in device-level, circuit-level and algorithm-level optimizations. Including optimization of architecture- and system-related parameters in HW-NAS is a potential direction of future research, which could help to automate hardware–software co-design and further improve the optimized solutions.

In most state-of-the-art HW-NAS optimization frameworks, the architecture is fixed, including mapping and communication between tiles. However, architecture-related details in IMC architectures should also be optimized. For example, effective on-chip communication and optimized interconnect choice are critical for IMC hardware 44 .

The system-level design is responsible for translating the algorithm to the hardware, and it also requires optimization. From the programming perspective, there are various high-level hardware–software co-design concepts requiring design and optimization, including communication of the IMC accelerator with the CPU, programming commands and instructions, issues with shared off-chip memory, and automated scheduling. These challenges related to accelerator programming are rarely considered by IMC hardware designers, even though there are many design aspects to optimize 116 .

In most cases, IMC accelerators are not used as standalone systems but are considered co-processors that need to communicate with the host CPU. Control and instruction units on both sides, the CPU and the IMC accelerator, and an extended instruction set architecture are thus needed. An instruction set architecture is an abstraction of the hardware–software interface that defines the main hardware operations and control commands. This interface is key to supporting the implementation of different algorithms on hardware 117 , including the mapping of different types of neural networks to IMC hardware. Creating instruction set architectures and optimizing the programming of an IMC architecture using high-level languages is important for IMC architectures 118 . Hardware compilers and the automated translation of software commands to hardware implementations are also open challenges for IMC architectures.

Often, external DRAM should be shared between the CPU and the IMC accelerator to support the storage of large models and large tensors that do not fit in on-chip memories. This brings in other design issues to consider and optimize, including data sharing, virtual address translation (to translate the memory address used by the IMC architecture to the global DRAM address) and cache coherence (if the IMC accelerator shares the cache with the CPU). Also, as swapping and re-loading neural network weights to an IMC architecture can reintroduce data movement issues, it can be more practical to split a large network model between the CPU and a weight-stationary IMC architecture when the model does not fit in the IMC accelerator. This might require additional optimization. Another optimization challenge is the automated runtime scheduling of IMC tasks 116 . Therefore, consideration of the higher-level programming perspective, automating and simplifying the translation of software algorithms to hardware, is the next optimization step for IMC hardware accelerators for ML and neural network applications. In general, it is crucial to combine HW-NAS with other optimization techniques to design efficient IMC hardware for AI and ML applications.

Bengio, Y., Lecun, Y. & Hinton, G. Deep learning for AI. Commun. ACM 64 , 58–65 (2021). This work provides an overview of deep learning methods for artificial intelligence applications and related future directions.


Krestinskaya, O., James, A. P. & Chua, L. O. Neuromemristive circuits for edge computing: a review. IEEE Trans. Neural Netw. Learn. Syst. 31 , 4–23 (2019).


Ielmini, D. & Wong, H.-S. P. In-memory computing with resistive switching devices. Nat. Electron. 1 , 333–343 (2018).

Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. & Eleftheriou, E. Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15 , 529–544 (2020). This work explains the importance of and highlights the application landscape of in-memory computing, and also includes an overview of in-memory computing devices.

Xia, Q. & Yang, J. J. Memristive crossbar arrays for brain-inspired computing. Nat. Mater. 18 , 309–323 (2019).

Yang, J. J., Strukov, D. B. & Stewart, D. R. Memristive devices for computing. Nat. Nanotechnol. 8 , 13–24 (2013).

Zhang, W. et al. Neuro-inspired computing chips. Nat. Electron. 3 , 371–382 (2020). This work benchmarks in-memory computing architectures, presents the requirements for device metrics based on different applications and provides an in-memory computing roadmap.

Wang, T. et al. APQ: joint search for network architecture, pruning and quantization policy. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 2078–2087 (IEEE/CVF, 2020).

Benmeziane, H. et al. A comprehensive survey on hardware-aware neural architecture search. Preprint at https://doi.org/10.48550/arXiv.2101.09336 (2021).

Chitty-Venkata, K. T. & Somani, A. K. Neural architecture search survey: a hardware perspective. ACM Comput. Surv. 55 , 1–36 (2022).

Benmeziane, H. et al. Hardware-aware neural architecture search: survey and taxonomy. In Proc. Thirtieth International Joint Conference on Artificial Intelligence 4322–4329 (IJCAI, 2021).

Benmeziane, H. et al. AnalogNAS: a neural network design framework for accurate inference with analog in-memory computing. In 2023 IEEE International Conference on Edge Computing and Communications (EDGE) 233–244 (IEEE, 2023).

Yuan, Z. et al. NAS4RRAM: neural network architecture search for inference on RRAM-based accelerators. Sci. China Inf. Sci. 64 , 160407 (2021).

Guan, Z. et al. A hardware-aware neural architecture search Pareto front exploration for in-memory computing. In 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology (ICSICT ) 1–4 (IEEE, 2022).

Negi, S., Chakraborty, I., Ankit, A. & Roy, K. NAX: neural architecture and memristive xbar based accelerator co-design. In Proc. 59th ACM/IEEE Design Automation Conference 451–456 (IEEE, 2022).

Sun, H. et al. Gibbon: efficient co-exploration of NN model and processing-in-memory architecture. In 2022 Design, Automation and Test in Europe Conference and Exhibition (DATE) 867–872 (IEEE, 2022).

Jiang, W. et al. Device-circuit-architecture co-exploration for computing-in-memory neural accelerators. IEEE Trans. Comput. 70 , 595–605 (2020).

Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey. J. Mach. Learn. Res. 20 , 1997–2017 (2019).


Ren, P. et al. A comprehensive survey of neural architecture search: challenges and solutions. ACM Comput. Surv. 54 , 1–34 (2021). This work provides a survey on neural architecture search from the software, algorithms and frameworks perspective.


Sekanina, L. Neural architecture search and hardware accelerator co-search: a survey. IEEE Access 9 , 151337–151362 (2021).

Zhang, X., Jiang, W., Shi, Y. & Hu, J. When neural architecture search meets hardware implementation: from hardware awareness to co-design. In 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) 25–30 (IEEE, 2019).

Efnusheva, D., Cholakoska, A. & Tentov, A. A survey of different approaches for overcoming the processor-memory bottleneck. Int. J. Comput. Sci. Inf. Technol. 9 , 151–163 (2017).

Liu, B. et al. Hardware acceleration for neuromorphic computing: an evolving view. In 15th Non-Volatile Memory Technology Symposium (NVMTS) 1–4 (IEEE, 2015).

Yantır, H. E., Eltawil, A. M. & Salama, K. N. IMCA: an efficient in-memory convolution accelerator. IEEE Trans. Very Large Scale Integr. Syst. 29 , 447–460 (2021).

Fouda, M. E., Yantır, H. E., Eltawil, A. M. & Kurdahi, F. In-memory associative processors: tutorial, potential, and challenges. IEEE Trans. Circuits Syst. II Express Briefs 69 , 2641–2647 (2022).

Yantır, H. E., Eltawil, A. M. & Salama, K. N. A hardware/software co-design methodology for in-memory processors. J. Parallel Distrib. Comput. 161 , 63–71 (2022).

Lotfi-Kamran, P. et al. Scale-out processors. ACM SIGARCH Comput. Archit. News 40 , 500–511 (2012).

Ali, M. et al. Compute-in-memory technologies and architectures for deep learning workloads. IEEE Trans. Very Large Scale Integr. Syst. 30 , 1615–1630 (2022).

Ielmini, D. & Pedretti, G. Device and circuit architectures for in‐memory computing. Adv. Intell. Syst. 2 , 2000040 (2020). This work provides an extensive overview of in-memory computing devices.

Lanza, M. et al. Memristive technologies for data storage, computation, encryption, and radio-frequency communication. Science 376 , eabj9979 (2022). This work reviews in-memory computing devices, related computations and their applications.

Mannocci, P. et al. In-memory computing with emerging memory devices: status and outlook. APL Mach. Learn. 1 , 010902 (2023).

Sun, Z. et al. A full spectrum of computing-in-memory technologies. Nat. Electron. https://doi.org/10.1038/s41928-023-01053-4 (2023).

Smagulova, K., Fouda, M. E., Kurdahi, F., Salama, K. N. & Eltawil, A. Resistive neural hardware accelerators. Proc. IEEE 111 , 500–527 (2023). This work provides an overview of in-memory computing-based deep learning accelerators.

Rasch, M. Neural network accelerator design with resistive crossbars: opportunities and challenges. IBM J. Res. Dev. 63 , 10:11–10:13 (2019).

Ankit, A., Chakraborty, I., Agrawal, A., Ali, M. & Roy, K. Circuits and architectures for in-memory computing-based machine learning accelerators. IEEE Micro 40 , 8–22 (2020).

Gebregiorgis, A. et al. A survey on memory-centric computer architectures. ACM J. Emerg. Technol. Comput. Syst. 18 , 1–50 (2022).

Aguirre, F. et al. Hardware implementation of memristor-based artificial neural networks. Nat. Commun. 15 , 1974 (2024).

Rasch, M. J. et al. Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators. Nat. Commun. 14 , 5282 (2023).

Le Gallo, M. et al. A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference. Nat. Electron. 6 , 680–693 (2023).

Fick, L., Skrzyniarz, S., Parikh, M., Henry, M. B. & Fick, D. Analog matrix processor for edge AI real-time video analytics. In 2022 IEEE International Solid-State Circuits Conference (ISSCC) 260–262 (IEEE, 2022).

Shafiee, A. et al. ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ACM SIGARCH Comput. Archit. News 44 , 14–26 (2016).

Krishnan, G. et al. SIAM: chiplet-based scalable in-memory acceleration with mesh for deep neural networks. ACM Trans. Embedded Comput. Syst. 20 , 1–24 (2021). This work provides an overview of the hierarchical system-level design of in-memory computing accelerators for deep neural networks.

Ankit, A. et al. PANTHER: a programmable architecture for neural network training harnessing energy-efficient ReRAM. IEEE Trans. Comput. 69 , 1128–1142 (2020).

Krishnan, G. et al. Impact of on-chip interconnect on in-memory acceleration of deep neural networks. ACM J. Emerg. Technol. Comput. Syst. 18 , 1–22 (2021).

Ankit, A. et al. PUMA: a programmable ultra-efficient memristor-based accelerator for machine learning inference. In Proc 24th International Conference on Architectural Support for Programming Languages and Operating Systems 715–731 (ACM, 2019).

Li, W. et al. TIMELY: pushing data movements and interfaces in PIM accelerators towards local and in time domain. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) 832–845 (IEEE, 2020).

Chen, P., Wu, M., Ma, Y., Ye, L. & Huang, R. RIMAC: an array-level ADC/DAC-free ReRAM-based in-memory DNN processor with analog cache and computation. In Proc. 28th Asia and South Pacific Design Automation Conference 228–233 (ACM, 2023).

Zhang, B. et al. PIMCA: a programmable in-memory computing accelerator for energy-efficient DNN inference. IEEE J. Solid-State Circ. 58 , 1436–1449 (2022).

Kim, D. E., Ankit, A., Wang, C. & Roy, K. SAMBA: sparsity aware in-memory computing based machine learning accelerator. IEEE Trans. Comput. 72 , 2615–2627 (2023).

Jain, S. et al. A heterogeneous and programmable compute-in-memory accelerator architecture for analog-AI using dense 2-D mesh. IEEE Trans. Very Large Scale Integr. Syst. 31 , 114–127 (2022).

Wang, X. et al. TAICHI: a tiled architecture for in-memory computing and heterogeneous integration. IEEE Trans. Circ. Syst. II Express Briefs 69 , 559–563 (2021).

Wan, W. et al. A compute-in-memory chip based on resistive random-access memory. Nature 608 , 504–512 (2022).

Hung, J.-M. et al. A four-megabit compute-in-memory macro with eight-bit precision based on CMOS and resistive random-access memory for AI edge devices. Nat. Electron. 4 , 921–930 (2021).

Xue, C.-X. et al. A 22nm 4Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7 TOPS/W for tiny AI edge devices. In IEEE International Solid-State Circuits Conference (ISSCC) 245–247 (IEEE, 2021).

Khwa, W.-S. et al. A 40-nm, 2M-cell, 8b-precision, hybrid SLC-MLC PCM computing-in-memory macro with 20.5–65.0 TOPS/W for tiny-AI edge devices. In IEEE International Solid-State Circuits Conference (ISSCC) 1–3 (IEEE, 2022).

Jia, H. et al. Scalable and programmable neural network inference accelerator based on in-memory computing. IEEE J. Solid State Circuits 57 , 198–211 (2021).

Jung, S. et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 601 , 211–216 (2022).

Krestinskaya, O., Zhang, L. & Salama, K. N. Towards efficient RRAM-based quantized neural networks hardware: state-of-the-art and open issues. In IEEE 22nd International Conference on Nanotechnology (NANO) 465–468 (IEEE, 2022).

Joshi, V. et al. Accurate deep neural network inference using computational phase-change memory. Nat. Commun. 11 , 2473 (2020).

Cao, T. et al. A non-idealities aware software–hardware co-design framework for edge-ai deep neural network implemented on memristive crossbar. IEEE J. Emerg. Sel. Top. Circuits Syst. 12 , 934–943 (2022).

Wen, W., Wu, C., Wang, Y., Chen, Y. & Li, H. Learning structured sparsity in deep neural networks. Adv. Neural Inf. Proc. Syst. 29 , 2074–2082 (2016).

Peng, J. et al. CMQ: crossbar-aware neural network mixed-precision quantization via differentiable architecture search. IEEE Trans. Comput. Des. Integr. Circuits Syst. 41 , 4124–4133 (2022).

Huang, S. et al. Mixed precision quantization for ReRAM-based DNN inference accelerators. In Proc. 26th Asia and South Pacific Design Automation Conference 372–377 (ACM, 2021).

Meng, F.-H., Wang, X., Wang, Z., Lee, E. Y.-J. & Lu, W. D. Exploring compute-in-memory architecture granularity for structured pruning of neural networks. IEEE J. Emerg. Sel. Top. Circuits Syst. 12 , 858–866 (2022).

Krestinskaya, O., Zhang, L. & Salama, K. N. Towards efficient in-memory computing hardware for quantized neural networks: state-of-the-art, open challenges and perspectives. IEEE Trans. Nanotechnol. 22 , 377–386 (2023).

Li, Y., Dong, X. & Wang, W. Additive powers-of-two quantization: an efficient non-uniform discretization for neural networks. In International Conference on Learning Representations (ICLR, 2020).

Karimzadeh, F., Yoon, J.-H. & Raychowdhury, A. Bits-net: bit-sparse deep neural network for energy-efficient RRAM-based compute-in-memory. IEEE Trans. Circuits Syst. I: Regul. Pap. 69 , 1952–1961 (2022).

Yang, H., Duan, L., Chen, Y. & Li, H. BSQ: exploring bit-level sparsity for mixed-precision neural network quantization. In International Conference on Learning Representations (ICLR, 2020).

Qu, S. et al. RaQu: an automatic high-utilization CNN quantization and mapping framework for general-purpose RRAM accelerator. In 2020 57th ACM/IEEE Design Automation Conference (DAC) 1–6 (IEEE, 2020).

Kang, B. et al. Genetic algorithm-based energy-aware CNN quantization for processing-in-memory architecture. IEEE J. Emerg. Sel. Top. Circuits Syst. 11 , 649–662 (2021).

Li, S., Hanson, E., Li, H. & Chen, Y. Penni: pruned kernel sharing for efficient CNN inference. In International Conference on Machine Learning 5863–5873 (PMLR, 2020).

Yang, S. et al. AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator. In Proc. ACM International Conference on Supercomputing 304–315 (ACM, 2021).

Zhang, T. et al. Autoshrink: a topology-aware NAS for discovering efficient neural architecture. In Proc. AAAI Conference on Artificial Intelligence 6829–6836 (AAAI, 2020).

Cheng, H.-P. et al. NASGEM: neural architecture search via graph embedding method. In Proc. AAAI Conference on Artificial Intelligence 7090–7098 (AAAI, 2021).

Lammie, C. et al. Design space exploration of dense and sparse mapping schemes for RRAM architectures. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 1107–1111 (IEEE, 2022).

Fiacco, A. V. & McCormick, G. P. Nonlinear Programming: Sequential Unconstrained Minimization Techniques (SIAM, 1990).

Lu, Z. et al. NSGA-Net: neural architecture search using multi-objective genetic algorithm. In Proc. Genetic and Evolutionary Computation Conference 419–427 (ACM, 2019).

Guo, Y. et al. Pareto-aware neural architecture generation for diverse computational budgets. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 2247–2257 (IEEE, 2023).

Qu, S., Li, B., Wang, Y. & Zhang, L. ASBP: automatic structured bit-pruning for RRAM-based NN accelerator. In 2021 58th ACM/IEEE Design Automation Conference (DAC) 745–750 (IEEE, 2021).

Krestinskaya, O., Salama, K. & James, A. P. Towards hardware optimal neural network selection with multi-objective genetic search. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS) 1–5 (IEEE, 2020).

Zhang, T. et al. NASRec: weight sharing neural architecture search for recommender systems. In Proc. ACM Web Conference 1199–1207 (ACM, 2023).

Stolle, K., Vogel, S., van der Sommen, F. & Sanberg, W. in Joint European Conference on Machine Learning and Knowledge Discovery in Databases 463–479 (Springer).

Li, L. & Talwalkar, A. Random search and reproducibility for neural architecture search. In Uncertainty in Artificial Intelligence 367–377 (PMLR, 2020).

Huang, H., Ma, X., Erfani, S. M. & Bailey, J. Neural architecture search via combinatorial multi-armed bandit. In 2021 International Joint Conference on Neural Networks (IJCNN) 1–8 (IEEE, 2021).

Liu, C.-H. et al. FOX-NAS: fast, on-device and explainable neural architecture search. In Proc. IEEE/CVF International Conference on Computer Vision 789–797 (IEEE, 2021).

Li, C. et al. HW-NAS-Bench: hardware-aware neural architecture search benchmark. In 2021 International Conference on Learning Representations (ICRL, 2021) .

Peng, X., Huang, S., Luo, Y., Sun, X. & Yu, S. DNN+ NeuroSim: an end-to-end benchmarking framework for compute-in-memory accelerators with versatile device technologies. In 2019 IEEE International Electron Devices Meeting (IEDM) 32.35.31–32.35.34 (IEEE, 2019).

Xia, L. et al. MNSIM: Simulation platform for memristor-based neuromorphic computing system. IEEE Trans. Comput. Des. Integr. Circuits Syst. 37 , 1009–1022 (2017).

Zhu, Z. et al. MNSIM 2.0: a behavior-level modeling tool for memristor-based neuromorphic computing systems. In Proc. 2020 on Great Lakes Symposium on VLSI 83–88 (ACM, 2020).

Rasch, M. J. et al. A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays. In 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS) 1–4 (IEEE, 2021).

Lee, H., Lee, S., Chong, S. & Hwang, S. J. Hardware-adaptive efficient latency prediction for nas via meta-learning. Adv. Neural Inf. Process. Syst. 34 , 27016–27028 (2021).

Laube, K. A., Mutschler, M. & Zell, A. What to expect of hardware metric predictors in NAS. In International Conference on Automated Machine Learning 13/11–13/15 (PMLR, 2022).

Hu, Y., Shen, C., Yang, L., Wu, Z. & Liu, Y. A novel predictor with optimized sampling method for hardware-aware NAS. In 2022 26th International Conference on Pattern Recognition (ICPR) 2114–2120 (IEEE, 2022).

Guo, Z. et al. Single path one-shot neural architecture search with uniform sampling. In Proc. Computer Vision — ECCV 2020: 16th European Conference Part XVI 16, 544–560 (Springer, 2020).

Shu, Y., Chen, Y., Dai, Z. & Low, B. K. H. Neural ensemble search via Bayesian sampling. In 38th Conference on Uncertainty in Artificial Intelligence (UAI) 1803-1812 (PMLR, 2022).

Wang, D., Li, M., Gong, C. & Chandra, V. AttentiveNAS: improving neural architecture search via attentive sampling. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6418–6427 (IEEE, 2021).

Yang, Z. & Sun, Q. Efficient resource-aware neural architecture search with dynamic adaptive network sampling. In IEEE International Symposium on Circuits and Systems (ISCAS) 1–5 (IEEE, 2021).

Lyu, B. & Wen, S. TND-NAS: towards non-differentiable objectives in differentiable neural architecture search. In Proc. 3rd International Symposium on Automation, Information and Computing (INSTICC, 2022).

Li, G., Mandal, S. K., Ogras, U. Y. & Marculescu, R. FLASH: fast neural architecture search with hardware optimization. ACM Trans. Embedded Comput. Syst. 20 , 1–26 (2021).

Krestinskaya, O., Salama, K. N. & James, A. P. Automating analogue AI chip design with genetic search. Adv. Intell. Syst. 2 , 2000075 (2020).

Yan, Z., Juan, D.-C., Hu, X. S. & Shi, Y. Uncertainty modeling of emerging device based computing-in-memory neural accelerators with application to neural architecture search. In Proc. 26th Asia and South Pacific Design Automation Conference 859–864 (ACM, 2021).

Yang, X. et al. Multi-objective optimization of ReRAM crossbars for robust DNN inferencing under stochastic noise. In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD) 1–9 (IEEE, 2021).

Chitty-Venkata, K. T., Emani, M., Vishwanath, V. & Somani, A. K. Neural architecture search for transformers: a survey. IEEE Access   10 , 108374–108412 (2022).

Oloulade, B. M., Gao, J., Chen, J., Lyu, T. & Al-Sabri, R. Graph neural architecture search: a survey. Tsinghua Sci. Technol. 27 , 692–708 (2021).

Al-Sabri, R., Gao, J., Chen, J., Oloulade, B. M. & Lyu, T. Multi-view graph neural architecture search for biomedical entity and relation extraction. IEEE/ACM Trans. Comput. Biol. Bioinform. 20 , 1221–1233 (2022).

Klyuchnikov, N. et al. NAS-Bench-NLP: neural architecture search benchmark for natural language processing. IEEE Access   10 , 45736–45747 (2022).

Chitty-Venkata, K. T., Emani, M., Vishwanath, V. & Somani, A. K. Neural architecture search benchmarks: insights and survey. IEEE Access   11 , 25217–25236 (2023).

Ying, C. et al. NAS-Bench-101: towards reproducible neural architecture search. In International Conference on Machine Learning 7105–7114 (PMLR, 2019).

Dong, X. & Yang, Y. NAS-Bench-201: extending the scope of reproducible neural architecture search. In 2020 International Conference on Learning Representations (ICLR) (ICLR, 2020).

Tu, R. et al. NAS-Bench-360: benchmarking neural architecture search on diverse tasks. Adv. Neural Inf. Process. Syst. 35 , 12380–12394 (2022).

Real, E., Liang, C., So, D. & Le, Q. AutoML-Zero: evolving machine learning algorithms from scratch. In International Conference on Machine Learning 8007–8019 (PMLR, 2020).

Chakraborty, I., Ali, M. F., Kim, D. E., Ankit, A. & Roy, K. GENIEx: a generalized approach to emulating non-ideality in memristive xbars using neural networks. In 2020 57th ACM/IEEE Design Automation Conference (DAC) 1–6 (IEEE, 2020).

Houshmand, P. et al. Assessment and optimization of analog-in-memory-compute architectures for DNN processing. In IEEE International Electron Devices Meeting (IEEE, 2020).

Zhou, C. et al. ML-HW co-design of noise-robust tinyml models and always-on analog compute-in-memory edge accelerator. IEEE Micro 42 , 76–87 (2022).

Mei, L., Houshmand, P., Jain, V., Giraldo, S. & Verhelst, M. ZigZag: enlarging joint architecture-mapping design space exploration for DNN accelerators. IEEE Trans. Comput. 70 , 1160–1174 (2021).

Ghose, S., Boroumand, A., Kim, J. S., Gómez-Luna, J. & Mutlu, O. Processing-in-memory: a workload-driven perspective. IBM J. Res. Dev. 63 , 1–3 (2019).

Liu, R. et al. FeCrypto: instruction set architecture for cryptographic algorithms based on FeFET-based in-memory computing. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42 , 2889–2902 (2023).

Mambu, K., Charles, H.-P. & Kooli, M. Dedicated instruction set for pattern-based data transfers: an experimental validation on systems containing in-memory computing units. IEEE Trans. Comput. Aided Design Integr. Circuits Syst . 42 , 3757–3767 (2023).

Jiang, N. et al. A detailed and flexible cycle-accurate network-on-chip simulator. In 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS ) 86–96 (IEEE, 2013).

Jiang, H., Huang, S., Peng, X. & Yu, S. MINT: mixed-precision RRAM-based in-memory training architecture. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS) 1–5 (IEEE, 2020).

Download references

Acknowledgements

This work was supported by the King Abdullah University of Science and Technology through the Competitive Research Grant program under grant URF/1/4704-01-01.

Author information

Authors and Affiliations

Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Olga Krestinskaya, Suhaib A. Fahmy, Ahmed Eltawil & Khaled N. Salama

Rain Neuromorphics, San Francisco, CA, USA

Mohammed E. Fouda

IBM Research Europe, Ruschlikon, Switzerland

Hadjer Benmeziane & Abu Sebastian

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

Kaoutar El Maghraoui

Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI, USA

Wei D. Lu

Physical Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

Mario Lanza

Electrical and Computer Engineering Department, Duke University, Durham, NC, USA

Hai Li

Electrical Engineering and Computer Science Department, University of California at Irvine, Irvine, CA, USA

Fadi Kurdahi

Contributions

O.K. researched data and wrote the article. O.K., M.E.F., K.E.M., A.S., A.M.E. and K.N.S. contributed substantially to discussion of the content. O.K., M.E.F., H.B., K.E.M., A.S., W.D.L., M.L., H.L., F.K., S.A.F., A.M.E. and K.N.S. reviewed and/or edited the manuscript before submission.

Corresponding authors

Correspondence to Olga Krestinskaya or Khaled N. Salama.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Reviews Electrical Engineering thanks Arun Somani and Zheyu Yan for their contribution to the peer review of this work.


About this article

Cite this article

Krestinskaya, O., Fouda, M.E., Benmeziane, H. et al. Neural architecture search for in-memory computing-based deep learning accelerators. Nat Rev Electr Eng (2024). https://doi.org/10.1038/s44287-024-00052-7

Accepted: 23 April 2024

Published: 20 May 2024

DOI: https://doi.org/10.1038/s44287-024-00052-7


New Horizons for Fuzzy Logic, Neural Networks and Metaheuristics

  • © 2024
  • Oscar Castillo
  • Patricia Melin

Division of Graduate Studies and Research, Tijuana Institute of Technology, Tijuana, Mexico

  • Outlines new horizons on the theoretical developments of fuzzy logic, neural networks and optimization algorithms
  • Presents applications in areas such as intelligent control and robotics, pattern recognition and medical diagnosis
  • Contains a collection of papers focused on hybrid intelligent systems based on soft computing techniques

Part of the book series: Studies in Computational Intelligence (SCI, volume 1149)


About this book

This book contains a collection of papers focused on hybrid intelligent systems based on soft computing techniques. It envisions new horizons in the theoretical development of fuzzy logic, neural networks and optimization algorithms, and discusses these methods in application areas such as control and robotics, pattern recognition, medical diagnosis, decision-making, prediction and optimization of complex problems. One group of papers, with the main theme of type-1, type-2 and type-3 fuzzy systems, proposes new concepts and algorithms based on type-1, type-2 and type-3 fuzzy logic and their applications. A second group offers theoretical concepts and applications of metaheuristics in different areas, and another outlines diverse applications of hybrid intelligent systems to real problems. A further group presents the theory and practice of neural networks in different applications, and a final group covers the theory and practice of optimization and evolutionary algorithms in different application areas.

  • Computational Intelligence
  • Fuzzy Logic
  • Neural Networks
  • Optimization Algorithms
  • Artificial Intelligence

Table of contents (29 chapters)

Front Matter

Fuzzy Adaptation of Parameters in a Multi-Swarm Particle Swarm Optimization (PSO) Algorithm Applied to the Optimization of a Fuzzy Controller

  • Alejandra Mancilla, Oscar Castillo, Mario García-Valdez

Fuzzifying Intrusion Detection Systems with Modified Artificial Bee Colony and Support Vector Machine Algorithms

  • Rafael Burkhalter, Mario Bischof, Edy Portmann

Type-2 Mamdani Fuzzy System Optimization for a Classification Ensemble with Black Widow Optimizer

  • Sergio Varela-Santos, Patricia Melin

Towards Designing Interval Type-3 Fuzzy PID Controllers

  • Oscar Castillo, Patricia Melin

Classification of Consumption Level in Developing Countries for Time Series Prediction Using a Hierarchical Nested Artificial Neural Network Method

  • Martha Ramirez, Patricia Melin

Computer Aided Diagnosis for COVID-19 with Quantum Computing and Transfer Learning

  • Daniel Alejandro Lopez, Oscar Montiel, Miguel Lopez-Montiel, Oscar Castillo

Prescribed-Time Trajectory Tracking Control of Wheeled Mobile Robots Using Neural Networks and Robust Control Techniques

  • Victor D. Cruz-Lares, Jesus A. Rodriguez-Arellano, Luis T. Aguilar, Roger Miranda-Colorado

Generative Models for Class Imbalance Problem on BreakHis Dataset: A Case Study

  • Angel E. Rosales-Morales, Alfredo Gutiérrez-Alfaro, Manuel Ornelas-Rodríguez, Andrés Espinal, Alfonso Rojas-Domínguez, Héctor J. Puga-Soberanes et al.

Prediction Using a Fuzzy Inference System in the Classification Layer of a Convolutional Neural Network Replacing the Softmax Function

  • Yutzil Poma, Patricia Melin

Optimization

Optimization of Lithium-Ion Batteries Using Boltzmann Metaheuristics Systems: Towards a Green Artificial Intelligence

  • Juan de Anda-Suárez, Edwin D. Rico-García, Germán Pérez-Zúñiga, José L. López-Ramírez

Novel Decomposition-Based Multi-objective Evolutionary Algorithm Using Reinforcement Learning Adaptive Operator Selection (MOEA/D-QL)

  • José Alfredo Brambila-Hernández, Miguel Ángel García-Morales, Héctor Joaquín Fraire-Huacuja, Laura Cruz-Reyes, Juan Frausto-Solís

Multiobjective Particle Swarm Optimization for the Hydro–Thermal Power Scheduling Problem

  • Norberto Castillo-García, Laura Cruz–Reyes, Juan Carlos Hernández Marín, Paula Hernández-Hernández

Comparative Analysis of Metaheuristic Algorithms for Standard Dynamic Multiobjective Optimization Problems

  • Norberto Castillo-García, Laura Cruz-Reyes, Juan Carlos Hernández Marín, Paula Hernández-Hernández

Hypervolume Indicator as an Estimator for Adaptive Operator Selection in an On-Line Multi-objective Hyper-heuristic

  • Jorge A. Soria-Alcaraz, Gabriela Ochoa, Marco A. Sotelo-Figueroa, Andres Espinal

Metaheuristics: Theory and Applications

A New Breeding Crossover Approach for Evolutionary Algorithms

  • J. C. Felix-Saul, Mario García-Valdez


About the editors

Patricia Melin holds the Doctor in Science degree (Doctor Habilitatus, D.Sc.) in Computer Science from the Polish Academy of Sciences (with the dissertation "Hybrid Intelligent Systems for Pattern Recognition using Soft Computing"). She has been a Professor of Computer Science in the Graduate Division, Tijuana Institute of Technology, Tijuana, Mexico, since 1998. In addition, she serves as Director of Graduate Studies in Computer Science and head of the research group on Hybrid Neural Intelligent Systems (2000 to present). She is currently President of NAFIPS (North American Fuzzy Information Processing Society) and is the founding Chair of the Mexican Chapter of the IEEE Computational Intelligence Society. She is a member of the IEEE Neural Network Technical Committee (2007 to present) and the IEEE Fuzzy System Technical Committee (2014 to present), Chair of the Task Force on Hybrid Intelligent Systems (2007 to present), and currently Associate Editor of the Journal of Information Sciences and IEEE Transactions on Fuzzy Systems. She is a member of NAFIPS, IFSA, and IEEE, and belongs to the Mexican Research System at level III. Her research interests are in modular neural networks, type-2 fuzzy logic, pattern recognition, fuzzy control, and neuro-fuzzy and genetic-fuzzy hybrid approaches. She has published over 220 journal papers, 20 authored books, 80 edited books, and more than 300 papers in conference proceedings, with an h-index of 82. She has served as Guest Editor of several past Special Issues in journals such as Applied Soft Computing, Intelligent Systems, Information Sciences, Non-Linear Studies, JAMRIS, Fuzzy Sets and Systems, and Engineering Letters, and received recognition as a Highly Cited Researcher in 2017 and 2018 from Clarivate Analytics and Web of Science.

Bibliographic Information

Book Title: New Horizons for Fuzzy Logic, Neural Networks and Metaheuristics

Editors: Oscar Castillo, Patricia Melin

Series Title: Studies in Computational Intelligence

DOI: https://doi.org/10.1007/978-3-031-55684-5

Publisher: Springer Cham

eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)

Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

Hardcover ISBN: 978-3-031-55683-8 (published 22 May 2024)

Softcover ISBN: 978-3-031-55686-9 (due 22 June 2024)

eBook ISBN: 978-3-031-55684-5 (published 21 May 2024)

Series ISSN: 1860-949X

Series E-ISSN: 1860-9503

Edition Number: 1

Number of Pages: XII, 434

Number of Illustrations: 39 b/w illustrations, 114 illustrations in colour

Topics: Computational Intelligence, Artificial Intelligence


Researchers from the Toyota Research Institute have introduced Scalable UPtraining for Recurrent Attention (SUPRA), a method to convert pre-trained transformers into recurrent neural networks (RNNs). This approach leverages high-quality pre-training data from transformers while employing a linearization technique that replaces softmax normalization with GroupNorm. SUPRA is unique as it combines the strengths of transformers and RNNs, achieving competitive performance with reduced computational cost.

The SUPRA methodology involves uptraining transformers such as Llama2 and Mistral-7B. The process replaces softmax normalization with GroupNorm and adds a small multi-layer perceptron (MLP) for projecting queries and keys. The models were trained on the RefinedWeb dataset with 1.2 trillion tokens. Training and fine-tuning were performed using a modified version of OpenLM, and evaluations were conducted with the Eleuther evaluation harness on standard NLU benchmarks. This approach allows transformers to operate recurrently and efficiently, handling both short- and long-context tasks.
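To make the linearization concrete, here is a minimal sketch of softmax-free attention run as a recurrence, in the spirit of the description above: the attention state is a running sum of key-value outer products, small MLPs act on the projected queries and keys, and GroupNorm stands in for the softmax normalizer. This is an illustrative reconstruction under stated assumptions, not the authors' released code; the class name, layer sizes, and the exact placement of the MLPs and normalization are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentLinearAttention(nn.Module):
    """Sketch of softmax-free attention computed as an RNN.

    Illustrative only: SUPRA's released models differ in detail. Without
    softmax, attention at step t reduces to a query against a running sum
    of key-value outer products (the recurrent state), and the output is
    normalized with GroupNorm instead of the softmax denominator.
    """

    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        assert dim % n_heads == 0
        self.h, self.d = n_heads, dim // n_heads
        self.q_proj, self.k_proj, self.v_proj = (nn.Linear(dim, dim) for _ in range(3))
        # Small MLPs on the projected queries and keys, per the description above.
        self.q_mlp = nn.Sequential(nn.Linear(self.d, self.d), nn.GELU(), nn.Linear(self.d, self.d))
        self.k_mlp = nn.Sequential(nn.Linear(self.d, self.d), nn.GELU(), nn.Linear(self.d, self.d))
        self.norm = nn.GroupNorm(n_heads, dim)  # replaces softmax normalization
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape  # (batch, seq_len, dim)
        q = self.q_mlp(self.q_proj(x).view(b, t, self.h, self.d))
        k = self.k_mlp(self.k_proj(x).view(b, t, self.h, self.d))
        v = self.v_proj(x).view(b, t, self.h, self.d)
        state = x.new_zeros(b, self.h, self.d, self.d)  # running sum of k v^T
        outs = []
        for i in range(t):  # constant-size state: no growing KV cache
            state = state + torch.einsum('bhd,bhe->bhde', k[:, i], v[:, i])
            y = torch.einsum('bhd,bhde->bhe', q[:, i], state).reshape(b, -1)
            outs.append(self.norm(y))
        return self.out(torch.stack(outs, dim=1))
```

Because the per-head state has a fixed size, memory and per-token compute at inference are constant in the context length, which is the efficiency argument for running an uptrained transformer as an RNN.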


The SUPRA method showed competitive performance on various benchmarks. It outperformed RWKV and RetNet on the HellaSwag benchmark, achieving a score of 77.9 compared to 70.9 and 73.0, respectively. The model also demonstrated strong results on other tasks, with scores of 76.3 on ARC-E, 79.1 on ARC-C, and 46.3 on MMLU. Training required only 20 billion tokens, significantly less than other models. Despite some performance drops in long-context tasks, SUPRA maintained robust results within its training context length.

In conclusion, the SUPRA method successfully converts pre-trained transformers into efficient RNNs, addressing the high computational costs of traditional transformers. By replacing softmax normalization with GroupNorm and using a small MLP, SUPRA models achieve competitive performance on benchmarks like HellaSwag and ARC-C with significantly reduced training data. This research highlights the potential for scalable, cost-effective NLP models, maintaining robust performance across various tasks and paving the way for more accessible advanced language processing technologies.


Computer Science > Computer Vision and Pattern Recognition

Title: Advancing Spiking Neural Networks towards Multiscale Spatiotemporal Interaction Learning

Abstract: Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often neglected the multiscale information and its spatiotemporal correlation between event data, leading SNN models to approximate each frame of input events as static images. We hypothesize that this oversimplification significantly contributes to the performance gap between SNNs and traditional ANNs. To address this issue, we have designed a Spiking Multiscale Attention (SMA) module that captures multiscale spatiotemporal interaction information. Furthermore, we developed a regularization method named Attention ZoneOut (AZO), which utilizes spatiotemporal attention weights to reduce the model's generalization error through pseudo-ensemble training. Our approach has achieved state-of-the-art results on mainstream neural morphology datasets. Additionally, we have reached a performance of 77.1% on the Imagenet-1K dataset using a 104-layer ResNet architecture enhanced with SMA and AZO. This achievement confirms the state-of-the-art performance of SNNs with non-transformer architectures and underscores the effectiveness of our method in bridging the performance gap between SNN models and traditional ANN models.
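The spike-driven computation referred to in the abstract can be made concrete with a minimal leaky integrate-and-fire (LIF) neuron, the standard unit in SNN work. This is a generic illustration, not the paper's SMA or AZO code; the leak factor and threshold are arbitrary values chosen for the sketch.

```python
import torch

def lif_forward(inputs: torch.Tensor, beta: float = 0.9, threshold: float = 1.0) -> torch.Tensor:
    """Minimal leaky integrate-and-fire neuron over a multi-step input.

    `inputs` has shape (time_steps, batch, features); `beta` is the membrane
    leak factor and `threshold` the firing threshold. Generic sketch only.
    """
    mem = torch.zeros_like(inputs[0])        # membrane potential
    spikes = []
    for x_t in inputs:                       # iterate over time steps
        mem = beta * mem + x_t               # leaky integration of input current
        spike = (mem >= threshold).float()   # emit a spike where threshold is crossed
        mem = mem - spike * threshold        # soft reset where a spike fired
        spikes.append(spike)
    return torch.stack(spikes)               # binary spike train, same shape as inputs

# Example: 10 time steps, batch of 2, 4 features of random input current.
out = lif_forward(torch.rand(10, 2, 4))
```

In training, the thresholding step is non-differentiable and is usually handled with surrogate gradients; the forward pass above is enough to show why event inputs arrive as multi-step spike sequences rather than single static frames.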


NeurIPS 2024, the Thirty-eighth Annual Conference on Neural Information Processing Systems, will be held at the Vancouver Convention Center, Monday Dec 9 through Sunday Dec 15; Monday is an industry expo.


Mission Statement

The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, principally by hosting an annual interdisciplinary academic conference with the highest ethical standards for a diverse and inclusive community.

About the Conference

The conference was founded in 1987 and is now a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Along with the conference is a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.


COMMENTS

  1. [1404.7828] Deep Learning in Neural Networks: An Overview

    Juergen Schmidhuber. In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit ...

  2. Review of deep learning: concepts, CNN architectures, challenges

    In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching or even beating those provided by human performance. One of the benefits of DL ...

  3. Deep learning: systematic review, models, challenges, and research

    To fill these gaps, this paper provides a comprehensive survey on four types of DL models, namely, supervised, unsupervised, reinforcement, and hybrid learning. ... When a neural network is trained on new data, the optimization process may adjust the weights and connections in a way that erases the knowledge the network had about previous tasks ...

  4. Catalyzing next-generation Artificial Intelligence through NeuroAI

    Neuroscience continues to provide guidance—e.g., attention-based neural networks were loosely inspired by attention mechanisms in the brain 20,21,22,23 —but this is often based on findings ...

  5. Recent advances and applications of deep learning methods in ...

    Convolutional neural networks (CNN) can be viewed as a regularized version of multilayer perceptrons with a strong inductive bias for learning translation-invariant image representations. There ...

  6. Neural networks: An overview of early research, current frameworks and

    1. Introduction and goals of neural-network research. Generally speaking, the development of artificial neural networks or models of neural networks arose from a double objective: firstly, to better understand the nervous system and secondly, to try to construct information processing systems inspired by natural, biological functions and thus gain the advantages of these systems.

  7. New Advances in Artificial Neural Networks and Machine Learning

    IWANN is a biennial conference that seeks to provide a discussion forum for scientists, engineers, educators and students about the latest ideas and realizations in the foundations, theory, models and applications of computational systems inspired on nature (neural networks, fuzzy logic and evolutionary systems) as well as in emerging areas related to the above items.

  8. Neural Networks

    Accordingly, the Neural Networks editorial board represents experts in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, and software surveys. Articles are published in one of four ...

  9. Evolutionary design of neural network architectures: a ...

    In parallel with the recent advances, the last period covers the Deep Learning Era, in which research direction is shifted towards configuring advanced models of deep neural networks. Finally, we propose open problems for future research in the field of neural architecture search and provide insights for fully automated machine learning.

  10. Recent Advances in Convolutional Neural Networks

    In this paper, we provide a broad survey of the recent advances in convolutional neural networks. We detailize the improvements of CNN on different aspects, including layer design, activation function, loss function, regularization, optimization and fast computation. Besides, we also introduce various applications of convolutional neural ...

  11. A review of graph neural networks: concepts, architectures, techniques

    Graph Attention Network (GAT/GAN) is a new neural network that works with graph-structured data. It uses masked self-attentional layers to address the shortcomings of past methods that depended on graph convolutions or their approximations. ... Table 11 provides an overview of different research papers, their publication years, the applications ...

  12. Exploring the Advancements and Future Research Directions of ...

    Artificial Neural Networks (ANNs) are machine learning algorithms inspired by the structure and function of the human brain. Their popularity has increased in recent years due to their ability to learn and improve through experience, making them suitable for a wide range of applications. ANNs are often used as part of deep learning, which enables them to learn, transfer knowledge, make ...

  13. Graph Neural Networks: A bibliometrics overview

    Recently, graph neural networks (GNNs) have become a hot topic in machine learning community. This paper presents a Scopus-based bibliometric overview of the GNNs' research since 2004 when GNN papers were first published. The study aims to evaluate GNN research trends, both quantitatively and qualitatively.

  14. Transformer: A Novel Neural Network Architecture for ...

    RNNs have in recent years become the typical network architecture for translation, processing language sequentially in a left-to-right or right-to-left fashion. Reading one word at a time, this forces RNNs to perform multiple steps to make decisions that depend on words far away from each other. Processing the example above, an RNN could only ...

  15. Artificial Neural Networks for Navigation Systems: A Review of Recent

    Several machine learning (ML) methodologies are gaining popularity as artificial intelligence (AI) becomes increasingly prevalent. An artificial neural network (ANN) may be used as a "black-box" modeling strategy without the need for a detailed system physical model. It is more reasonable to solely use the input and output data to explain the system's actions. ANNs have been extensively ...

  16. These neural networks know what they're doing

    And "liquid" neural networks change their underlying equations to continuously adapt to new inputs. The new research draws on previous work in which Hasani and others showed how a brain-inspired type of deep learning system called a Neural Circuit Policy (NCP), built by liquid neural network cells, is able to autonomously control a self ...

  17. Neural architecture search for in-memory computing-based deep learning

    Hardware-aware neural architecture search (HW-NAS) can be used to design efficient in-memory computing (IMC) hardware for deep learning accelerators. This Review discusses methodologies ...

  18. [1808.03314] Fundamentals of Recurrent Neural Network (RNN) and Long

    Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of "unrolling" an ...

  19. (PDF) Recent Advances in Recurrent Neural Networks

    arXiv:1801.01078v3 [cs.NE] 22 Feb 2018. Recent Advances in Recurrent Neural Networks. Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, and Shahrokh Valaee. Abstract—Recurrent ...

  20. (PDF) Artificial Neural Networks: An Overview

    In this paper, we propose a simple but comprehensive taxonomy for interpretability, systematically review recent studies in improving interpretability of neural networks, describe applications of ...

  21. Neural Networks

    A robust self-supervised image hashing method for content identification with forensic detection of content-preserving manipulations. Jesús Fonseca-Bustos, Kelsey Alejandra Ramírez-Gutiérrez, Claudia Feregrino-Uribe. In Press, Journal Pre-proof, Available online 3 May 2024. View PDF.

  22. Learning Models: CNN, RNN, LSTM, GRU

    The objective of this research is to provide an overview of various deep learning models and compare their performance across different applications. Section 2 discusses the different deep learning models, including Multi-Layer Perceptron (MLP), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN),

  23. (PDF) Neural Networks and Their Applications

    In neural networks, there is an interconnected network of nodes, which are named neurons, and edges that join them together. A neural network's main function is to get an array of inputs ... (a minimal illustration of this structure appears after this list)

  24. Here's what's really going on inside an LLM's neural network

    From The Center. Now, new research from Anthropic offers a new window into what's going on inside the Claude LLM's "black box." The company's new paper on "Extracting Interpretable Features from Claude 3 Sonnet" describes a powerful new method for at least partially explaining just how the model's millions of artificial neurons fire to create ...

  25. Graph neural networks: A review of methods and applications

    Recent advancement of deep neural networks, especially convolutional neural networks (CNNs) ... Papers by Zhang et al. (2018b), ... non-structural scenarios and other scenarios. In Section 9, we propose four open problems of graph neural networks as well as several future research directions. And finally, ...

  26. Deep Learning Illustrated, Part 3: Convolutional Neural Networks

    To quickly recap, we previously discussed the inner workings of neural networks by building a simple model to predict the daily revenue of an ice cream shop. We found that neural networks can handle complex problems by harnessing the combined power of several neurons. ... A recent research paper titled "KAN: Kolmogorov-Arnold Network" has ...

  27. New Horizons for Fuzzy Logic, Neural Networks and Metaheuristics

    This book contains a collection of papers focused on hybrid intelligent systems based on soft computing techniques. In this book, new horizons on the theoretical developments of fuzzy logic, neural networks and optimization algorithms are envisioned. In addition, the abovementioned methods are discussed in application areas such as control and ...

  28. This AI Paper by Toyota Research Institute Introduces SUPRA: Enhancing

    Natural language processing (NLP) has advanced significantly thanks to neural networks, with transformer models setting the standard. These models have performed remarkably well across a range of criteria. However, they pose serious problems because of their high memory requirements and high computational expense, particularly for applications that demand long-context work. This persistent ...

  29. [2405.13672] Advancing Spiking Neural Networks towards Multiscale

    Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often neglected the multiscale information and its ...

  30. 2024 Conference

    The Neural Information Processing Systems Foundation is a non-profit corporation whose purpose is to foster the exchange of research advances in Artificial Intelligence and Machine Learning, principally by hosting an annual interdisciplinary academic conference with the highest ethical standards for a diverse and inclusive community.
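Following up on item 23 above, here is a minimal sketch of the structure that snippet describes: neurons as nodes, weighted edges joining them, and a function mapping an array of inputs to an array of outputs. All sizes and weights are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)  # edges from 4 inputs to 3 hidden neurons
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)  # edges from 3 hidden to 2 output neurons

def forward(x: np.ndarray) -> np.ndarray:
    """One pass through the network: weighted sums along edges, then a nonlinearity."""
    h = np.tanh(x @ W1 + b1)  # each hidden neuron sums its weighted inputs
    return h @ W2 + b2        # output neurons do the same on the hidden activations

print(forward(np.array([1.0, 0.5, -0.2, 0.3])))  # 4 inputs in, 2 outputs out
```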