
Machine learning articles from across Nature Portfolio

Machine learning is the ability of a machine to improve its performance based on previous results. Machine learning methods enable computers to learn without being explicitly programmed and have multiple applications, for example, in the improvement of data mining algorithms.

AI networks reveal how flies find a mate

Artificial neural networks that model the visual system of a male fruit fly can accurately predict the insect’s behaviour in response to seeing a potential mate — paving the way for the building of more complex models of brain circuits.

  • Pavan Ramdya

Predicting tumour origin with cytology-based deep learning: hype or hope?

The majority of patients with cancers of unknown primary have unfavourable outcomes when they receive empirical chemotherapy. The shift towards using precision medicine-based treatment strategies involves two options: tissue-agnostic or site-specific approaches. Here, we reflect on how cytology-based deep learning tools can be leveraged in these approaches.

  • Nicholas Pavlidis

A multidimensional dataset for structure-based machine learning

MISATO, a dataset for structure-based drug discovery, combines quantum mechanics property data and molecular dynamics simulations for ~20,000 protein–ligand structures. It substantially extends the amount of data available to the community and holds potential for advancing work in drug discovery.

  • Matthew Holcomb
  • Stefano Forli

Latest Research and Reviews

An interactive atlas of genomic, proteomic, and metabolomic biomarkers promotes the potential of proteins to predict complex diseases

  • Martin Smelik
  • Mikael Benson

Development and deployment of a histopathology-based deep learning algorithm for patient prescreening in a clinical trial

Here, the authors develop a deep-learning algorithm to predict biomarkers from histopathological imaging in patients with advanced urothelial cancer. The method identifies patients suitable for targeted-therapy clinical trials while significantly reducing molecular testing, providing cost and time savings in real-world clinical settings.

  • Albert Juan Ramon
  • Chaitanya Parmar
  • Kristopher A. Standish

Decoding intelligence via symmetry and asymmetry

  • Jianjing Fu
  • Ching-an Hsiao

External auricle temperature enhances ear-based wearable accuracy during physiological strain monitoring in the heat

  • Shawn Chee Chong Tan
  • Trinh Canh Khanh Tran
  • Ivan Cherh Chiet Low

Reliable anti-cancer drug sensitivity prediction and prioritization

  • Kerstin Lenhof
  • Lea Eckhart
  • Hans-Peter Lenhof

Deep learning to assess microsatellite instability directly from histopathological whole slide images in endometrial cancer

  • Ching-Wei Wang
  • Hikam Muzakky
  • Tai-Kuang Chao

News and Comment

A surprising abundance of pancreatic pre-cancers

AI-based three-dimensional genomic mapping reveals a large abundance of cancer precursors in normal pancreatic tissue — prompting new insights and research directions.

  • Karen O’Leary

Affordable and simplified whole-body MRI

A whole-body scanner developed using a permanent 0.05-tesla magnet and deep learning has demonstrated its versatility in imaging various anatomical structures, showcasing its potential to address unmet clinical needs.

  • Sonia Muliyil

How AI could improve robotics, the cockroach’s origins, and promethium spills its secrets

We round up some recent stories from the Nature Briefing.

  • Benjamin Thompson
  • Elizabeth Gibney
  • Flora Graham

AI assistance for planning cancer treatment

Armed with the right data, advances in machine learning could help oncologists to home in quickly on the best treatment strategies for their patients.

  • Michael Eisenstein

Who owns your voice? Scarlett Johansson OpenAI complaint raises questions

In the age of artificial intelligence, situations are emerging that challenge the laws over rights to a persona.

  • Nicola Jones

Anglo-American bias could make generative AI an invisible intellectual cage

  • Queenie Luo
  • Michael Puett

Journal of Machine Learning Research

The Journal of Machine Learning Research (JMLR), established in 2000, provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online.

  • 2024.02.18: Volume 24 completed; Volume 25 began.
  • 2023.01.20: Volume 23 completed; Volume 24 began.
  • 2022.07.20: New special issue on climate change.
  • 2022.02.18: New blog post: Retrospectives from 20 Years of JMLR.
  • 2022.01.25: Volume 22 completed; Volume 23 began.
  • 2021.12.02: Message from outgoing co-EiC Bernhard Schölkopf.
  • 2021.02.10: Volume 21 completed; Volume 22 began.
  • More news ...

Latest papers

Optimal Locally Private Nonparametric Classification with Public Data Yuheng Ma, Hanfang Yang , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Learning to Warm-Start Fixed-Point Optimization Algorithms Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Nonparametric Regression Using Over-parameterized Shallow ReLU Neural Networks Yunfei Yang, Ding-Xuan Zhou , 2024. [ abs ][ pdf ][ bib ]

Nonparametric Copula Models for Multivariate, Mixed, and Missing Data Joseph Feldman, Daniel R. Kowal , 2024. [ abs ][ pdf ][ bib ]      [ code ]

An Analysis of Quantile Temporal-Difference Learning Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney , 2024. [ abs ][ pdf ][ bib ]

Conformal Inference for Online Prediction with Arbitrary Distribution Shifts Isaac Gibbs, Emmanuel J. Candès , 2024. [ abs ][ pdf ][ bib ]      [ code ]

More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization Xu Liu, Heng Lian, Jian Huang , 2024. [ abs ][ pdf ][ bib ]

A Kernel Test for Causal Association via Noise Contrastive Backdoor Adjustment Robert Hu, Dino Sejdinovic, Robin J. Evans , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Assessing the Overall and Partial Causal Well-Specification of Nonlinear Additive Noise Models Christoph Schultheiss, Peter Bühlmann , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Simple Cycle Reservoirs are Universal Boyu Li, Robert Simon Fong, Peter Tino , 2024. [ abs ][ pdf ][ bib ]

On the Computational Complexity of Metropolis-Adjusted Langevin Algorithms for Bayesian Posterior Sampling Rong Tang, Yun Yang , 2024. [ abs ][ pdf ][ bib ]

Generalization and Stability of Interpolating Neural Networks with Minimal Width Hossein Taheri, Christos Thrampoulidis , 2024. [ abs ][ pdf ][ bib ]

Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression Jiading Liu, Lei Shi , 2024. [ abs ][ pdf ][ bib ]

Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations Yuanyuan Wang, Wei Huang, Mingming Gong, Xi Geng, Tongliang Liu, Kun Zhang, Dacheng Tao , 2024. [ abs ][ pdf ][ bib ]

Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning Maximilian Hüttenrauch, Gerhard Neumann , 2024. [ abs ][ pdf ][ bib ]

Kernel Thinning Raaz Dwivedi, Lester Mackey , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions Xuxing Chen, Tesi Xiao, Krishnakumar Balasubramanian , 2024. [ abs ][ pdf ][ bib ]

Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks Yunpeng Zhao, Ning Hao, Ji Zhu , 2024. [ abs ][ pdf ][ bib ]

Statistical Inference for Fairness Auditing John J. Cherian, Emmanuel J. Candès , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning Yiling Xie, Xiaoming Huo , 2024. [ abs ][ pdf ][ bib ]

DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Flexible Bayesian Product Mixture Models for Vector Autoregressions Suprateek Kundu, Joshua Lukemire , 2024. [ abs ][ pdf ][ bib ]

A Variational Approach to Bayesian Phylogenetic Inference Cheng Zhang, Frederick A. Matsen IV , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fat-Shattering Dimension of k-fold Aggregations Idan Attias, Aryeh Kontorovich , 2024. [ abs ][ pdf ][ bib ]

Unified Binary and Multiclass Margin-Based Classification Yutong Wang, Clayton Scott , 2024. [ abs ][ pdf ][ bib ]

Neural Feature Learning in Function Space Xiangxiang Xu, Lizhong Zheng , 2024. [ abs ][ pdf ][ bib ]      [ code ]

PyGOD: A Python Library for Graph Outlier Detection Kay Liu, Yingtong Dou, Xueying Ding, Xiyang Hu, Ruitong Zhang, Hao Peng, Lichao Sun, Philip S. Yu , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria Tengyuan Liang , 2024. [ abs ][ pdf ][ bib ]

Fixed points of nonnegative neural networks Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks Fanghui Liu, Leello Dadi, Volkan Cevher , 2024. [ abs ][ pdf ][ bib ]

A Survey on Multi-player Bandits Etienne Boursier, Vianney Perchet , 2024. [ abs ][ pdf ][ bib ]

Transport-based Counterfactual Models Lucas De Lara, Alberto González-Sanz, Nicholas Asher, Laurent Risser, Jean-Michel Loubes , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Adaptive Latent Feature Sharing for Piecewise Linear Dimensionality Reduction Adam Farooq, Yordan P. Raykov, Petar Raykov, Max A. Little , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Topological Node2vec: Enhanced Graph Embedding via Persistent Homology Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length Katerina Hlaváčková-Schindler, Anna Melnykova, Irene Tubikanec , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Representation Learning via Manifold Flattening and Reconstruction Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, Yi Ma , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Bagging Provides Assumption-free Stability Jake A. Soloff, Rina Foygel Barber, Rebecca Willett , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fairness guarantees in multi-class classification with demographic parity Christophe Denis, Romuald Elie, Mohamed Hebiri, François Hu , 2024. [ abs ][ pdf ][ bib ]

Regimes of No Gain in Multi-class Active Learning Gan Yuan, Yunfan Zhao, Samory Kpotufe , 2024. [ abs ][ pdf ][ bib ]

Learning Optimal Dynamic Treatment Regimens Subject to Stagewise Risk Controls Mochuan Liu, Yuanjia Wang, Haoda Fu, Donglin Zeng , 2024. [ abs ][ pdf ][ bib ]

Margin-Based Active Learning of Classifiers Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice , 2024. [ abs ][ pdf ][ bib ]

Random Subgraph Detection Using Queries Wasim Huleihel, Arya Mazumdar, Soumyabrata Pal , 2024. [ abs ][ pdf ][ bib ]

Classification with Deep Neural Networks and Logistic Loss Zihan Zhang, Lei Shi, Ding-Xuan Zhou , 2024. [ abs ][ pdf ][ bib ]

Spectral learning of multivariate extremes Marco Avella Medina, Richard A Davis, Gennady Samorodnitsky , 2024. [ abs ][ pdf ][ bib ]

Sum-of-norms clustering does not separate nearby balls Alexander Dunlap, Jean-Christophe Mourrat , 2024. [ abs ][ pdf ][ bib ]      [ code ]

An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization Guy Kornowski, Ohad Shamir , 2024. [ abs ][ pdf ][ bib ]

Linear Distance Metric Learning with Noisy Labels Meysam Alishahi, Anna Little, Jeff M. Phillips , 2024. [ abs ][ pdf ][ bib ]      [ code ]

OpenBox: A Python Toolkit for Generalized Black-box Optimization Huaijun Jiang, Yu Shen, Yang Li, Beicheng Xu, Sixian Du, Wentao Zhang, Ce Zhang, Bin Cui , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Generative Adversarial Ranking Nets Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Predictive Inference with Weak Supervision Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi , 2024. [ abs ][ pdf ][ bib ]

Functions with average smoothness: structure, algorithms, and learning Yair Ashlagi, Lee-Ad Gottlieb, Aryeh Kontorovich , 2024. [ abs ][ pdf ][ bib ]

Differentially Private Data Release for Mixed-type Data via Latent Factor Models Yanqing Zhang, Qi Xu, Niansheng Tang, Annie Qu , 2024. [ abs ][ pdf ][ bib ]

The Non-Overlapping Statistical Approximation to Overlapping Group Lasso Mingyu Qi, Tianxi Li , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Faster Rates of Differentially Private Stochastic Convex Optimization Jinyan Su, Lijie Hu, Di Wang , 2024. [ abs ][ pdf ][ bib ]

Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimization O. Deniz Akyildiz, Sotirios Sabanis , 2024. [ abs ][ pdf ][ bib ]

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits Junpei Komiyama, Edouard Fouché, Junya Honda , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Stable Implementation of Probabilistic ODE Solvers Nicholas Krämer, Philipp Hennig , 2024. [ abs ][ pdf ][ bib ]

More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime validity Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund , 2024. [ abs ][ pdf ][ bib ]

Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space Zhengdao Chen , 2024. [ abs ][ pdf ][ bib ]

QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration Felix Chalumeau, Bryan Lim, Raphaël Boige, Maxime Allard, Luca Grillotti, Manon Flageat, Valentin Macé, Guillaume Richard, Arthur Flajolet, Thomas Pierrot, Antoine Cully , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Random Forest Weighted Local Fréchet Regression with Random Objects Rui Qiu, Zhou Yu, Ruoqing Zhu , 2024. [ abs ][ pdf ][ bib ]      [ code ]

PhAST: Physics-Aware, Scalable, and Task-Specific GNNs for Accelerated Catalyst Design Alexandre Duval, Victor Schmidt, Santiago Miret, Yoshua Bengio, Alex Hernández-García, David Rolnick , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Unsupervised Anomaly Detection Algorithms on Real-world Data: How Many Do We Need? Roel Bouman, Zaharah Bukhsh, Tom Heskes , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini , 2024. [ abs ][ pdf ][ bib ]

Information Processing Equalities and the Information–Risk Bridge Robert C. Williamson, Zac Cranko , 2024. [ abs ][ pdf ][ bib ]

Nonparametric Regression for 3D Point Cloud Learning Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai , 2024. [ abs ][ pdf ][ bib ]      [ code ]

AMLB: an AutoML Benchmark Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Materials Discovery using Max K-Armed Bandit Nobuaki Kikkawa, Hiroshi Ohno , 2024. [ abs ][ pdf ][ bib ]

Semi-supervised Inference for Block-wise Missing Data without Imputation Shanshan Song, Yuanyuan Lin, Yong Zhou , 2024. [ abs ][ pdf ][ bib ]

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou , 2024. [ abs ][ pdf ][ bib ]

Scaling Speech Technology to 1,000+ Languages Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli , 2024. [ abs ][ pdf ][ bib ]      [ code ]

MAP- and MLE-Based Teaching Hans Ulrich Simon, Jan Arne Telle , 2024. [ abs ][ pdf ][ bib ]

A General Framework for the Analysis of Kernel-based Tests Tamara Fernández, Nicolás Rivera , 2024. [ abs ][ pdf ][ bib ]

Overparametrized Multi-layer Neural Networks: Uniform Concentration of Neural Tangent Kernel and Convergence of Stochastic Gradient Descent Jiaming Xu, Hanjing Zhu , 2024. [ abs ][ pdf ][ bib ]

Sparse Representer Theorems for Learning in Reproducing Kernel Banach Spaces Rui Wang, Yuesheng Xu, Mingsong Yan , 2024. [ abs ][ pdf ][ bib ]

Exploration of the Search Space of Gaussian Graphical Models for Paired Data Alberto Roverato, Dung Ngoc Nguyen , 2024. [ abs ][ pdf ][ bib ]

The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy , 2024. [ abs ][ pdf ][ bib ]

Minimax Rates for High-Dimensional Random Tessellation Forests Eliza O'Reilly, Ngoc Mai Tran , 2024. [ abs ][ pdf ][ bib ]

Nonparametric Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks Guohao Shen, Yuling Jiao, Yuanyuan Lin, Joel L. Horowitz, Jian Huang , 2024. [ abs ][ pdf ][ bib ]

Spatial meshing for general Bayesian multivariate models Michele Peruzzi, David B. Dunson , 2024. [ abs ][ pdf ][ bib ]      [ code ]

A Semi-parametric Estimation of Personalized Dose-response Function Using Instrumental Variables Wei Luo, Yeying Zhu, Xuekui Zhang, Lin Lin , 2024. [ abs ][ pdf ][ bib ]

Learning Non-Gaussian Graphical Models via Hessian Scores and Triangular Transport Ricardo Baptista, Rebecca Morrison, Olivier Zahm, Youssef Marzouk , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Learnability of Out-of-distribution Detection Zhen Fang, Yixuan Li, Feng Liu, Bo Han, Jie Lu , 2024. [ abs ][ pdf ][ bib ]

Win: Weight-Decay-Integrated Nesterov Acceleration for Faster Network Training Pan Zhou, Xingyu Xie, Zhouchen Lin, Kim-Chuan Toh, Shuicheng Yan , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin , 2024. [ abs ][ pdf ][ bib ]

Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions Maksim Velikanov, Dmitry Yarotsky , 2024. [ abs ][ pdf ][ bib ]

ptwt - The PyTorch Wavelet Toolbox Moritz Wolter, Felix Blanke, Jochen Garcke, Charles Tapley Hoyt , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Choosing the Number of Topics in LDA Models – A Monte Carlo Comparison of Selection Criteria Victor Bystrov, Viktoriia Naboka-Krell, Anna Staszewska-Bystrova, Peter Winker , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Functional Directed Acyclic Graphs Kuang-Yao Lee, Lexin Li, Bing Li , 2024. [ abs ][ pdf ][ bib ]

Unlabeled Principal Component Analysis and Matrix Completion Yunzhen Yao, Liangzu Peng, Manolis C. Tsakiris , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Distributed Estimation on Semi-Supervised Generalized Linear Model Jiyuan Tu, Weidong Liu, Xiaojun Mao , 2024. [ abs ][ pdf ][ bib ]

Towards Explainable Evaluation Metrics for Machine Translation Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger , 2024. [ abs ][ pdf ][ bib ]

Differentially private methods for managing model uncertainty in linear regression Víctor Peña, Andrés F. Barrientos , 2024. [ abs ][ pdf ][ bib ]

Data Summarization via Bilevel Optimization Zalán Borsos, Mojmír Mutný, Marco Tagliasacchi, Andreas Krause , 2024. [ abs ][ pdf ][ bib ]

Pareto Smoothed Importance Sampling Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao, Jonah Gabry , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Policy Gradient Methods in the Presence of Symmetries and State Abstractions Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Scaling Instruction-Finetuned Language Models Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei , 2024. [ abs ][ pdf ][ bib ]

Tangential Wasserstein Projections Florian Gunsilius, Meng Hsuan Hsieh, Myung Jin Lee , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Learnability of Linear Port-Hamiltonian Systems Juan-Pablo Ortega, Daiying Yin , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning Ariyan Bighashdel, Daan de Geus, Pavol Jancura, Gijs Dubbelman , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Unbiased Estimation for Partially Observed Diffusions Jeremy Heng, Jeremie Houssineau, Ajay Jasra , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Improving Lipschitz-Constrained Neural Networks by Learning Activation Functions Stanislas Ducotterd, Alexis Goujon, Pakshal Bohra, Dimitris Perdios, Sebastian Neumayer, Michael Unser , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Mathematical Framework for Online Social Media Auditing Wasim Huleihel, Yehonathan Refael , 2024. [ abs ][ pdf ][ bib ]

An Embedding Framework for the Design and Analysis of Consistent Polyhedral Surrogates Jessie Finocchiaro, Rafael M. Frongillo, Bo Waggoner , 2024. [ abs ][ pdf ][ bib ]

Low-rank Variational Bayes correction to the Laplace method Janet van Niekerk, Haavard Rue , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Scaling the Convex Barrier with Sparse Dual Algorithms Alessandro De Palma, Harkirat Singh Behl, Rudy Bunel, Philip H.S. Torr, M. Pawan Kumar , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Causal-learn: Causal Discovery in Python Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics Noga Mudrik, Yenho Chen, Eva Yezerets, Christopher J. Rozell, Adam S. Charles , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Existence and Minimax Theorems for Adversarial Surrogate Risks in Binary Classification Natalie S. Frank, Jonathan Niles-Weed , 2024. [ abs ][ pdf ][ bib ]

Data Thinning for Convolution-Closed Distributions Anna Neufeld, Ameer Dharamshi, Lucy L. Gao, Daniela Witten , 2024. [ abs ][ pdf ][ bib ]      [ code ]

A projected semismooth Newton method for a class of nonconvex composite programs with strong prox-regularity Jiang Hu, Kangkang Deng, Jiayuan Wu, Quanzheng Li , 2024. [ abs ][ pdf ][ bib ]

Revisiting RIP Guarantees for Sketching Operators on Mixture Models Ayoub Belhadji, Rémi Gribonval , 2024. [ abs ][ pdf ][ bib ]

Monotonic Risk Relationships under Distribution Shifts for Regularized Risk Minimization Daniel LeJeune, Jiayu Liu, Reinhard Heckel , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks Dong-Young Lim, Sotirios Sabanis , 2024. [ abs ][ pdf ][ bib ]

Axiomatic effect propagation in structural causal models Raghav Singal, George Michailidis , 2024. [ abs ][ pdf ][ bib ]

Optimal First-Order Algorithms as a Function of Inequalities Chanwoo Park, Ernest K. Ryu , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Resource-Efficient Neural Networks for Embedded Systems Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani , 2024. [ abs ][ pdf ][ bib ]

Trained Transformers Learn Linear Models In-Context Ruiqi Zhang, Spencer Frei, Peter L. Bartlett , 2024. [ abs ][ pdf ][ bib ]

Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees Nachuan Xiao, Xiaoyin Hu, Xin Liu, Kim-Chuan Toh , 2024. [ abs ][ pdf ][ bib ]

Efficient Modality Selection in Multimodal Learning Yifei He, Runxiang Cheng, Gargi Balasubramaniam, Yao-Hung Hubert Tsai, Han Zhao , 2024. [ abs ][ pdf ][ bib ]

A Multilabel Classification Framework for Approximate Nearest Neighbor Search Ville Hyvönen, Elias Jääsaari, Teemu Roos , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Probabilistic Forecasting with Generative Networks via Scoring Rule Minimization Lorenzo Pacchiardi, Rilwan A. Adewoyin, Peter Dueben, Ritabrata Dutta , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Multiple Descent in the Multiple Random Feature Model Xuran Meng, Jianfeng Yao, Yuan Cao , 2024. [ abs ][ pdf ][ bib ]

Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling Ye He, Tyler Farghly, Krishnakumar Balasubramanian, Murat A. Erdogdu , 2024. [ abs ][ pdf ][ bib ]

Invariant and Equivariant Reynolds Networks Akiyoshi Sannai, Makoto Kawano, Wataru Kumagai , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Personalized PCA: Decoupling Shared and Unique Features Naichen Shi, Raed Al Kontar , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee George H. Chen , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel , 2024. [ abs ][ pdf ][ bib ]

Convergence for nonconvex ADMM, with applications to CT imaging Rina Foygel Barber, Emil Y. Sidky , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms T. Tony Cai, Hongji Wei , 2024. [ abs ][ pdf ][ bib ]

Sparse NMF with Archetypal Regularization: Computational and Robustness Properties Kayhan Behdin, Rahul Mazumder , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Deep Network Approximation: Beyond ReLU to Diverse Activation Functions Shijun Zhang, Jianfeng Lu, Hongkai Zhao , 2024. [ abs ][ pdf ][ bib ]

Effect-Invariant Mechanisms for Policy Generalization Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters , 2024. [ abs ][ pdf ][ bib ]

Pygmtools: A Python Graph Matching Toolkit Runzhong Wang, Ziao Guo, Wenzheng Pan, Jiale Ma, Yikai Zhang, Nan Yang, Qi Liu, Longxuan Wei, Hanxue Zhang, Chang Liu, Zetian Jiang, Xiaokang Yang, Junchi Yan , 2024. (Machine Learning Open Source Software Paper) [ abs ][ pdf ][ bib ]      [ code ]

Heterogeneous-Agent Reinforcement Learning Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Sample-efficient Adversarial Imitation Learning Dahuin Jung, Hyungyu Lee, Sungroh Yoon , 2024. [ abs ][ pdf ][ bib ]

Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi , 2024. [ abs ][ pdf ][ bib ]

Rates of convergence for density estimation with generative adversarial networks Nikita Puchkin, Sergey Samsonov, Denis Belomestny, Eric Moulines, Alexey Naumov , 2024. [ abs ][ pdf ][ bib ]

Additive smoothing error in backward variational inference for general state-space models Mathis Chagneux, Elisabeth Gassiat, Pierre Gloaguen, Sylvain Le Corff , 2024. [ abs ][ pdf ][ bib ]

Optimal Bump Functions for Shallow ReLU networks: Weight Decay, Depth Separation, Curse of Dimensionality Stephan Wojtowytsch , 2024. [ abs ][ pdf ][ bib ]

Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees Alexander Terenin, David R. Burt, Artem Artemev, Seth Flaxman, Mark van der Wilk, Carl Edward Rasmussen, Hong Ge , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Tail Decay Rate Estimation of Loss Function Distributions Etrit Haxholli, Marco Lorenzi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces Hao Liu, Haizhao Yang, Minshuo Chen, Tuo Zhao, Wenjing Liao , 2024. [ abs ][ pdf ][ bib ]

Post-Regularization Confidence Bands for Ordinary Differential Equations Xiaowu Dai, Lexin Li , 2024. [ abs ][ pdf ][ bib ]

On the Generalization of Stochastic Gradient Descent with Momentum Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang , 2024. [ abs ][ pdf ][ bib ]

Pursuit of the Cluster Structure of Network Lasso: Recovery Condition and Non-convex Extension Shotaro Yagishita, Jun-ya Gotoh , 2024. [ abs ][ pdf ][ bib ]

Iterate Averaging in the Quest for Best Test Error Diego Granziol, Nicholas P. Baskerville, Xingchen Wan, Samuel Albanie, Stephen Roberts , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Nonparametric Inference under B-bits Quantization Kexuan Li, Ruiqi Liu, Ganggang Xu, Zuofeng Shang , 2024. [ abs ][ pdf ][ bib ]

Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box Ryan Giordano, Martin Ingram, Tamara Broderick , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On Sufficient Graphical Models Bing Li, Kyongwon Kim , 2024. [ abs ][ pdf ][ bib ]

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond Nathan Kallus, Xiaojie Mao, Masatoshi Uehara , 2024. [ abs ][ pdf ][ bib ]      [ code ]

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks Sebastian Neumayer, Lénaïc Chizat, Michael Unser , 2024. [ abs ][ pdf ][ bib ]

Improving physics-informed neural networks with meta-learned optimization Alex Bihlo , 2024. [ abs ][ pdf ][ bib ]

A Comparison of Continuous-Time Approximations to Stochastic Gradient Descent Stefan Ankirchner, Stefan Perko , 2024. [ abs ][ pdf ][ bib ]

Critically Assessing the State of the Art in Neural Network Verification Matthias König, Annelot W. Bosman, Holger H. Hoos, Jan N. van Rijn , 2024. [ abs ][ pdf ][ bib ]

Estimating the Minimizer and the Minimum Value of a Regression Function under Passive Design Arya Akhavan, Davit Gogolashvili, Alexandre B. Tsybakov , 2024. [ abs ][ pdf ][ bib ]

Modeling Random Networks with Heterogeneous Reciprocity Daniel Cirkovic, Tiandong Wang , 2024. [ abs ][ pdf ][ bib ]

Exploration, Exploitation, and Engagement in Multi-Armed Bandits with Abandonment Zixian Yang, Xin Liu, Lei Ying , 2024. [ abs ][ pdf ][ bib ]

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models Yangjing Zhang, Ying Cui, Bodhisattva Sen, Kim-Chuan Toh , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Decorrelated Variable Importance Isabella Verdinelli, Larry Wasserman , 2024. [ abs ][ pdf ][ bib ]

Model-Free Representation Learning and Exploration in Low-Rank MDPs Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal , 2024. [ abs ][ pdf ][ bib ]

Seeded Graph Matching for the Correlated Gaussian Wigner Model via the Projected Power Method Ernesto Araya, Guillaume Braun, Hemant Tyagi , 2024. [ abs ][ pdf ][ bib ]      [ code ]

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization Shicong Cen, Yuting Wei, Yuejie Chi , 2024. [ abs ][ pdf ][ bib ]

Power of knockoff: The impact of ranking algorithm, augmented design, and symmetric statistic Zheng Tracy Ke, Jun S. Liu, Yucong Ma , 2024. [ abs ][ pdf ][ bib ]

Lower Complexity Bounds of Finite-Sum Optimization Problems: The Results and Construction Yuze Han, Guangzeng Xie, Zhihua Zhang , 2024. [ abs ][ pdf ][ bib ]

On Truthing Issues in Supervised Classification Jonathan K. Su , 2024. [ abs ][ pdf ][ bib ]

Papers with Code: Trending Research

AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct

bin123apple/autocoder • 23 May 2024

We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark (90.9% vs. 90.2%).
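
For context, pass@1 results like the one quoted above are conventionally computed with the unbiased pass@k estimator used for HumanEval-style benchmarks; the short sketch below (plain Python, with made-up sample counts) shows that formula, which for k = 1 reduces to the fraction of generated samples that pass the unit tests.

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k for one problem: n generated samples, c of them pass the tests."""
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Hypothetical evaluation: (n, c) per problem; for k = 1 this is simply c / n averaged.
    per_problem = [(20, 13), (20, 0), (20, 20)]
    score = sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem)
    print(f"pass@1 = {score:.3f}")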

FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

ai4finance-foundation/finrobot • 23 May 2024

As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community.

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

The motion module can be adapted to various DiT baseline methods to generate video with different styles.

YOLOv10: Real-Time End-to-End Object Detection

In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture.

InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

magic-research/instadrag • 22 May 2024

Accuracy and speed are critical in image editing tasks.

S³Gaussian: Self-Supervised Street Gaussians for Autonomous Driving

Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

We design models based off T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources.

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

This paper introduces StreamV2V, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts.

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research

With the advent of Large Language Models (LLMs), the potential of Retrieval Augmented Generation (RAG) techniques have garnered considerable research attention.

MIT News | Massachusetts Institute of Technology

Machine learning


Helping robots grasp the unpredictable

MIT CSAIL’s frugal deep-learning model infers the hidden physical properties of objects, then adapts to find the most stable grasps for robots in unstructured environments like homes and fulfillment centers.

June 3, 2024

A technique for more effective multipurpose robots

With generative AI models, researchers combined robotics data from different sources to help robots learn better.

Looking for a specific action in a video? This AI-based method can find it for you

A new approach could streamline virtual training processes or aid clinicians in reviewing diagnostic videos.

May 29, 2024

Controlled diffusion model can change material properties in images

“Alchemist” system adjusts the material attributes of specific objects within images to potentially modify video game models to fit different environments, fine-tune VFX, and diversify robotic training.

May 28, 2024

School of Engineering welcomes new faculty

Fifteen new faculty members join six of the school’s academic departments.

May 23, 2024

2024 MAD Design Fellows announced

The 10 Design Fellows are MIT graduate students working at the intersection of design and multiple disciplines across the Institute.

May 21, 2024

Scientists use generative AI to answer complex questions in physics

A new technique that can automatically classify phases of physical systems could help scientists investigate novel materials.

May 16, 2024

Using ideas from game theory to improve the reliability of language models

A new “consensus game,” developed by MIT CSAIL researchers, elevates AI’s text comprehension and generation skills.

May 14, 2024

A better way to control shape-shifting soft robots

A new algorithm learns to squish, bend, or stretch a robot’s entire body to accomplish diverse tasks like avoiding obstacles or retrieving items.

May 10, 2024

Exploring the mysterious alphabet of sperm whales

MIT CSAIL and Project CETI researchers reveal complex communication patterns in sperm whales, deepening our understanding of animal language systems.

May 7, 2024

President Sally Kornbluth and OpenAI CEO Sam Altman discuss the future of AI

The conversation in Kresge Auditorium touched on the promise and perils of the rapidly evolving technology.

May 6, 2024

Creating bespoke programming languages for efficient visual AI systems

Associate Professor Jonathan Ragan-Kelley optimizes how computer graphics and images are processed for the hardware of today and tomorrow.

May 3, 2024

HPI-MIT design research collaboration creates powerful teams

Together, the Hasso Plattner Institute and MIT are working toward novel solutions to the world’s problems as part of the Designing for Sustainability research program.

An AI dataset carves new paths to tornado detection

TorNet, a public artificial intelligence dataset, could help models reveal when and why tornadoes form, improving forecasters' ability to issue warnings.

April 29, 2024

MIT faculty, instructors, students experiment with generative AI in teaching and learning

At MIT’s Festival of Learning 2024, panelists stressed the importance of developing critical thinking skills while leveraging technologies like generative AI.

Machine Learning and Artificial Intelligence in Pharmaceutical Research and Development: a Review

Sheela Kolluri

1 Global Clinical & Real World Evidence Statistics, Global Biometrics, Teva Pharmaceuticals, 145 Brandywine Pkwy, West Chester, PA 19380, USA

Jianchang Lin

2 Statistical and Quantitative Science, Data Sciences Institute, Takeda Pharmaceutical Co. Limited, 300 Mass Ave, Cambridge, MA, USA

Rachael Liu

Yanwei Zhang

Wenwen Zhang

Over the past decade, artificial intelligence (AI) and machine learning (ML) have become the breakthrough technology most anticipated to have a transformative effect on pharmaceutical research and development (R&D). This is partially driven by revolutionary advances in computational technology and the parallel dissipation of previous constraints to the collection/processing of large volumes of data. Meanwhile, the cost of bringing new drugs to market and to patients has become prohibitively expensive. Recognizing these headwinds, AI/ML techniques are appealing to the pharmaceutical industry due to their automated nature, predictive capabilities, and the consequent expected increase in efficiency. ML approaches have been used in drug discovery over the past 15–20 years with increasing sophistication. The most recent aspect of drug development where positive disruption from AI/ML is starting to occur, is in clinical trial design, conduct, and analysis. The COVID-19 pandemic may further accelerate utilization of AI/ML in clinical trials due to an increased reliance on digital technology in clinical trial conduct. As we move towards a world where there is a growing integration of AI/ML into R&D, it is critical to get past the related buzz-words and noise. It is equally important to recognize that the scientific method is not obsolete when making inferences about data. Doing so will help in separating hope from hype and lead to informed decision-making on the optimal use of AI/ML in drug development. This manuscript aims to demystify key concepts, present use-cases and finally offer insights and a balanced view on the optimal use of AI/ML methods in R&D.

Graphical abstract

[Graphical abstract figure omitted]

Introduction

Artificial intelligence (AI) and machine learning (ML) have flourished in the past decade, driven by revolutionary advances in computational technology. This has led to transformative improvements in the ability to collect and process large volumes of data. Meanwhile, the cost of bringing new drugs to market and to patients has become prohibitively high. In the remainder of this paper, we use "R&D" to generally describe the research, science, and processes associated with drug development, from drug discovery through clinical development and conduct to the life-cycle management stage.

Developing a new drug is a long and expensive process with a low success rate, as evidenced by the following estimates: average R&D investment is $1.3 billion per drug [1]; median development time ranges from 5.9 to 7.2 years for non-oncology drugs and is 13.1 years for oncology drugs; and the proportion of all drug-development programs that eventually lead to approval is 13.8% [2]. Recognizing these headwinds, the drug-development industry finds AI/ML techniques appealing due to their automated nature, predictive capabilities, and the consequent expected increase in efficiency. There is clearly a need, from a patient and a business perspective, to make drug development more efficient and thereby reduce cost, shorten the development time, and increase the probability of success (POS). ML methods have been used in drug discovery for the past 15–20 years with increasing sophistication. The most recent aspect of drug development where a positive disruption from AI/ML is starting to occur is in clinical trial design, operations, and analysis. The COVID-19 pandemic may further accelerate utilization of AI/ML in clinical trials due to increased reliance on digital technology in patient data collection. With this paper, we attempt a general review of the current status of AI/ML in drug development and also present new areas where there might be potential for a significant impact. We hope that this paper will offer a balanced perspective, help in separating hope from hype, and finally inform and promote the optimal use of AI/ML.

We begin with an overview of the basic concepts and terminology related to AI/ML. We then attempt to provide insights on when, where, and how AI/ML techniques can be optimally used in R&D, highlighting clinical trial data analysis where we compare it to traditional inference-based statistical approaches. This is followed by a summary of the current status of AI/ML in R&D with use-case illustrations including ongoing efforts in clinical trial operations. Finally, we present future perspectives and challenges.

AI and ML: Key Concepts and Terminology

In this section, we present an overview of key concepts and terminology related to AI and ML and their interdependency (see Fig. 1 and Table I). AI is a technique used to create systems with human-like behavior. ML is an application of AI, where AI is achieved by using algorithms that are trained with data. Deep learning (DL) is a type of ML vaguely inspired by the structure of the human brain, referred to as artificial neural networks.

Fig. 1. Chronology of AI and ML

Table I below provides simple descriptors of the basic terminology related to AI, ML, and related techniques.

Human intelligence is related to the ability of the human brain to observe, understand, and react to an ever-changing external environment. The field of AI not only tries to understand how the human brain works but also tries to build intelligent systems that can react to an ever-changing external environment in a safe and effective way (see Fig. 2 for a brief overview of AI [3]). Researchers have pursued different versions of AI by focusing on either fidelity to human behavior or rationality (doing the right thing) in both thought and action. Subfields of AI can be either general, focusing on perception, learning, and reasoning, or specific, such as playing chess. A multitude of disciplines have contributed to the creation of AI technology, including philosophy, mathematics, and neuroscience. ML, an application of AI, uses statistical methods to find patterns in data, where data can be text, images, or anything that is digitally stored. ML methods are typically classified as supervised learning, unsupervised learning, and reinforcement learning (see Fig. 3 for a brief overview of supervised and unsupervised learning).

Fig. 2. Brief overview of AI

Fig. 3. Brief overview of supervised and unsupervised learning
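
To make the supervised/unsupervised distinction above concrete, here is a minimal sketch (scikit-learn on synthetic data; all names and parameter values are illustrative, not taken from the paper): a classifier is trained with labels, while a clustering model infers structure from the same features without labels.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Synthetic data: 300 points drawn from 3 groups.
    X, y = make_blobs(n_samples=300, centers=3, random_state=0)

    # Supervised learning: the labels y guide training.
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print("supervised training accuracy:", clf.score(X, y))

    # Unsupervised learning: only X is used; group structure is inferred.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("inferred cluster sizes:", np.bincount(km.labels_))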

Current Status

AI/ML techniques have the potential to increase the likelihood of success in drug development by bringing significant improvements in multiple areas of R&D, including: novel target identification, understanding of target-disease associations, drug candidate selection, protein structure prediction, molecular compound design and optimization, understanding of disease mechanisms, development of new prognostic and predictive biomarkers, analysis of biometric data from wearable devices, imaging, precision medicine, and, more recently, clinical trial design, conduct, and analysis. The impact of the COVID-19 pandemic on clinical trial execution will potentially accelerate the use of AI and ML in this area due to an increased reliance on digital technology for data collection and site monitoring.

In the pre-clinical space, natural language processing (NLP) is used to help extract scientific insights from biomedical literature, unstructured electronic medical records (EMR), and insurance claims, ultimately helping to identify novel targets; predictive modeling is used to predict protein structures and to facilitate molecular compound design and optimization, enabling selection of drug candidates with a higher probability of success. The increasing volume of high-dimensional data from genomics, imaging, and digital wearable devices has led to rapid advancements in ML methods to handle the "Large p, Small n" problem, where the number of variables ("p") is greater than the number of samples ("n"). Such methods also offer benefits to research in the post-marketing stage with the use of "big data" from real-world data sources to (i) enrich the understanding of a drug's benefit-risk profile; (ii) better understand treatment sequence patterns; and (iii) identify subgroups of patients who may benefit more from one treatment compared with others (precision medicine).
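
As a hedged illustration of the "Large p, Small n" setting, the sketch below simulates 1,000 predictors for only 50 samples and fits an L1-penalized (lasso) regression, one common way such high-dimensional data are handled; all numbers are synthetic and not drawn from any study cited here.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p = 50, 1000                          # n samples, p predictors, p >> n

    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]   # only 5 predictors carry signal
    y = X @ beta + rng.normal(scale=0.5, size=n)

    # The L1 penalty sets most coefficients exactly to zero,
    # making estimation feasible even though p greatly exceeds n.
    model = LassoCV(cv=5, random_state=0).fit(X, y)
    print("non-zero coefficients selected:", int(np.sum(model.coef_ != 0)))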

While AI/ML have been widely used in drug discovery, translational research, and the pre-clinical phase with increasing sophistication over the past two decades, their utilization in clinical trial operations and data analysis has been slower. We use "clinical trial operations" to refer to the processes involved in the execution and conduct of clinical trials, including site selection, patient recruitment, trial monitoring, and data collection. Clinical trial data analysis refers to data management, statistical programming, and statistical analysis of participant clinical data collected from a trial.

On the trial operations end, patient recruitment has been particularly challenging, with an estimated 80% of trials not meeting enrollment timelines and approximately 30% of phase 3 trials terminating early due to enrollment challenges [4]. Trial site monitoring (involving in-person travel to sites) is an important and expensive quality control step mandated by regulators, and with multi-center global trials it has become labor-intensive, time-consuming, and costly. In addition, the duration from the "last subject last visit" milestone of the last phase 3 trial to the submission of the data package for regulatory approval has been largely unchanged over the past two decades and presents a huge opportunity for positive disruption by AI/ML. Shortening this duration will have a dramatic impact on our ability to get drugs to patients faster while reducing cost. The steps in between include cleaning and locking the trial database, generating the last phase 3 trial analysis results (frequently involving hundreds of summary tables, data listings, and figures), writing the clinical study report, completing the integrated summary of efficacy and safety, and finally creating the data submission package.

The impact of COVID-19 may further accelerate the integration of AI/ML into clinical trial operations due to the shift toward fully or partially virtual (or "decentralized") trials and the increased use of digital technology to collect patient data. AI/ML methods can be used to enhance patient recruitment and project enrollment, and to enable real-time, automated, "smart" monitoring of clinical data quality and trial site performance. We believe AI/ML hold potential to have a transformative effect on clinical trial operations and clinical trial data analyses, particularly in the areas of trial data analysis, creation of clinical study reports, and regulatory submission data packages.

Case Studies

Below, we offer a few use cases to illustrate how AI/ML methods have been used or are in the process of improving existing approaches in R&D.

Case Study 1 (Drug Discovery)—DL for Protein Structure Prediction and Drug Repurposing

A protein's biological mechanism is determined by its three-dimensional (3D) structure, which is encoded in its one-dimensional (1D) string of amino acids. Knowledge about protein structures is applied to understand their biological mechanisms and to help discover new therapies that can inhibit or activate the proteins to treat target diseases. Protein misfolding is known to be important in many diseases, including type II diabetes as well as neurodegenerative diseases such as Alzheimer's, Parkinson's, Huntington's, and amyotrophic lateral sclerosis [5]. Given the knowledge gap between a protein's 1D amino acid sequence and its 3D structure, there is significant value in developing methods that can accurately predict 3D protein structures to assist new drug discovery and the understanding of protein-folding diseases. AlphaFold [6, 7], developed by DeepMind (Google), is an AI network used to determine a protein's 3D shape from its amino acid sequence. It applied a DL approach to predict the structure of the protein using its sequence. The central component of AlphaFold is a convolutional neural network that was trained on Protein Data Bank structures to predict the distances between every pair of residues in a protein sequence, giving a probabilistic estimate of a 64 × 64 region of the distance map. These regions are then tiled together to produce distance predictions for the entire protein, and a structure is generated that conforms to the distance predictions. In 2020, AlphaFold released structure predictions of five understudied SARS-CoV-2 targets, including the SARS-CoV-2 membrane protein, Nsp2, Nsp4, Nsp6, and papain-like proteinase (C-terminal domain), which will hopefully deepen the understanding of these under-studied biological systems [8].
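
The tiling step described above can be pictured with the small, purely illustrative sketch below: predicted 64 × 64 blocks of residue-pair distances are written into the corresponding regions of a full L × L distance map. The predict_crop function is a stand-in returning random values, not an AlphaFold model.

    import numpy as np

    L, crop = 200, 64                    # example protein length and crop size
    rng = np.random.default_rng(0)

    def predict_crop(i, j, size):
        """Placeholder for a network predicting distances for residues [i, i+size) x [j, j+size)."""
        return rng.random((size, size))

    dist_map = np.zeros((L, L))
    for i in range(0, L, crop):
        for j in range(0, L, crop):
            h, w = min(crop, L - i), min(crop, L - j)
            dist_map[i:i + h, j:j + w] = predict_crop(i, j, crop)[:h, :w]

    print(dist_map.shape)                # full-protein distance predictions, here (200, 200)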

Beck et al. [9] developed a deep learning–based drug-target interaction prediction model, called Molecule Transformer-Drug Target Interaction (MT-DTI), to predict binding affinities from chemical sequences and the amino acid sequences of a target protein, without structural information; the model can be used to identify potent FDA-approved drugs that may inhibit the functions of SARS-CoV-2's core proteins. Beck et al. computationally identified several known antiviral drugs, such as atazanavir, remdesivir, efavirenz, ritonavir, and dolutegravir, which are predicted to show inhibitory potency against the SARS-CoV-2 3C-like proteinase and can potentially be repurposed as candidate treatments of SARS-CoV-2 infection in clinical trials.

Case Study 2 (Translational Research/Precision Medicine)—Machine Learning for Developing Predictive Biomarkers

Several case studies have now been published showing that biomarkers derived from ML predictive models can be used to stratify patients in clinical development. Predictive models were developed [ 10 ] to test whether models derived from cell line screening data could predict patient response to erlotinib (a treatment for non-small cell lung cancer and pancreatic cancer) and to sorafenib (a treatment for kidney, liver, and thyroid cancer). The predictive models used IC50 values as the dependent variable and gene expression data from untreated cells as the independent variables. The whole cell-line panel served as the training dataset, and gene expression data generated from tumor samples of patients treated with the same drug served as the testing dataset; no information from the testing dataset was used in training the drug sensitivity models. The BATTLE clinical trial data were used as an independent testing dataset to evaluate the performance of the drug sensitivity models trained on cell line data. The best models were selected and used to predict IC50 values, which defined the model-predicted drug-sensitive and drug-resistant groups.
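
As a hypothetical illustration of this workflow (simulated data, not the models of Li et al. [ 10 ]), the following sketch trains a penalized regression of IC50 on cell-line gene expression and then applies it to patient tumor expression to define predicted-sensitive and predicted-resistant groups:

```python
# Hypothetical cell line -> patient workflow: train a drug-sensitivity model on
# cell-line gene expression (features) and IC50 (response), then apply it to
# patient tumor expression to label model-predicted sensitive vs. resistant
# groups. All data here are simulated.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_cell_lines = rng.normal(size=(500, 2000))          # expression of 2000 genes in 500 cell lines
log_ic50 = X_cell_lines[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=500)

model = Ridge(alpha=10.0).fit(X_cell_lines, log_ic50)  # drug-sensitivity model

X_patients = rng.normal(size=(100, 2000))            # pre-treatment tumor expression
pred_ic50 = model.predict(X_patients)
sensitive = pred_ic50 < np.median(pred_ic50)         # lower predicted IC50 = predicted sensitive
print(f"{sensitive.sum()} predicted sensitive, {(~sensitive).sum()} predicted resistant")
```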

Li et al. [ 10 ] applied the predictive model to stratify patients in the erlotinib arm of the BATTLE trial. The median progression-free survival (PFS) for the model-predicted erlotinib-sensitive group was 3.84 months versus 1.84 months for the model-predicted erlotinib-resistant group, suggesting that the patients predicted to be erlotinib-sensitive had more than double the PFS of those predicted to be resistant. Similarly, the model-predicted sorafenib-sensitive group had a median PFS benefit of 2.66 months over the sorafenib-resistant group, with a p-value of 0.006 and a hazard ratio of 0.32 (95% CI, 0.15 to 0.72); the median PFS was 4.53 and 1.87 months for the model-predicted sorafenib-sensitive and sorafenib-resistant groups, respectively.
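
The downstream comparison of outcomes between predicted groups can be sketched with standard survival analysis tooling; the snippet below uses the lifelines package on simulated PFS data and is only illustrative of the type of analysis reported above, not a reproduction of it:

```python
# Hedged sketch: compare progression-free survival between model-predicted
# sensitive and resistant groups using simulated PFS times (months) and event
# indicators; requires the lifelines package.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(1)
n = 50
df = pd.DataFrame({
    "pfs_months": np.concatenate([rng.exponential(4.5, n),    # predicted sensitive
                                  rng.exponential(1.9, n)]),  # predicted resistant
    "event": 1,                                               # 1 = progression observed
    "resistant": np.repeat([0, 1], n),
})

result = logrank_test(df.loc[df.resistant == 0, "pfs_months"],
                      df.loc[df.resistant == 1, "pfs_months"],
                      event_observed_A=df.loc[df.resistant == 0, "event"],
                      event_observed_B=df.loc[df.resistant == 1, "event"])
print("log-rank p-value:", result.p_value)

cph = CoxPHFitter().fit(df, duration_col="pfs_months", event_col="event")
cph.print_summary()   # hazard ratio for the 'resistant' indicator
```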

Case Study 3—Nonparametric Bayesian Learning for Clinical Trial Design and Analysis

Many existing ML methods focus on learning a set of parameters within a given class of models from appropriate training data; choosing the model class or its complexity is often referred to as model selection. Important issues encountered in practice include potential over-fitting or under-fitting as well as the discovery of the underlying data structure and related causes [ 11 ]. Examples include, but are not limited to, selecting the number of clusters in a clustering problem, the number of hidden states in a hidden Markov model, the number of latent variables in a latent variable model, or the complexity of features used in nonlinear regression. It is therefore important to train ML methods appropriately so that they perform reliably under real-world conditions and produce trustworthy predictions. Cross-validation is commonly used as an efficient way to evaluate how well ML methods perform under different choices of tuning parameters.

Nonparametric Bayesian learning has emerged as a powerful tool in the modern ML toolbox due to its flexibility, providing a Bayesian framework for model selection using a nonparametric approach. More specifically, a Bayesian nonparametric model allows an infinite-dimensional parameter space while involving only a finite subset of the available parameters for any given sample set. The Dirichlet process is currently a commonly used Bayesian nonparametric model, particularly in Dirichlet process mixture models (also known as infinite mixture models). Dirichlet process mixtures provide a nonparametric approach to modeling densities and identifying latent clusters within the observed variables without pre-specifying the number of mixture components. With advances in Markov chain Monte Carlo (MCMC) techniques, sampling from infinite mixtures can be performed directly or via finite truncations.
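
For intuition, scikit-learn’s variational BayesianGaussianMixture provides a truncated Dirichlet process mixture; in the sketch below (simulated data), the truncation level is set higher than the number of clusters actually needed, and the model effectively switches off the surplus components:

```python
# Illustrative sketch: a (truncated) Dirichlet process mixture for clustering
# without pre-specifying the number of components, via scikit-learn's
# variational BayesianGaussianMixture. Data are simulated.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
               rng.normal(loc=5.0, scale=1.0, size=(100, 2)),
               rng.normal(loc=[0.0, 8.0], scale=1.0, size=(100, 2))])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # truncation level, not the cluster count
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.5,                    # DP concentration parameter
    random_state=0,
).fit(X)

labels = dpgmm.predict(X)
print("effective clusters used:", np.unique(labels).size)
print("mixture weights:", np.round(dpgmm.weights_, 3))
```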

There are many applications of such Bayesian nonparametric models in clinical trial design. For example, in oncology dose-finding trials, nonparametric Bayesian learning can support efficient and effective dose selection. Oncology first-in-human trials commonly enroll patients with multiple types of cancer, which introduces heterogeneity; this issue can be even more prominent in immuno-oncology and cell therapies. Designs that ignore the heterogeneity of safety or efficacy profiles across tumor types can lead to imprecise dose selection and inefficient identification of future target populations. Li et al. [ 12 ] proposed nonparametric Bayesian learning–based designs for adaptive dose finding with multiple populations. These designs, based on the Bayesian logistic regression model (BLRM), allow data-driven borrowing of information across populations while accounting for heterogeneity, improving both the efficiency of the dose search and the accuracy of the estimated optimal dose level. Liu et al. [ 13 ] extended another commonly used dose-finding design, the modified toxicity probability interval (mTPI) design, to BNP-mTPI and fBNP-mTPI by applying Bayesian nonparametric learning across different indications. These designs use the Dirichlet process, which offers more flexible prior approximation and can automatically group patients into similar clusters based on learning from the emerging data.
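
To illustrate the core two-parameter Bayesian logistic dose-toxicity model that underlies BLRM-type designs (not the specific designs of Li et al. [ 12 ] or Liu et al. [ 13 ]), the following grid-approximation sketch computes the posterior probability that the toxicity rate at each dose lies in a target interval; the doses, priors, and toxicity counts are hypothetical:

```python
# Hedged sketch of a two-parameter Bayesian logistic dose-toxicity model:
# posterior over (alpha, beta) on a grid given observed dose-limiting
# toxicities. Doses, priors, and data are hypothetical.
import numpy as np
from scipy.special import expit
from scipy.stats import norm, lognorm, binom

doses = np.array([1, 2, 4, 8, 16])           # mg, hypothetical
d_ref = 8.0                                  # reference dose
n_pat = np.array([3, 3, 6, 3, 0])            # patients treated per dose
n_tox = np.array([0, 0, 1, 2, 0])            # dose-limiting toxicities observed

alpha = np.linspace(-5, 3, 200)              # grid over intercept
beta = np.linspace(0.05, 4, 200)             # grid over (positive) slope
A, B = np.meshgrid(alpha, beta, indexing="ij")

log_prior = norm.logpdf(A, loc=-1.0, scale=2.0) + lognorm.logpdf(B, s=1.0)
log_lik = np.zeros_like(A)
for d, n, t in zip(doses, n_pat, n_tox):
    p = expit(A + B * np.log(d / d_ref))     # toxicity probability at dose d
    log_lik += binom.logpmf(t, n, p)

post = np.exp(log_prior + log_lik)
post /= post.sum()                           # normalized posterior on the grid

# Posterior probability that the toxicity rate at each dose lies in (0.16, 0.33)
for d in doses:
    p = expit(A + B * np.log(d / d_ref))
    print(d, "mg: P(0.16 < p_tox < 0.33) =", round(post[(p > 0.16) & (p < 0.33)].sum(), 3))
```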

Nonparametric Bayesian learning can also be applied in master protocols, including basket, umbrella, and platform trials, which allow the investigation of multiple therapies, multiple diseases, or both within a single trial [ 14 – 16 ]. With nonparametric Bayesian learning, these trials have an enhanced potential to accelerate the generation of efficacy and safety data through adaptive decision-making, which can shorten the drug development timeline in areas of significant unmet medical need. For example, in the evaluation of potential COVID-19 therapies, adaptive platform trials quickly emerged as a critical tool: the clinical benefits of remdesivir and dexamethasone were demonstrated using such approaches in the Adaptive COVID-19 Treatment Trial (ACTT) and the RECOVERY [ 17 ] trial.

One of the key questions in master protocols is whether borrowing across treatments or indications is appropriate. Ideally, each tumor subtype in a basket trial would be tested separately; however, this is often infeasible given the rarity of some genetic mutations. Analyzing small subgroups independently introduces bias and high variability, while naïve pooling of subgroup information can inflate the type I error. Different Bayesian hierarchical models (BHMs) have been developed to overcome the limitations of independent testing and naïve pooling, e.g., the Bayesian hierarchical mixture model (BHMM) and the exchangeability-nonexchangeability (EXNEX) model. However, these models depend heavily on pre-specified mixture parameters, and when prior information on the heterogeneity across disease subtypes is limited, parameter misspecification is a concern. To overcome this limitation of parametric borrowing methods, Bayesian nonparametric learning is emerging as a powerful tool that allows flexible shrinkage modeling of the heterogeneity between individual subgroups and automatically captures additional clustering. Bunn et al. [ 18 ] show that such models require fewer assumptions than other commonly used methods and allow more reliable data-driven decision-making in basket trials. Hupf et al. [ 19 ] further extend these flexible Bayesian borrowing strategies to incorporate historical or real-world data.
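
As a simple illustration of hierarchical borrowing across baskets (a basic BHM rather than the BHMM or EXNEX models cited above), the sketch below fits a random-effects model to hypothetical response counts with the PyMC library; basket-level response rates are shrunk toward the overall rate, with the degree of shrinkage learned from the data:

```python
# Hedged sketch of a simple Bayesian hierarchical model for borrowing
# response-rate information across K tumor-type baskets. Counts and priors are
# hypothetical; requires PyMC.
import numpy as np
import pymc as pm

n = np.array([15, 12, 20, 9, 14])     # patients per basket
r = np.array([4, 1, 9, 2, 3])         # responders per basket

with pm.Model() as basket_model:
    mu = pm.Normal("mu", mu=0.0, sigma=2.0)            # overall log-odds of response
    tau = pm.HalfNormal("tau", sigma=1.0)              # between-basket heterogeneity
    theta = pm.Normal("theta", mu=mu, sigma=tau, shape=len(n))
    p = pm.Deterministic("p", pm.math.invlogit(theta)) # basket-specific response rates
    pm.Binomial("responses", n=n, p=p, observed=r)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

# Posterior means shrink extreme baskets toward the overall rate (data-driven borrowing)
print(idata.posterior["p"].mean(dim=("chain", "draw")).values.round(2))
```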

Case Study 4—Precision Medicine with Machine Learning

Based on recent estimates, 54% of phase 3 trials of novel therapeutics failed in clinical development, with 57% of those failures due to inadequate efficacy [ 20 ]. A major contributing factor is failure to identify the appropriate target patient population together with the right dose regimen, including dose levels and combination partners. Precision medicine has therefore become a priority for drug development in the pharmaceutical industry. One approach is a systematic ML-based framework to (a) build a probabilistic model to predict the probability of success (POS) and (b) identify subgroups of patients with a higher probability of therapeutic benefit. This enables the optimal matching of patients with the right therapy and maximizes both resource use and patient benefit. The training datasets can include all ongoing early-phase data, published data, and real-world evidence, but are limited to the same class of drugs.

One major challenge in establishing such a probabilistic model is defining the endpoints that best measure therapeutic effect. Early-phase clinical trials (particularly in oncology) frequently adopt primary efficacy endpoints that differ from those of confirmatory pivotal trials because of shorter follow-up times and the need for faster decision-making. For example, common oncology endpoints are the overall response rate or complete response rate in phase I/II, and progression-free survival (PFS) and/or overall survival (both measures of long-term benefit) in pivotal phase III trials. It is also common in oncology for phase I/II trials to use single-arm settings to establish proof of concept and generate the hypothesis of treatment benefit, whereas pivotal trials, especially randomized phase III trials with a control arm, aim to demonstrate superior treatment benefit over available therapy. This change in targeted endpoints from the early to the late phase makes the prediction of POS in the pivotal trial using early-phase data quite challenging. Training datasets built from previous trials of drugs with a similar mechanism and/or indication can help establish the relationship between the short-term and long-term endpoints, which ultimately determines the success of drug development.

Additionally, patients can be clustered using unsupervised learning. For example, nonparametric Bayesian hierarchical models based on the Dirichlet process enable patient grouping, without a pre-specified number of clusters, according to key predictive or prognostic factors representing various levels of treatment benefit. This approach can bring efficiency to patient selection in precision medicine clinical development.

Case Study 5—AI/ML-assisted Tool for Clinical Trial Oversight

Monitoring of trials by a sponsor is a critical quality control measure mandated by regulators to ensure the scientific integrity of trials and safety of subjects. With increasing complexity of data collection (increased volume, variety, and velocity), and the use of contract research organizations (CROs)/vendors, sponsor oversight of trial site performance and trial clinical data has become challenging, time-consuming, and extremely expensive. Across all study phases (excluding estimated site overhead costs and costs for sponsors to monitor the study), trial site monitoring is among the top three cost drivers of clinical trial expenditures (9–14% of total cost) [ 21 ].

For monitoring of trial site performance, risk-based monitoring (RBM) has recently emerged as a potential cost-saving and efficient alternative to traditional monitoring (in which sponsors send study monitors to visit sites for 100% source-data verification (SDV) on a pre-specified schedule). While RBM improves on traditional monitoring, inconsistent RBM approaches used by CROs and the current prospective nature of operational/clinical trial data reviews mean that a sponsor’s ability to detect critical issues with site performance may be delayed or compromised, particularly in lean organizations where CRO oversight is challenging due to limited resources.

For monitoring of trial data quality, the approaches in common use rely largely on reviews of traditional subject and/or aggregate data listings and summary statistics based on known risk factors. The lack of real-time data and of widely available automated tools limits the sponsor’s ability to mitigate risk prospectively. Delayed review can have a significant impact on the outcome of a trial; for example, in an acute setting where the primary endpoint uses ePRO data, monthly transfers may be too late to prevent incomplete or incorrect data entry. The larger impact is a systemic gap in study team oversight that could result in critical data quality issues.

One potential solution is the use of AI/ML-assisted tools for monitoring trial site performance and trial data quality. Such a tool could offer an umbrella framework, overlaid on top of CRO systems, for monitoring trial data quality and sites. With the assistance of AI/ML, study teams may be able to use an advanced form of RBM (with improved prediction of risk and of thresholds for signal detection) and real-time clinical data monitoring with increased efficiency and quality and reduced cost in a lean-resourced environment. Such a tool could apply ML and predictive analytics to current RBM and data quality monitoring, effectively moving study monitoring to the next generation of RBM. Using accumulating data from the ongoing trial, together with available data from similar trials, to continuously improve data quality and site performance checks could have a transformative effect on a sponsor’s ability to protect patient safety and to reduce trial duration and cost.

For data quality reviews, the data fields and components contributing to the key endpoints that impact the outcome of the trial would be identified by the study team. For trial data monitoring, an AI/ML-assisted tool can use predictive analytics and R Shiny visualization for cross-database checks and real-time “smart monitoring” of clinical data quality; by “smart monitoring,” we mean AI/ML techniques that continuously learn from accumulating trial data and improve the data quality checks, including edit checks. Similarly, for trial site performance monitoring, an AI/ML tool could begin with the Transcelerate (a non-profit cross-pharma consortium) library of key risk indicators (KRIs) and team-specified thresholds to identify problem sites based on operational data; the “smart” feature could then use accumulating data to continuously improve the precision of the targeted site monitoring that makes up RBM. The authors of this manuscript are currently collaborating with a research team at MIT to advance research in Bayesian probabilistic programming approaches that could aid the development of an AI/ML tool with these features for clinical trial oversight of data quality and site performance.
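
As a toy illustration of how such “smart” site monitoring could flag outliers from operational KRIs, the following sketch applies an unsupervised anomaly detector to simulated site-level metrics; the KRIs, thresholds, and data are hypothetical and not drawn from the Transcelerate library:

```python
# Hedged sketch of site-performance monitoring: flag outlier trial sites from
# operational key risk indicators (KRIs) with an unsupervised anomaly detector.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(3)
sites = pd.DataFrame({
    "query_rate": rng.gamma(2.0, 1.0, 40),          # data queries per 100 data points
    "days_to_data_entry": rng.gamma(3.0, 2.0, 40),
    "protocol_deviations": rng.poisson(1.0, 40),
    "ae_reporting_rate": rng.gamma(2.0, 0.5, 40),   # adverse events per patient-month
})
sites.loc[39] = [9.0, 30.0, 8, 0.01]                # one clearly problematic site

detector = IsolationForest(contamination=0.05, random_state=0).fit(sites)
sites["flagged"] = detector.predict(sites) == -1    # -1 = anomalous site
print(sites[sites["flagged"]])
```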

AI/ML as a field has tremendous growth potential in R&D. As with most technological advances, this presents both challenges and hope. With modern-day data collection, the magnitude and dimensionality of data will continue to increase dramatically because of the use of digital technology. This will increase the opportunities for AI/ML techniques to deepen understanding of biological systems, to repurpose drugs for new indications, and also to inform study design and analysis of clinical trials in drug development.

Although recent ML/AI methods represent major technological advances, the conclusions drawn from them can be misleading if we are unable to tease out confounding factors, use reliable algorithms, look at the right data, and fully understand the clinical questions behind the endpoints and data collection. It is imperative to train ML algorithms properly so that their performance is trustworthy across a variety of data scenarios. Additionally, not every research question can be answered with AI/ML, particularly when there is high variability, limited data, poor data collection quality, an under-represented patient population, or a flawed trial design. Under-represented patient populations are particularly concerning because they can lead to systematic bias. Furthermore, in line with emerging concerns in other areas where AI/ML are used, care and caution need to be exercised to address patient privacy and bioethical considerations.

It is also important to be aware of when DL/AI, ML, or traditional inference-based statistical methods are most effective in R&D. In Fig. 4 below, we attempt to provide a recommendation based on the dimensionality of the dataset; in Fig. 5, we provide a similar recommendation based on different aspects of drug development. Although many ML algorithms can handle high-dimensional data with the “large p, small n” problem, an increasing number of variables/predictors, especially those unrelated to the response, remains a challenge: as the number of irrelevant predictors grows, so does the noise, reducing the predictive performance of most ML algorithms.
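
A small simulation makes this point concrete: keeping the sample size fixed while adding irrelevant predictors typically degrades cross-validated accuracy, as sketched below with simulated data:

```python
# Illustrative simulation of the "large p, small n" issue: as irrelevant
# predictors are added while the sample size stays fixed, the cross-validated
# performance of a standard classifier degrades.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, informative = 100, 5
X_signal = rng.normal(size=(n, informative))
y = (X_signal.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)

for n_noise in [0, 50, 500, 5000]:
    X = np.hstack([X_signal, rng.normal(size=(n, n_noise))])
    score = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
    print(f"{informative} informative + {n_noise:5d} noise features: CV accuracy = {score:.2f}")
```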

Figure 4. Application of ML/AI based on the dimensionality of the data

Figure 5. Application of ML/AI based on various aspects of drug development

In summary, combining an appropriate understanding of R&D with advanced ML/AI techniques can offer huge benefits to drug development and to patients. The implementation and visualization of AI/ML tools can offer user-friendly platforms that maximize efficiency and promote the use of breakthrough techniques in R&D. However, a sound understanding of the difference between causation and correlation is vital, as is the recognition that increasingly sophisticated prediction capabilities do not render the scientific method obsolete. Credible inference still requires sound statistical judgment, which is particularly critical in drug development given the direct impact on patient health and safety. This further underscores that a well-rounded understanding of ML/AI techniques, together with adequate domain-specific knowledge in R&D, is paramount for their optimal use in drug development.

Author Contribution

SK, JL, RL, YZ, and WZ contributed to the ideas, implementation, and interpretation of the research topic, and to the writing of the manuscript.

Declarations

Sheela K. was previously employed by Takeda Pharmaceuticals and is currently employed by Teva Pharmaceuticals (West Chester PA USA) during the development and revision of this manuscript. All other authors are employed by Takeda Pharmaceuticals during the development and revision of this manuscript.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Machine Learning: Recently Published Documents


An explainable machine learning model for identifying geographical origins of sea cucumber Apostichopus japonicus based on multi-element profile

A comparison of machine learning- and regression-based models for predicting ductility ratio of RC beam-column joints

Alexa, is this a historical record?

Digital transformation in government has brought an increase in the scale, variety, and complexity of records and greater levels of disorganised data. Current practices for selecting records for transfer to The National Archives (TNA) were developed to deal with paper records and are struggling to deal with this shift. This article examines the background to the problem and outlines a project that TNA undertook to research the feasibility of using commercially available artificial intelligence tools to aid selection. The project AI for Selection evaluated a range of commercial solutions varying from off-the-shelf products to cloud-hosted machine learning platforms, as well as a benchmarking tool developed in-house. Suitability of tools depended on several factors, including requirements and skills of transferring bodies as well as the tools’ usability and configurability. This article also explores questions around trust and explainability of decisions made when using AI for sensitive tasks such as selection.

Automated Text Classification of Maintenance Data of Higher Education Buildings Using Text Mining and Machine Learning Techniques

Data-driven analysis and machine learning for energy prediction in distributed photovoltaic generation plants: a case study in Queensland, Australia

Modeling nutrient removal by membrane bioreactor at a sewage treatment plant using machine learning models

Big five personality prediction based in Indonesian tweets using machine learning methods

The popularity of social media has drawn the attention of researchers who have conducted cross-disciplinary studies examining the relationship between personality traits and behavior on social media. Most current work focuses on personality prediction analysis of English texts, but Indonesian has received scant attention. Therefore, this research aims to predict user’s personalities based on Indonesian text from social media using machine learning techniques. This paper evaluates several machine learning techniques, including naive Bayes (NB), K-nearest neighbors (KNN), and support vector machine (SVM), based on semantic features including emotion, sentiment, and publicly available Twitter profile. We predict the personality based on the big five personality model, the most appropriate model for predicting user personality in social media. We examine the relationships between the semantic features and the Big Five personality dimensions. The experimental results indicate that the Big Five personality exhibit distinct emotional, sentimental, and social characteristics and that SVM outperformed NB and KNN for Indonesian. In addition, we observe several terms in Indonesian that specifically refer to each personality type, each of which has distinct emotional, sentimental, and social features.

Compressive strength of concrete with recycled aggregate; a machine learning-based evaluation

Temperature prediction of flat steel box girders of long-span bridges utilizing in situ environmental parameters and machine learning

Computer-assisted cohort identification in practice

The standard approach to expert-in-the-loop machine learning is active learning, where, repeatedly, an expert is asked to annotate one or more records and the machine finds a classifier that respects all annotations made until that point. We propose an alternative approach, IQRef, in which the expert iteratively designs a classifier and the machine helps him or her to determine how well it is performing and, importantly, when to stop, by reporting statistics on a fixed, hold-out sample of annotated records. We justify our approach based on prior work giving a theoretical model of how to re-use hold-out data. We compare the two approaches in the context of identifying a cohort of EHRs and examine their strengths and weaknesses through a case study arising from an optometric research problem. We conclude that both approaches are complementary, and we recommend that they both be employed in conjunction to address the problem of cohort identification in health research.


Editorial: The combination of data-driven machine learning approaches and prior knowledge for robust medical image processing and analysis


  • 1 Department of Mathematical Sciences, School of Science, Loughborough University, Loughborough, United Kingdom
  • 2 School of Computer Science, University of Birmingham, Birmingham, United Kingdom
  • 3 Department of Electrical and Electronic Engineering & I-X, Imperial College London, London, United Kingdom
  • 4 Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang, China

Editorial on the Research Topic: The combination of data-driven machine learning approaches and prior knowledge for robust medical image processing and analysis

Combining data-driven machine learning with prior knowledge has significantly advanced medical image processing and analysis. Deep learning, driven by large datasets and powerful GPUs, excels in tasks like image reconstruction, segmentation, and disease classification. However, these models face challenges such as high resource demands, limited generalization, and lack of interpretability. In contrast, model-driven approaches offer better generalization, interpretability, and robustness but may lack accuracy and efficiency. Combining these paradigms leverages their strengths, promising superior performance and enhanced diagnostic accuracy. This Research Topic showcases how this integration enhances medical imaging, including accurate stroke onset estimation, improved COVID-19 diagnosis and recovery assessment, and enhanced cardiac imaging techniques. These advancements highlight the potential for improved diagnostic accuracy, treatment planning, and clinical decision-making in medical imaging.

A convolutional neural network (CNN) was developed by Gao et al. to identify acute ischemic stroke patients within a 6-h window for endovascular thrombectomy using computed tomography perfusion and perfusion-weighted imaging. This CNN outperformed support vector machines and random forests, demonstrating its potential for accurate stroke onset time estimation using both CT and MR imaging.

Building on the success of deep learning in stroke diagnosis, another study by Huang et al. utilized deep learning and CT scans to assess lung recovery in COVID-19 Delta variant survivors over 6 months. The findings were promising, with ground-glass opacities disappearing and mild fibrosis in most cases, alongside improved lung prognosis compared to the original COVID-19 strain. In a similar vein, a mixed-effects deep learning model was created by Bridge et al. to diagnose COVID-19 from CT scans, achieving high accuracy and robustness. With an AUROC of 0.930 in external validation, this model outperformed other methods, showcasing potential for clinical application in automated COVID-19 diagnosis.

Transitioning to cardiac imaging, a novel Transformer-ConvNet architecture, MAE-TransRNet, was proposed by Xiao et al. for cardiac MRI registration. This method significantly improved deformable image registration accuracy by combining the strengths of convolutional neural networks (CNN) and Transformers, outperforming state-of-the-art methods on the ACDC dataset.

Extending the application of deep learning to ENT diagnostics, a multi-scale deep learning network, MIB-ANet, was developed by Bi et al. for grading adenoid hypertrophy from nasal endoscopy images. This network outperformed junior E.N.T. clinicians in accuracy and speed, demonstrating its potential for clinical application in automated adenoid hypertrophy grading.

Further advancing medical imaging, an anatomical prior-informed masking strategy for pre-training masked autoencoders was introduced by Wang et al. to enhance brain tumor segmentation. Leveraging brain structure knowledge to guide masking, this method improved efficiency and accuracy on the BraTS21 dataset, outperforming state-of-the-art self-supervised learning techniques. Similarly, a Joint 2D−3D Cross-Pseudo Supervision (JCPS) method was introduced by Zhou et al. for segmenting the carotid vessel wall in black-blood MRI images. This approach, which combines coarse and fine segmentation leveraging both labeled and unlabeled data, significantly enhanced segmentation accuracy, outperforming existing methods.

A systematic review of deep learning techniques for segmenting isointense infant brain tissues in MRI was conducted by Mhlanga and Viriri, analyzing 19 studies from 2012–2022. This review highlighted challenges due to low tissue contrast and overlapping intensity in white and gray matter, with convolutional neural networks (CNNs) being prominently used.

AI-based echocardiographic quantification of global longitudinal strain (GLS) and left ventricular ejection fraction (LVEF) in trastuzumab-treated patients was evaluated by Jiang et al. They found moderate to strong correlations with conventional methods, suggesting AI's potential as a supplementary tool in clinical settings despite lower feasibility rates. In another study employing echocardiograms, Zhang Y. et al. introduced an automated pipeline that utilizes deep neural networks and ensemble learning to accurately quantify left ventricular ejection fraction (LVEF) and predict heart failure. Their method demonstrated high accuracy and clinical applicability, achieving a Pearson's correlation coefficient of 0.83 with expert analysis and an AUROC of 0.98 for heart failure classification. Furthermore, a semi-supervised contrastive learning network was proposed by Guo et al. for multi-structure echocardiographic segmentation. Evaluated on the CAMUS dataset, it achieved high performance, outperforming existing methods and using fewer parameters. This approach enhances cardiac disease diagnosis and reduces clinician workload.

Finally, for oncology, MRI radiomics-based machine learning models were compared for predicting glioblastoma multiforme prognosis by Zhang D. et al. The DeepSurv model outperformed traditional Cox proportional-hazards and other models, highlighting the potential of deep learning in improving GBM survival predictions.

In conclusion, the integration of data-driven machine learning approaches with prior knowledge marks a significant advancement in medical imaging. The studies reviewed herein underscore the transformative impact of these combined methodologies, offering substantial improvements in diagnostic accuracy, efficiency, and robustness across various medical imaging tasks. This Research Topic significantly contributes to the field by addressing key challenges and paving the way for more reliable and precise medical image analysis, ultimately enhancing patient outcomes and clinical decision-making.

Author contributions

DZ: Conceptualization, Writing – original draft, Writing – review & editing. JD: Writing – review & editing. CQ: Writing – review & editing. GL: Writing – review & editing.

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Keywords: deep learning, medical imaging, diagnostic accuracy, data-driven approaches, prior knowledge, robustness

Citation: Zhou D, Duan J, Qin C and Luo G (2024) Editorial: The combination of data-driven machine learning approaches and prior knowledge for robust medical image processing and analysis. Front. Med. 11:1434686. doi: 10.3389/fmed.2024.1434686

Received: 18 May 2024; Accepted: 23 May 2024; Published: 31 May 2024.

Edited and reviewed by: Giorgio Treglia, Ente Ospedaliero Cantonale (EOC), Switzerland

Copyright © 2024 Zhou, Duan, Qin and Luo. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Diwei Zhou, D.Zhou2@lboro.ac.uk; Jinming Duan, j.duan@bham.ac.uk



Personalized oxygenation could improve outcomes for patients on ventilators


Supplemental oxygen is among the most widely prescribed therapies in the world, with an estimated 13 to 20 million patients worldwide requiring oxygen delivery by mechanical ventilation each year. Mechanical ventilation — a form of life support — is a technology that moves breathable air into and out of the lungs, acting like a bellows. Ventilators have moved far beyond the “iron lung” machines some people might picture; now, apparatuses have progressed to sophisticated, compact digital machines that deliver oxygen through a small plastic tube that goes down the throat.

Despite technological advancements, the correct amount of oxygen to deliver to each patient has remained a guessing game. Clinicians prescribe oxygen levels by using devices that record SpO2 saturation, which measures the amount of oxygen in a patient’s blood. However, prior research was unable to establish whether a higher or lower SpO2 target is better for patients.

“The standard of care is to maintain oxygen saturation between 88 and 100; within that range, doctors have had to choose an oxygen level for ventilation without having high-quality data to inform their decision-making,” said Kevin Buell, MBBS, a pulmonary and critical care fellow at the University of Chicago Medicine. “Whether we like it or not, making that decision for each patient exposes them to the potential benefits or harms of the chosen oxygen level.”

To take the guesswork out of ventilation, Buell and a group of other researchers used a machine learning model to study whether the effects of different oxygen levels depend on individual patients’ characteristics. The results, published in JAMA , suggest that personalized oxygenation targets could reduce mortality — which could have far-reaching impacts on critical care.

Previously, some research groups conducted randomized trials to investigate whether higher or lower oxygen levels are better for patients overall, but most produced no clear answer. Buell and his collaborators hypothesized that instead of indicating that oxygen levels don’t affect patient outcomes, the neutral results might indicate that the treatment outcomes for different oxygen levels varied by patient and simply averaged to zero effect in randomized trials.

As personalized medicine continues gaining traction, there is a growing interest in using machine learning to make predictions for individual patients. In the context of mechanical ventilation, these models could potentially use specific patient characteristics to predict an ideal oxygen level for each patient. These characteristics included age, sex, heart rate, body temperature and reason for being admitted to an Intensive Care Unit (ICU).

“We set out to create an evidence-based, personalized prediction of who would benefit from a lower or higher oxygen target when they go on a ventilator,” said Buell, a joint first author on the study.

Those previous randomized trials didn’t go to waste — Buell and his collaborators used data from those studies to design and train their machine learning model. After the model was developed using trial data collected in the U.S., the collaborators applied it to data from patients across the world in Australia and New Zealand. For patients who received oxygenation that fell within the target range the machine learning model predicted to be beneficial for them, mortality could have decreased by 6.4% overall.

It’s impossible to generalize predictions based on a single characteristic — for example, not all patients with brain injuries will benefit from lower oxygen saturation even though the data skew in that direction — which is why clinicians need a tool like the researchers’ machine learning model to piece together the mosaic of each patient’s needs. However, Buell pointed out that although the algorithm itself is complicated, the variables healthcare teams would input are all familiar clinical variables, making it easy for anyone to implement this kind of tool in the future.

At UChicago Medicine, healthcare teams can already use algorithms directly integrated into the electronic health record (EHR) system to inform other areas of clinical decision-making. Buell hopes mechanical ventilation can one day function the same way. For hospitals that might not have the resources to integrate machine learning into an EHR, he even envisions creating a web-based application that would allow clinicians to type in patient characteristics and obtain a prediction that way — like an online calculator. A lot of validation, testing and refinement needs to happen before clinical implementation can become a reality, but the end goal makes that future research well worth the investment.

In an editorial that accompanied the article’s publication, renowned critical care expert Derek Angus, MD, wrote: “If the results are true and generalizable, then the consequences are staggering. If one could instantly assign every patient into their appropriate group of predicted benefit or harm and assign their oxygen target accordingly, the intervention would theoretically yield the greatest single improvement in lives saved from critical illness in the history of the field.”

The study, “ Individualized Treatment Effects of Oxygen Targets in Mechanically Ventilated Critically Ill Adults ,” was published in the Journal of the American Medical Association in March 2024. Co-authors include Kevin G. Buell, Alexandra B. Spicer, Jonathan D. Casey, Kevin P. Seitz, Edward T. Qian, Emma J. Graham Linck, Wesley H. Self, Todd W. Rice, Pratik Sinha, Paul J. Young, Matthew W. Semler and Matthew M. Churpek.

Machine learning and deep learning

  • Fundamentals
  • Open access
  • Published: 08 April 2021
  • Volume 31, pages 685–695 (2021)


  • Christian Janiesch   ORCID: orcid.org/0000-0002-8050-123X 1 ,
  • Patrick Zschech   ORCID: orcid.org/0000-0002-1105-8086 2 &
  • Kai Heinrich 3  


Today, intelligent systems that offer artificial intelligence capabilities often rely on machine learning. Machine learning describes the capacity of systems to learn from problem-specific training data to automate the process of analytical model building and solve associated tasks. Deep learning is a machine learning concept based on artificial neural networks. For many applications, deep learning models outperform shallow machine learning models and traditional data analysis approaches. In this article, we summarize the fundamentals of machine learning and deep learning to generate a broader understanding of the methodical underpinning of current intelligent systems. In particular, we provide a conceptual distinction between relevant terms and concepts, explain the process of automated analytical model building through machine learning and deep learning, and discuss the challenges that arise when implementing such intelligent systems in the field of electronic markets and networked business. These naturally go beyond technological aspects and highlight issues in human-machine interaction and artificial intelligence servitization.


Introduction

It is considered easier to explain to a child the nature of what constitutes a sports car as opposed to a normal car by showing him or her examples, rather than trying to formulate explicit rules that define a sports car.

Similarly, instead of codifying knowledge into computers, machine learning (ML) seeks to automatically learn meaningful relationships and patterns from examples and observations (Bishop 2006 ). Advances in ML have enabled the recent rise of intelligent systems with human-like cognitive capacity that penetrate our business and personal life and shape the networked interactions on electronic markets in every conceivable way, with companies augmenting decision-making for productivity, engagement, and employee retention (Shrestha et al. 2021 ), trainable assistant systems adapting to individual user preferences (Fischer et al. 2020 ), and trading agents shaking traditional finance trading markets (Jayanth Balaji et al. 2018 ).

The capacity of such systems for advanced problem solving, generally termed artificial intelligence (AI), is based on analytical models that generate predictions, rules, answers, recommendations, or similar outcomes. First attempts to build analytical models relied on explicitly programming known relationships, procedures, and decision logic into intelligent systems through handcrafted rules (e.g., expert systems for medical diagnoses) (Russell and Norvig 2021 ). Fueled by the practicability of new programming frameworks, data availability, and the broad access to necessary computing power, analytical models are nowadays increasingly built using what is generally referred to as ML (Brynjolfsson and McAfee 2017 ; Goodfellow et al. 2016 ). ML relieves the human of the burden of explicating and formalizing his or her knowledge into a machine-accessible form and allows intelligent systems to be developed more efficiently.

During the last decades, the field of ML has brought forth a variety of remarkable advancements in sophisticated learning algorithms and efficient pre-processing techniques. One of these advancements was the evolution of artificial neural networks (ANNs) towards increasingly deep neural network architectures with improved learning capabilities, summarized as deep learning (DL) (Goodfellow et al. 2016 ; LeCun et al. 2015 ). For specific applications in closed environments, DL already shows superhuman performance, exceeding human capabilities (Madani et al. 2018 ; Silver et al. 2018 ). However, such benefits come at a price, as there are several challenges to overcome for successfully implementing analytical models in real business settings. These include the suitable choice from manifold implementation options, bias and drift in data, the mitigation of black-box properties, and the reuse of preconfigured models (as a service).

Beyond its hyped appearance, scholars, as well as professionals, require a solid understanding of the underlying concepts, processes as well as challenges for implementing such technology. Against this background, the goal of this article is to convey a fundamental understanding of ML and DL in the context of electronic markets. In this way, the community can benefit from these technological achievements – be it for the purpose of examining large and high-dimensional data assets collected in digital ecosystems or for the sake of designing novel intelligent systems for electronic markets. Following recent advances in the field, this article focuses on analytical model building and challenges of implementing intelligent systems based on ML and DL. As we examine the field from a technical perspective, we do not elaborate on the related issues of AI technology adoption, policy, and impact on organizational culture (for further implications cf. e.g. Stone et al. 2016 ).

In the next section, we provide a conceptual distinction between relevant terms and concepts. Subsequently, we shed light on the process of automated analytical model building by highlighting the particularities of ML and DL. Then, we proceed to discuss several induced challenges when implementing intelligent systems within organizations or electronic markets. In doing so, we highlight environmental factors of implementation and application rather than viewing the engineered system itself as the only unit of observation. We summarize the article with a brief conclusion.

Conceptual distinction

To provide a fundamental understanding of the field, it is necessary to distinguish several relevant terms and concepts from each other. For this purpose, we first present the basic foundations of AI before distinguishing i) machine learning algorithms, ii) artificial neural networks, and iii) deep neural networks. The hierarchical relationship between these terms is summarized in the Venn diagram in Fig. 1.

Figure 1. Venn diagram of machine learning concepts and classes (inspired by Goodfellow et al. 2016, p. 9)

Broadly defined, AI comprises any technique that enables computers to mimic human behavior and reproduce or excel over human decision-making to solve complex tasks independently or with minimal human intervention (Russell and Norvig 2021 ). As such, it is concerned with a variety of central problems, including knowledge representation, reasoning, learning, planning, perception, and communication, and refers to a variety of tools and methods (e.g., case-based reasoning, rule-based systems, genetic algorithms, fuzzy models, multi-agent systems) (Chen et al. 2008 ). Early AI research focused primarily on hard-coded statements in formal languages, which a computer can then automatically reason about based on logical inference rules. This is also known as the knowledge base approach (Goodfellow et al. 2016 ). However, the paradigm faces several limitations as humans generally struggle to explicate all their tacit knowledge that is required to perform complex tasks (Brynjolfsson and McAfee 2017 ).

Machine learning overcomes such limitations. Generally speaking, ML means that a computer program’s performance improves with experience with respect to some class of tasks and performance measures (Jordan and Mitchell 2015 ). As such, it aims at automating the task of analytical model building to perform cognitive tasks like object detection or natural language translation. This is achieved by applying algorithms that iteratively learn from problem-specific training data, which allows computers to find hidden insights and complex patterns without explicitly being programmed (Bishop 2006 ). Especially in tasks related to high-dimensional data such as classification, regression, and clustering, ML shows good applicability. By learning from previous computations and extracting regularities from massive databases, it can help to produce reliable and repeatable decisions. For this reason, ML algorithms have been successfully applied in many areas, such as fraud detection, credit scoring, next-best offer analysis, speech and image recognition, or natural language processing (NLP).

Based on the given problem and the available data, we can distinguish three types of ML: supervised learning, unsupervised learning, and reinforcement learning. While many applications in electronic markets use supervised learning (Brynjolfsson and McAfee 2017 ), for example, to forecast stock markets (Jayanth Balaji et al. 2018 ), to understand customer perceptions (Ramaswamy and DeClerck 2018 ), to analyze customer needs (Kühl et al. 2020 ), or to search products (Bastan et al. 2020 ), there are implementations of all types, for example, market-making with reinforcement learning (Spooner et al. 2018 ) or unsupervised market segmentation using customer reviews (Ahani et al. 2019 ). See Table 1 for an overview of all three types.
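
As a brief, hypothetical illustration of the first two types on simulated shop data, supervised learning predicts a labeled outcome (e.g., customer churn), whereas unsupervised learning finds customer segments without any labels:

```python
# Supervised vs. unsupervised learning on simulated customer data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))                      # e.g., spend, visits, recency
churn = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, churn)           # supervised: learn from labels
print("churn probability of first customer:", clf.predict_proba(X[:1])[0, 1].round(2))

segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)  # unsupervised
print("customers per segment:", np.bincount(segments))
```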

Depending on the learning task, the field offers various classes of ML algorithms, each of them coming in multiple specifications and variants, including regression models, instance-based algorithms, decision trees, Bayesian methods, and ANNs.

The family of artificial neural networks is of particular interest since their flexible structure allows them to be modified for a wide variety of contexts across all three types of ML. Inspired by the principle of information processing in biological systems, ANNs consist of mathematical representations of connected processing units called artificial neurons. Like synapses in a brain, each connection between neurons transmits signals whose strength can be amplified or attenuated by a weight that is continuously adjusted during the learning process. Signals are only processed by subsequent neurons if a certain threshold is exceeded as determined by an activation function. Typically, neurons are organized into networks with different layers. An input layer usually receives the data input (e.g., product images of an online shop), and an output layer produces the ultimate result (e.g., categorization of products). In between, there are zero or more hidden layers that are responsible for learning a non-linear mapping between input and output (Bishop 2006 ; Goodfellow et al. 2016 ). The number of layers and neurons, among other property choices, such as learning rate or activation function, cannot be learned by the learning algorithm. They constitute a model’s hyperparameters and must be set manually or determined by an optimization routine.
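
The following minimal sketch of a forward pass through a small fully connected network makes the notions of layers, weights, and activation functions concrete; the weights are random here, whereas in practice they would be adjusted by the learning algorithm:

```python
# Forward pass of a tiny fully connected network (illustration only).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # activation: passes signal only above threshold 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
x = rng.normal(size=4)                 # input layer: 4 features (e.g., image descriptors)

W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input -> hidden layer (8 neurons)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden -> output layer (1 neuron)

hidden = relu(W1 @ x + b1)             # weighted sum of inputs, then activation
output = sigmoid(W2 @ hidden + b2)     # e.g., probability of a product category
print("predicted probability:", float(output[0]))
```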

Deep neural networks typically consist of more than one hidden layer, organized in deeply nested network architectures. Furthermore, they usually contain advanced neurons in contrast to simple ANNs. That is, they may use advanced operations (e.g., convolutions) or multiple activations in one neuron rather than using a simple activation function. These characteristics allow deep neural networks to be fed with raw input data and automatically discover a representation that is needed for the corresponding learning task. This is the networks’ core capability, which is commonly known as deep learning. Simple ANNs (e.g., shallow autoencoders) and other ML algorithms (e.g., decision trees) can be subsumed under the term shallow machine learning since they do not provide such functionalities. As there is still no exact demarcation between the two concepts in literature (see also Schmidhuber 2015 ), we use a dashed line in Fig. 1 . While some shallow ML algorithms are considered inherently interpretable by humans and, thus, white boxes, the decision making of most advanced ML algorithms is per se untraceable unless explained otherwise and, thus, constitutes a black box.

DL is particularly useful in domains with large and high-dimensional data, which is why deep neural networks outperform shallow ML algorithms for most applications in which text, image, video, speech, and audio data needs to be processed (LeCun et al. 2015 ). However, for low-dimensional data input, especially in cases of limited training data availability, shallow ML can still produce superior results (Zhang and Ling 2018 ), which even tend to be better interpretable than those generated by deep neural networks (Rudin 2019 ). Further, while DL performance can be superhuman, problems that require strong AI capabilities such as literal understanding and intentionality still cannot be solved, as pointedly outlined in Searle’s ( 1980 ) Chinese room argument.

Process of analytical model building

In this section, we provide a framework on the process of analytical model building for explicit programming, shallow ML, and DL as they constitute three distinct concepts to build an analytical model. Due to their importance for electronic markets, we focus the subsequent discussion on the related aspects of data input, feature extraction, model building, and model assessment of shallow ML and DL (cf. Figure 2 ). With explicit programming, feature extraction and model building are performed manually by a human when handcrafting rules to specify the analytical model.

Figure 2. Process of analytical model building (inspired by Goodfellow et al. 2016, p. 10)

Electronic markets have different stakeholder touchpoints, such as websites, apps, and social media platforms. Apart from common numerical data, they generate a vast amount of versatile data, in particular unstructured and non-cross-sectional data such as time series, image, and text. This data can be exploited for analytical model building towards better decision support or business automation purposes. However, extracting patterns and relationships by hand would exceed the cognitive capacity of human operators, which is why algorithmic support is indispensable when dealing with large and high-dimensional data.

Time series data implies a sequential dependency and patterns over time that need to be detected to form forecasts, often resulting in regression problems or trend classification tasks. Typical examples involve forecasting financial markets or predicting process behavior (Heinrich et al. 2021 ). Image data is often encountered in the context of object recognition or object counting with fields of application ranging from crop detection for yield prediction to autonomous driving (Grigorescu et al. 2020 ). Text data is present when analyzing large volumes of documents such as corporate e-mails or social media posts. Example applications are sentiment analysis or machine-based translation and summarization of documents (Young et al. 2018 ).

Recent advancements in DL allow for processing data of different types in combination, often referred to as cross-modal learning. This is useful in applications where content is subject to multiple forms of representation, such as e-commerce websites where product information is commonly represented by images, brief descriptions, and other complementary text metadata. Once such cross-modal representations are learned, they can be used, for example, to improve retrieval and recommendation tasks or to detect misinformation and fraud (Bastan et al. 2020 ).

Feature extraction

An important step for the automated identification of patterns and relationships from large data assets is the extraction of features that can be exploited for model building. In general, a feature describes a property derived from the raw data input with the purpose of providing a suitable representation. Thus, feature extraction aims to preserve discriminatory information and separate factors of variation relevant to the overall learning task (Goodfellow et al. 2016 ). For example, when classifying the helpfulness of customer reviews of an online-shop, useful feature candidates could be the choice of words, the length of the review, and the syntactical properties of the text.

Shallow ML heavily relies on such well-defined features, and therefore its performance is dependent on a successful extraction process. Multiple feature extraction techniques have emerged over time that are applicable to different types of data. For example, when analyzing time-series data, it is common to apply techniques to extract time-domain features (e.g., mean, range, skewness) and frequency-domain features (e.g., frequency bands) (Goyal and Pabla 2015 ); for image analysis, suitable approaches include histograms of oriented gradients (HOG) (Dalal and Triggs 2005 ), scale-invariant feature transform (SIFT) (Lowe 2004 ), and the Viola-Jones method (Viola and Jones 2001 ); and in NLP, it is common to use term frequency-inverse document frequency (TF-IDF) vectors (Salton and Buckley 1988 ), part-of-speech (POS) tagging, and word shape features (Wu et al. 2018 ). Manual feature design is a tedious task as it usually requires a lot of domain expertise within an application-specific engineering process. For this reason, it is considered time-consuming, labor-intensive, and inflexible.
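
For instance, a classical TF-IDF representation of raw review texts can be computed with a few lines of scikit-learn; the reviews below are made up, and the resulting sparse vectors could then feed a shallow classifier for review helpfulness:

```python
# Classical (shallow-ML) feature extraction for text: TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "Great battery life and the screen is sharp",
    "Battery died after two weeks, very disappointing",
    "Sharp screen, fast delivery, would buy again",
]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(reviews)          # sparse matrix: reviews x vocabulary terms

print(X.shape)                                 # (3, vocabulary size)
print(vectorizer.get_feature_names_out()[:5])  # a few of the extracted feature names
```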

Deep neural networks overcome this limitation of handcrafted feature engineering. Their advanced architecture gives them the capability of automated feature learning to extract discriminative feature representations with minimal human effort. For this reason, DL better copes with large-scale, noisy, and unstructured data. The process of feature learning generally proceeds in a hierarchical manner, with high-level abstract features being assembled by simpler ones. Nevertheless, depending on the type of data and the choice of DL architecture, there are different mechanisms of feature learning in conjunction with the step of model building.

Model building

During automated model building, the input is used by a learning algorithm to identify patterns and relationships that are relevant for the respective learning task. As described above, shallow ML requires well-designed features for this task. On this basis, each family of learning algorithms applies different mechanisms for analytical model building. For example, when building a classification model, decision tree algorithms exploit the feature space by incrementally splitting data records into increasingly homogeneous partitions following a hierarchical, tree-like structure. A support vector machine (SVM) seeks to construct a discriminatory hyperplane between data points of different classes, where the input data is often projected into a higher-dimensional feature space for better separability. These examples demonstrate that there are different ways of analytical model building, each with individual advantages and disadvantages depending on the input data and the derived features (Kotsiantis et al. 2006).
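
The following sketch contrasts two of these mechanisms on the same set of features: a decision tree and an SVM with an RBF kernel. It assumes scikit-learn is available; the synthetic dataset merely stands in for real, engineered features.

```python
# Sketch: two shallow ML learners building classification models from the same features.
# Uses scikit-learn; the synthetic dataset stands in for engineered features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Decision tree: recursive splits of the feature space into purer partitions.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# SVM with RBF kernel: implicitly projects the data into a higher-dimensional
# space and fits a separating hyperplane with maximum margin there.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X_train, y_train)

print("Tree accuracy:", tree.score(X_test, y_test))
print("SVM accuracy: ", svm.score(X_test, y_test))
```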

By contrast, DL can operate directly on high-dimensional raw input data and perform the task of model building with its capability of automated feature learning. Therefore, DL architectures are often organized as end-to-end systems that combine feature learning and model building in a single pipeline. However, DL can also be applied only for extracting a feature representation, which is subsequently fed into other learning subsystems to exploit the strengths of competing ML algorithms, such as decision trees or SVMs.
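
A minimal sketch of such a hybrid pipeline, assuming TensorFlow/Keras and scikit-learn are installed: a small neural network is trained end-to-end, its penultimate layer is then reused as a learned feature extractor, and the extracted representations are fed into an SVM. The synthetic tabular data and the layer sizes are illustrative choices, not a prescription.

```python
# Sketch of a hybrid pipeline: a neural network learns a feature representation,
# a separate SVM performs the final classification on those learned features.
import tensorflow as tf
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X = X.astype("float32")

# Small feed-forward network trained end-to-end on the raw inputs.
inputs = tf.keras.Input(shape=(20,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
embedding = tf.keras.layers.Dense(16, activation="relu", name="embedding")(hidden)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(embedding)
network = tf.keras.Model(inputs, outputs)
network.compile(optimizer="adam", loss="binary_crossentropy")
network.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Reuse the trained network up to the embedding layer as a feature extractor.
extractor = tf.keras.Model(inputs, network.get_layer("embedding").output)
learned_features = extractor.predict(X, verbose=0)

svm = SVC(kernel="rbf").fit(learned_features, y)
print("SVM accuracy on learned features:", svm.score(learned_features, y))
```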

Various DL architectures have emerged over time (Leijnen and van Veen 2020; Pouyanfar et al. 2019; Young et al. 2018). Although, in principle, every architecture can be used for every task, some architectures are better suited to specific data types such as time series or images. Architectural variants are mostly characterized by the types of layers, neural units, and connections they use. Table 2 summarizes the five groups of convolutional neural networks (CNNs), recurrent neural networks (RNNs), distributed representations, autoencoders, and generative adversarial networks (GANs), all of which offer promising applications in the field of electronic markets.
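
To make the architectural differences concrete, the following sketch defines a minimal CNN for image-like inputs and a minimal recurrent network (LSTM) for sequences in Keras; the layer types and sizes are illustrative only and do not reflect the full range of variants summarized in Table 2.

```python
# Sketch of two architecture families in Keras: a small CNN for image-like
# inputs and a small RNN (LSTM) for sequential inputs. Sizes are illustrative.
import tensorflow as tf

# Convolutional network: local filters and pooling suit spatial data such as images.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Recurrent network: a hidden state carried across time steps suits sequences
# such as sensor readings or event logs (here: 50 time steps, 8 features each).
rnn = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(50, 8)),
    tf.keras.layers.Dense(1),
])

cnn.summary()
rnn.summary()
```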

Model assessment

For the assessment of a model’s quality, multiple aspects have to be taken into account, such as performance, computational resources, and interpretability. Performance-based metrics evaluate how well a model satisfies the objective specified by the learning task. In the area of supervised learning, there are well-established guidelines for this purpose. Here, it is common practice to use k-fold cross-validation to assess a model’s tendency to overfit and to determine its performance on out-of-sample data that was not included in the training samples. Cross-validation also provides the opportunity to compare the reliability of ML models, as the multiple out-of-sample data instances enable comparative statistical testing (García and Herrera 2008). Regression models are evaluated by measuring estimation errors such as the root mean square error (RMSE) or the mean absolute percentage error (MAPE), whereas classification models are assessed by calculating different ratios of correctly and incorrectly predicted instances, such as accuracy, recall, precision, and F1 score. Furthermore, it is common to apply cost-sensitive measures such as the average cost per predicted observation, which are helpful in situations where prediction errors are associated with asymmetric cost structures (Shmueli and Koppius 2011). That is the case, for example, when analyzing transactions in financial markets, where the costs of failing to detect a fraudulent transaction are considerably higher than the costs of incorrectly flagging a non-fraudulent one.
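
The sketch below illustrates these practices on a toy, imbalanced classification problem: k-fold cross-validation, the standard classification ratios, and a cost-sensitive measure with a made-up asymmetric cost structure. It assumes scikit-learn is available.

```python
# Sketch of common performance measures; uses scikit-learn and a toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000)

# k-fold cross-validation: performance is estimated on held-out folds only.
accuracies = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
y_pred = cross_val_predict(clf, X, y, cv=5)

print("Accuracy per fold:", np.round(accuracies, 3))
print("Precision:", precision_score(y, y_pred))
print("Recall:   ", recall_score(y, y_pred))
print("F1 score: ", f1_score(y, y_pred))

# Cost-sensitive view: a missed positive case (false negative) is assumed to
# cost 50 units, a false alarm only 1 unit (made-up asymmetric cost structure).
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
avg_cost = (50 * fn + 1 * fp) / len(y)
print("Average cost per observation:", round(avg_cost, 3))
```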

To identify a suitable prediction model for a specific task, it is reasonable to compare alternative models of varying complexity, that is, to consider competing model classes as well as alternative variants of the same model class. As introduced above, a model’s complexity can be characterized by several properties, such as the type of learning mechanism (e.g., shallow ML vs. DL), the number and type of manually generated or self-extracted features, and the number of trainable parameters (e.g., network weights in ANNs). Simpler models are often not flexible enough to capture the (non-linear) regularities and patterns relevant to the learning task. Overly complex models, on the other hand, entail a higher risk of overfitting. Furthermore, their reasoning is more difficult to interpret (cf. next section), and they are likely to be computationally more expensive. Computational costs are expressed by memory requirements and the inference time needed to execute a model on new data. These criteria are particularly important when assessing deep neural networks, where several million model parameters may have to be processed and stored, which places special demands on hardware resources. Consequently, in business settings with limited resources (such as environments that heavily rely on mobile devices), it is crucial not only to select a model at the sweet spot between underfitting and overfitting, but also to evaluate a model’s complexity with regard to further trade-offs, such as accuracy vs. memory usage and inference speed (Heinrich et al. 2019).
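
A rough way to make such trade-offs visible is to report accuracy together with a memory proxy and the measured inference time, as in the following sketch (scikit-learn, toy data). The serialized model size is only an approximation of the true memory footprint, and the two candidate models are arbitrary examples of a simpler and a more complex learner.

```python
# Sketch: comparing a simpler and a more complex model not only on accuracy,
# but also on memory footprint and inference time (scikit-learn, toy data).
import pickle
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest (500 trees)", RandomForestClassifier(n_estimators=500, random_state=0)),
]:
    model.fit(X_train, y_train)
    size_kb = len(pickle.dumps(model)) / 1024           # rough memory proxy
    start = time.perf_counter()
    accuracy = model.score(X_test, y_test)
    latency_ms = (time.perf_counter() - start) * 1000   # inference time on the test set
    print(f"{name}: accuracy={accuracy:.3f}, size={size_kb:.0f} KB, inference={latency_ms:.1f} ms")
```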

Challenges for intelligent systems based on machine learning and deep learning

Electronic markets are at the dawn of a technology-induced shift towards data-driven insights provided by intelligent systems (Selz 2020). Already today, shallow ML and DL are used to build analytical models for them, and further diffusion is foreseeable. In any real-world application, intelligent systems not only face the tasks of model building, system specification, and implementation; they are also prone to several issues rooted in how ML and DL operate, which constitute challenges relevant to the Information Systems community. These challenges require not only technical knowledge but also involve human and business aspects that reach beyond the system’s confines to the circumstances and the ecosystem of its application.

Managing the triangle of architecture, hyperparameters, and training data

When building shallow ML and DL models for intelligent systems, there are nearly endless options for algorithms or architectures, hyperparameters, and training data (Duin 1994; Heinrich et al. 2021). At the same time, there is a lack of established guidelines on how a model should be built for a specific problem to ensure not only performance and cost-efficiency but also robustness and privacy. Moreover, as outlined above, there are often several trade-offs to be considered in business environments with limited resources, such as prediction quality vs. computational costs. Therefore, the task of analytical model building is the most crucial one, since it also determines the business success of an intelligent system. For example, a model that performs at 99.9% accuracy but takes too long to produce a classification decision is rendered useless and is equal to a 0%-accuracy model in the context of time-critical applications such as proactive monitoring or quality assurance in smart factories. Further, different implementations can only be compared accurately when varying only one of the three edges of the triangle at a time and reporting the same metrics. Ultimately, one should consider the necessary skills, the available tool support, and the required implementation effort to develop and modify a particular DL architecture (Wanner et al. 2020).

Thus, applications with excellent accuracy achieved in a laboratory setting or on a different dataset may not translate into business success when applied in a real-world environment in electronic markets, as other factors may outweigh the ML model’s theoretical achievements. This implies that researchers should be aware of the situational characteristics of a model’s real-world application to develop an efficacious intelligent system. Needless to say, researchers cannot know all factors a priori, but they should familiarize themselves with the fact that there are several architectural options with different baseline variants, each with characteristic properties that suit different scenarios. Furthermore, multiple metrics such as accuracy and F1 score should be reviewed on consistent benchmarking data across models before choosing a model.
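
One way to follow this advice is to evaluate all candidate models on identical cross-validation splits with an identical set of metrics, as sketched below with scikit-learn on a synthetic dataset; the three candidate models are arbitrary examples.

```python
# Sketch: comparing candidate models on identical cross-validation splits and
# with identical metrics before choosing one (scikit-learn, toy data).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # same splits for all models

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "svm (rbf)": SVC(kernel="rbf"),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in candidates.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1"])
    print(
        f"{name}: accuracy={scores['test_accuracy'].mean():.3f}, "
        f"f1={scores['test_f1'].mean():.3f}, "
        f"fit_time={scores['fit_time'].mean():.2f}s"
    )
```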

Awareness of bias and drift in data

In terms of automated analytical model building, one needs to be aware of (cognitive) biases that are introduced into any shallow ML or DL model through the use of human-generated data and that are heavily adopted by the model (Fuchs 2018; Howard et al. 2017). That is, the models will exhibit the same (human-)induced tendencies that are present in the data or even amplify them. A cognitive bias is an illogical inference or belief that individuals adopt due to flawed reporting of facts or flawed decision heuristics (Haselton et al. 2015). While data-introduced bias is not a particularly new concept, it is amplified in the context of ML and DL if training data has not been properly selected or pre-processed, exhibits class imbalances, or if inferences are not reviewed responsibly. Striking examples include Amazon’s AI recruiting software that discriminated against women and Google’s Vision AI that produced starkly different image labels depending on skin color.
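
Two very simple checks can surface such issues before training, as sketched below with pandas: the balance of the target classes and the difference in positive rates across groups of a sensitive attribute. The tiny DataFrame and the "gender" column are hypothetical placeholders for real training data.

```python
# Sketch: two simple pre-training checks on human-generated data, assuming a
# pandas DataFrame with a hypothetical sensitive attribute column "gender".
import pandas as pd

df = pd.DataFrame({
    "gender": ["f", "m", "m", "m", "f", "m", "m", "m"],
    "label":  [0,   1,   1,   0,   0,   1,   1,   1],
})

# 1) Class imbalance in the target variable.
print(df["label"].value_counts(normalize=True))

# 2) Does the positive rate differ strongly between groups? A large gap hints
#    at historical bias the model is likely to reproduce or even amplify.
positive_rate_by_group = df.groupby("gender")["label"].mean()
print(positive_rate_by_group)
print("Gap between groups:", positive_rate_by_group.max() - positive_rate_by_group.min())
```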

Further, the validity of recommendations based on data is prone to concept drift, which describes a scenario where “the relation between the input data and the target variable changes over time” (Gama et al. 2014). That is, ML models for intelligent systems may not produce satisfactory results when historical data no longer describes the present situation adequately, for example due to new competitors entering a market, new production capabilities becoming available, or unprecedented governmental restrictions. Drift does not have to be sudden but can be incremental, gradual, or recurring (Gama et al. 2014) and is thus hard to detect. While techniques for automated learning exist that involve using trusted data windows and concept descriptions (Widmer and Kubat 1996), automated strategies for discovering and solving business-related problems remain a challenge (Pentland et al. 2020).

For applications in electronic markets, considering bias is of high importance, as most data points will have human points of contact. These can be as obvious as social media posts or as subtle as omitted variables. Further, poisoning attacks during model retraining can be used to purposefully insert deviating patterns. This entails that training data needs to be carefully reviewed for such human prejudgments. Applications based on this data should be understood as inherently biased rather than as impartial AI. This implies that researchers need to review their datasets and make public any biases they are aware of. Again, it is unrealistic to assume that all bias effects can be explicated in large datasets with high-dimensional data. Nevertheless, to better understand and trust an ML model, it is important to detect and highlight those effects that have or may have an impact on predictions. Lastly, as constant drift can be assumed in any real-world electronic market, a trained model is never finished. Companies must put strategies in place to identify, track, and counter concept drift that impacts the quality of their intelligent system’s decisions. Currently, manual checks and periodic model retraining prevail; a simple distribution-based check is sketched below.
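
The following sketch shows one such distribution-based check: the feature values (or model scores) of a trusted reference window are compared against the most recent window with a two-sample Kolmogorov-Smirnov test. This detects distributional change in the inputs, which often accompanies concept drift; verifying a changed input-target relation additionally requires (possibly delayed) labels. It assumes numpy and scipy; the data is synthetic.

```python
# Sketch of a simple drift check: compare a trusted reference window against
# the most recent window with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_window = rng.normal(loc=0.0, scale=1.0, size=2000)  # data seen at training time
recent_window = rng.normal(loc=0.4, scale=1.0, size=2000)     # shifted distribution

statistic, p_value = ks_2samp(reference_window, recent_window)
if p_value < 0.01:
    print(f"Drift suspected (KS statistic={statistic:.3f}): consider retraining the model.")
else:
    print("No significant distributional change detected.")
```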

Unpredictability of predictions and the need for explainability

The complexity of DL models and of some shallow ML models such as random forests and SVMs, which are often described as black boxes, makes it nearly impossible to predict how they will perform in a specific context (Adadi and Berrada 2018). This also entails that users may not be able to review and understand the recommendations of intelligent systems based on these models. Moreover, it becomes very difficult to prepare for adversarial attacks, which trick and break DL models (Heinrich et al. 2020). Such attacks can be a threat to high-stake applications, for example in the form of perturbations of street signs for autonomous driving (Eykholt et al. 2018). Thus, it may become necessary to explain the decisions of a black-box model, also to ease organizational adoption. Not only do humans prefer simple explanations to trust and adopt a model, but the requirement of explainability may even be enforced by law (Miller 2019).

The field of explainable AI (XAI) deals with the augmentation of existing DL models to produce explanations for output predictions. For image data, this involves highlighting the areas of the input image that are responsible for generating a specific output decision (Adadi and Berrada 2018). For time series data, methods have been developed to highlight the particularly important time steps influencing a forecast (Assaf and Schumann 2019). A similar approach can be used for highlighting the words in a text that lead to specific classification outputs.
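
A simple, model-agnostic variant of this idea is occlusion sensitivity: patches of the input image are masked one at a time and the drop in the predicted score is recorded as an importance map. The sketch below assumes only numpy; the predict function is a made-up stand-in for any trained image classifier, not a real model.

```python
# Sketch of a model-agnostic occlusion check for image classifiers: mask one
# patch at a time and record how much the predicted class score drops.
import numpy as np

def occlusion_map(image, predict_fn, target_class, patch=8):
    """Return a heatmap where high values mark patches important for the prediction."""
    base_score = predict_fn(image[None])[0, target_class]
    h, w = image.shape[:2]
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0              # black out one patch
            score = predict_fn(occluded[None])[0, target_class]
            heatmap[i // patch, j // patch] = base_score - score  # score drop = importance
    return heatmap

# Dummy model: "prefers" bright pixels in the image centre (illustration only).
def dummy_predict(batch):
    centre = batch[:, 12:20, 12:20].mean(axis=(1, 2, 3))
    return np.stack([1 - centre, centre], axis=1)

image = np.random.rand(32, 32, 3)
print(occlusion_map(image, dummy_predict, target_class=1).round(2))
```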

Thus, applications in electronic markets with different criticality and human-interaction requirements should be designed or augmented distinctively to address the respective concerns. Researchers must review applications, in particular those of DL models, regarding their criticality and accountability. Possibly, they must choose an explainable white-box model over a more accurate black-box model (Rudin 2019) or consider XAI augmentations to make the model’s predictions more accessible to its users (Adadi and Berrada 2018).

Resource limitations and transfer learning

Lastly, building and training comprehensive analytical models with shallow ML or DL is costly and requires large datasets to avoid a cold start. Fortunately, models do not always have to be trained from scratch. The concept of transfer learning allows models that are trained on general datasets (e.g., large-scale image datasets) to be specialized for specific tasks by using a considerably smaller, problem-specific dataset (Pouyanfar et al. 2019). However, using pre-trained models from foreign sources can pose a risk, as the models can be subject to biases and adversarial attacks, as introduced above. For example, pre-trained models may not properly reflect certain environmental constraints or may contain backdoors introduced through inserted classification triggers that, for instance, cause medical images to be misclassified (Wang et al. 2020). Governmental interventions to redirect or suppress predictions are conceivable as well. Hence, in high-stake situations, the reuse of publicly available analytical models may not be an option. Nevertheless, transfer learning offers a feasible option for small and medium-sized enterprises to deploy intelligent systems, and it enables large companies to repurpose their own general analytical models for specific applications.
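
A minimal transfer learning sketch with Keras, assuming TensorFlow is installed and the ImageNet weights of the backbone can be downloaded: a pre-trained backbone is frozen and only a small task-specific head is trained. The random images and labels are placeholders for a small, problem-specific dataset.

```python
# Sketch of transfer learning: a backbone pre-trained on a large, general image
# dataset is reused, and only a small task-specific head is trained.
import numpy as np
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, pooling="avg", weights="imagenet"
)
base.trainable = False  # keep the general-purpose features, train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # new task: binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Problem-specific data would go here; 200 random samples keep the sketch runnable.
X = np.random.rand(200, 96, 96, 3).astype("float32")
y = np.random.randint(0, 2, size=200)
model.fit(tf.keras.applications.mobilenet_v2.preprocess_input(X * 255.0), y,
          epochs=1, batch_size=32, verbose=0)
```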

In the context of transfer learning, new markets and ecosystems of AI as a service (AIaaS) are already emerging. Such marketplaces, for example those operated by Microsoft or Amazon Web Services, offer cloud AI applications, AI platforms, and AI infrastructure. In addition to cloud-based benefits for deployment, they also enable transfer learning from already established models to other applications. That is, they allow customers with limited AI development resources to purchase pre-trained models and integrate them into their own business environments (e.g., NLP models for chatbot applications). New types of vendors can participate in such markets, for example by offering transfer learning results for highly domain-specific tasks, such as predictive maintenance for complex machines. As outlined above, consumers of servitized DL models in particular need to be aware of the risks posed by their black-box nature and should establish protocols as strict as those applied to human operators making similar decisions. As the market for AIaaS is only emerging, guidelines for responsible transfer learning have yet to be established (e.g., Amorós et al. 2020).

Conclusion

With this fundamentals article, we provide a broad introduction to ML and DL. Often subsumed under AI technology, both fuel the analytical models underlying contemporary and future intelligent systems. We have conceptualized ML, shallow ML, and DL as well as their algorithms and architectures. Further, we have described the general process of automated analytical model building with its four aspects of data input, feature extraction, model building, and model assessment. Lastly, we contribute to the ongoing diffusion into electronic markets by discussing four fundamental challenges for intelligent systems based on ML and DL in real-world ecosystems.

Here, in particular, AIaaS constitutes a new and unexplored electronic market and will heavily influence other established service platforms. AIaaS offerings will, for example, augment the smartness of so-called smart services by providing new ways to learn from customer data and to provide advice or instructions to customers without being explicitly programmed to do so. We expect that much of the upcoming research on electronic markets will be conducted against the backdrop of AIaaS and its ecosystems and will devise new applications, roles, and business models for intelligent systems based on DL. Related future research will need to address and factor in the challenges we presented by providing structured methodological guidance on how to build analytical models, assess data collections and model performance, and make predictions safe and accessible to the user.

Change history

13 October 2021

Springer Nature’s version of this paper was updated to reflect the missing Open Access funding note.

Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6 , 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052 .


Ahani, A., Nilashi, M., Ibrahim, O., Sanzogni, L., & Weaven, S. (2019). Market segmentation and travel choice prediction in Spa hotels through TripAdvisor’s online reviews. International Journal of Hospitality Management, 80 , 52–77. https://doi.org/10.1016/j.ijhm.2019.01.003 .

Amorós, L., Hafiz, S. M., Lee, K., & Tol, M. C. (2020). Gimme that model!: A trusted ML model trading protocol. arXiv:2003.00610 [cs] . http://arxiv.org/abs/2003.00610

Assaf, R., & Schumann, A. (2019). Explainable deep neural networks for multivariate time series predictions. Proceedings of the 28th International Joint Conference on Artificial Intelligence , 6488–6490. https://doi.org/10.24963/ijcai.2019/932 .

Bastan, M., Ramisa, A., & Tek, M. (2020). Cross-modal fashion product search with transformer-based embeddings. CVPR Workshop - 3rd Workshop on Computer Vision for Fashion, Art and Design, Seattle, Washington.

Bishop, C. M. (2006). Pattern recognition and machine learning (Information science and statistics) . Springer-Verlag New York, Inc.

Brynjolfsson, E., & McAfee, A. (2017). The business of artificial intelligence. Harvard Business Review , 1–20.

Chen, S. H., Jakeman, A. J., & Norton, J. P. (2008). Artificial intelligence techniques: An introduction to their use for modelling environmental systems. Mathematics and Computers in Simulation, 78 (2–3), 379–400. https://doi.org/10.1016/j.matcom.2008.01.028 .

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) , 1 , 886–893. https://doi.org/10.1109/CVPR.2005.177 .

Duin, R. P. W. (1994). Superlearning and neural network magic. Pattern Recognition Letters, 15 (3), 215–217. https://doi.org/10.1016/0167-8655(94)90052-3 .

Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning visual classification. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018 , 1625–1634. https://doi.org/10.1109/CVPR.2018.00175 .

Fischer, M., Heim, D., Hofmann, A., Janiesch, C., Klima, C., & Winkelmann, A. (2020). A taxonomy and archetypes of smart services for smart living. Electronic Markets, 30 (1), 131–149. https://doi.org/10.1007/s12525-019-00384-5 .

Fuchs, D. J. (2018). The dangers of human-like Bias in machine-learning algorithms. Missouri S&T’s Peer to Peer, 2 (1), 15.


Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., & Bouchachia, A. (2014). A survey on concept drift adaptation. ACM Computing Surveys, 46 (4), 1–37. https://doi.org/10.1145/2523813 .

García, S., & Herrera, F. (2008). An extension on “statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. Journal of Machine Learning Research, 9 (89), 2677–2694.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. The MIT Press.

Goyal, D., & Pabla, B. S. (2015). Condition based maintenance of machine tools—A review. CIRP Journal of Manufacturing Science and Technology, 10 , 24–35. https://doi.org/10.1016/j.cirpj.2015.05.004 .

Grigorescu, S., Trasnea, B., Cocias, T., & Macesanu, G. (2020). A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 37 (3), 362–386. https://doi.org/10.1002/rob.21918 .

Haselton, M. G., Nettle, D., & Andrews, P. W. (2015). The evolution of cognitive bias. In D. M. Buss (Ed.), The handbook of evolutionary psychology (pp. 724–746). John Wiley & Sons, Inc. https://doi.org/10.1002/9780470939376.ch25 .

Heinrich, K., Graf, J., Chen, J., Laurisch, J., & Zschech, P. (2020). Fool me once, shame on you, fool me twice, shame on me: A taxonomy of attack and defense patterns for AI security. Proceedings of the 28th European Conference on Information Systems (ECIS) .

Heinrich, K., Möller, B., Janiesch, C., & Zschech, P. (2019). Is Bigger Always Better? Lessons Learnt from the Evolution of Deep Learning Architectures for Image Classification. Proceedings of the 2019 Pre-ICIS SIGDSA Symposium . https://aisel.aisnet.org/sigdsa2019/20

Heinrich, K., Zschech, P., Janiesch, C., & Bonin, M. (2021). Process data properties matter: Introducing gated convolutional neural networks (GCNN) and key-value-predict attention networks (KVP) for next event prediction with deep learning. Decision Support Systems, 143 , 113494. https://doi.org/10.1016/j.dss.2021.113494 .

Howard, A., Zhang, C., & Horvitz, E. (2017). Addressing bias in machine learning algorithms: A pilot study on emotion recognition for intelligent systems. IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO) , 1–7. https://doi.org/10.1109/ARSO.2017.8025197 .

Jayanth Balaji, A., Harish Ram, D. S., & Nair, B. B. (2018). Applicability of deep learning models for stock price forecasting: An empirical study on BANKEX data. Procedia Computer Science, 143 , 947–953. https://doi.org/10.1016/j.procs.2018.10.340 .

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349 (6245), 255–260. https://doi.org/10.1126/science.aaa8415 .

Kotsiantis, S. B., Zaharakis, I. D., & Pintelas, P. E. (2006). Machine learning: A review of classification and combining techniques. Artificial Intelligence Review, 26 (3), 159–190. https://doi.org/10.1007/s10462-007-9052-3 .

Kühl, N., Mühlthaler, M., & Goutier, M. (2020). Supporting customer-oriented marketing with artificial intelligence: Automatically quantifying customer needs from social media. Electronic Markets, 30 (2), 351–367. https://doi.org/10.1007/s12525-019-00351-0 .

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521 (7553), 436–444.  https://doi.org/10.1038/nature14539 .

Leijnen, S., & van Veen, F. (2020). The Neural Network Zoo. Proceedings, 47 (1), 9. https://doi.org/10.3390/proceedings47010009 .

Liu, Z., Lin, Y., & Sun, M. (2020). Representation learning for natural language processing . Springer Singapore. https://doi.org/10.1007/978-981-15-5573-2 .

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60 (2), 91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94 .

Madani, A., Arnaout, R., Mofrad, M., & Arnaout, R. (2018). Fast and accurate view classification of echocardiograms using deep learning. Npj Digital Medicine, 1 (1). https://doi.org/10.1038/s41746-017-0013-1 .

Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267 , 1–38. https://doi.org/10.1016/j.artint.2018.07.007 .

Pan, Z., Yu, W., Yi, X., Khan, A., Yuan, F., & Zheng, Y. (2019). Recent progress on generative adversarial networks (GANs): A survey. IEEE Access, 7 , 36322–36333. https://doi.org/10.1109/ACCESS.2019.2905015 .

Paula, E. L., Ladeira, M., Carvalho, R. N., & Marzagão, T. (2016). Deep learning anomaly detection as support fraud investigation in Brazilian exports and anti-money laundering. 15th IEEE International Conference on Machine Learning and Applications (ICMLA) , 954–960. https://doi.org/10.1109/ICMLA.2016.0172 .

Pentland, B. T., Liu, P., Kremser, W., & Haerem, T. (2020). The dynamics of drift in digitized processes. MIS Quarterly , 44 (1), 19–47. https://doi.org/10.25300/MISQ/2020/14458 .

Peters, M., Ketter, W., Saar-Tsechansky, M., & Collins, J. (2013). A reinforcement learning approach to autonomous decision-making in smart electricity markets. Machine Learning, 92 (1), 5–39. https://doi.org/10.1007/s10994-013-5340-0 .

Pouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Reyes, M. P., Shyu, M.-L., Chen, S.-C., & Iyengar, S. S. (2019). A survey on deep learning: Algorithms, techniques, and applications. ACM Computing Surveys, 51 (5), 1–36. https://doi.org/10.1145/3234150 .

Ramaswamy, S., & DeClerck, N. (2018). Customer perception analysis using deep learning and NLP. Procedia Computer Science, 140 , 170–178. https://doi.org/10.1016/j.procs.2018.10.326 .

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1 (5), 206–215. https://doi.org/10.1038/s42256-019-0048-x .

Russell, S. J., & Norvig, P. (2021). Artificial intelligence: A modern approach (4th ed.). Pearson.

Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24 (5), 513–523. https://doi.org/10.1016/0306-4573(88)90021-0 .

Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61 , 85–117. https://doi.org/10.1016/j.neunet.2014.09.003 .

Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3 (3), 417–424. https://doi.org/10.1017/S0140525X00005756 .

Selz, D. (2020). From electronic markets to data driven insights. Electronic Markets, 30 (1), 57–59. https://doi.org/10.1007/s12525-019-00393-4 .

Shmueli, G., & Koppius, O. (2011). Predictive analytics in information systems research. Management Information Systems Quarterly, 35 (3), 553–572.  https://doi.org/10.2307/23042796 .

Shrestha, Y. R., Krishna, V., & von Krogh, G. (2021). Augmenting organizational decision-making with deep learning algorithms: Principles, promises, and challenges. Journal of Business Research, 123 , 588–603. https://doi.org/10.1016/j.jbusres.2020.09.068 .

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362 (6419), 1140–1144. https://doi.org/10.1126/science.aar6404 .

Spooner, T., Fearnley, J., Savani, R., & Koukorinis, A. (2018). Market making via reinforcement learning. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent systems , 434–442. arXiv:1804.04216v1 

Stone, P., Brooks, R., Brynjolfsson, E., Calo, R., Etzioni, O., Hager, G., Hirschberg, J., Kalyanakrishnan, S., Kamar, E., Kraus, S., Leyton-Brown, K., Parkes, D., Press, W., Saxenian, A. L., Shah, J., Tambe, M., & Teller, A. (2016). Artificial intelligence and life in 2030: The one hundred year study on artificial intelligence (Report of the 2015–2016 study panel). Stanford University. https://ai100.stanford.edu/2016-report

Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001 , 1 , I-511–I-518. https://doi.org/10.1109/CVPR.2001.990517 .

Wang, S., Nepal, S., Rudolph, C., Grobler, M., Chen, S., & Chen, T. (2020). Backdoor attacks against transfer learning with pre-trained deep learning models. IEEE Transactions on Services Computing , 1–1. https://doi.org/10.1109/TSC.2020.3000900 .

Wanner, J., Heinrich, K., Janiesch, C., & Zschech, P. (2020). How much AI do you require? Decision factors for adopting AI technology. Proceedings of the 41st International Conference on Information Systems (ICIS) .

Westerlund, M. (2019). The emergence of Deepfake technology: A review. Technology Innovation Management Review , 9 (11), 39–52. https://doi.org/10.22215/timreview/1282

Widmer, G., & Kubat, M. (1996). Learning in the presence of concept drift and hidden contexts. Machine Learning, 23 (1), 69–101. https://doi.org/10.1007/BF00116900 .

Wu, M., Liu, F., & Cohn, T. (2018). Evaluating the utility of hand-crafted features in sequence labelling. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , 2850–2856. https://doi.org/10.18653/v1/D18-1310 .

Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent trends in deep learning based natural language processing [review article]. IEEE Computational Intelligence Magazine, 13 (3), 55–75. https://doi.org/10.1109/MCI.2018.2840738 .

Zhang, Y., & Ling, C. (2018). A strategy to apply machine learning to small datasets in materials science. npj Computational Materials, 4 (1). https://doi.org/10.1038/s41524-018-0081-z .


Open Access funding enabled and organized by Projekt DEAL. This research and development project is funded by the Bayerische Staatsministerium für Wirtschaft, Landesentwicklung und Energie (StMWi) within the framework concept “Informations- und Kommunikationstechnik” (grant no. DIK0143/02) and managed by the project management agency VDI+VDE Innovation + Technik GmbH.

Author information

Authors and Affiliations

Faculty of Business Management & Economics, University of Würzburg, Sanderring 2, 97070, Würzburg, Germany

Christian Janiesch

Institute of Information Systems, Friedrich-Alexander University Erlangen-Nürnberg, Lange Gasse 20, 90403, Nürnberg, Germany

Patrick Zschech

Faculty of Economics and Management, Otto-von-Guericke-Universität Magdeburg, Universitätsplatz 2, 39106, Magdeburg, Germany

Kai Heinrich


Corresponding author

Correspondence to Christian Janiesch.

Additional information

Responsible Editor: Fabio Lobato

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Janiesch, C., Zschech, P. & Heinrich, K. Machine learning and deep learning. Electron Markets 31 , 685–695 (2021). https://doi.org/10.1007/s12525-021-00475-2


Received: 07 October 2020

Accepted: 19 March 2021

Published: 08 April 2021

Issue date: September 2021

DOI: https://doi.org/10.1007/s12525-021-00475-2


Keywords

  • Machine learning
  • Deep learning
  • Artificial intelligence
  • Artificial neural networks
  • Analytical model building



