NIH News in Health

A monthly newsletter from the National Institutes of Health, part of the U.S. Department of Health and Human Services


January 2023


Health Capsule

Artificial Intelligence and Medical Research

Conceptual graphic showing the many ways AI is integrated into the technologies people use every day

Artificial intelligence, or AI, has been around for decades. In the past 20 years or so, it’s become a growing part of our lives. Researchers are now drawing on the power of AI to improve medicine and health care in innovative and far-reaching ways. NIH is on the cutting edge supporting these efforts.

At first, computers could simply do calculations based on human input. In AI, they learn to perform certain tasks. Some early forms of AI could play checkers or chess and even defeat human world champions. Others could recognize and convert speech to text.

Today, different forms of AI are being used to improve medical care. Researchers are exploring how AI could be used to sift through test results and image data. AI could then make recommendations to help with treatment decisions.

Some NIH-funded studies are using AI to develop “smart clothing” that can reduce low back pain. This technology could warn the wearer of unsafe body movements. Other studies are seeking ways to better manage blood glucose (or blood sugar) levels using wearable sensors.

Learn more about the different types of AI and their use in medical research.


NIH Office of Communications and Public Liaison
Building 31, Room 5B52
Bethesda, MD 20892-2094
[email protected]
Tel: 301-451-8224

Editor: Harrison Wein, Ph.D.
Managing Editor: Tianna Hicklin, Ph.D.
Illustrator: Alan Defibaugh


Healthcare research & technology advancements

Our team of clinicians, researchers, and engineers works together to create new AI and to discover opportunities that increase the availability and accuracy of healthcare technologies globally, realizing the long-term potential of health technology.


Meet Med-PaLM 2, our large language model designed for the medical domain

Developing AI that can answer medical questions accurately has been a challenge for several decades. With Med-PaLM 2, a version of PaLM 2 fine-tuned for the medical domain, we showed state-of-the-art performance in answering medical licensing exam questions. With thorough human evaluation, we’re exploring how Med-PaLM 2 can help healthcare organizations by drafting responses, summarizing documents, and providing insights. Learn more.

Expanding the power of AI in medicine

We are building and testing AI models with the goal of helping alleviate the global shortages of physicians, as well as the low access to modern imaging and diagnostic tools in certain parts of the world. With improved tech, we hope to increase accessibility and help more patients receive timely and accurate diagnoses and care.

How DeepVariant is improving the accuracy of genomic analysis

Sequencing genomes enables us to identify variants in a person’s DNA that indicate genetic disorders or an elevated risk for conditions such as breast cancer. DeepVariant is an open-source variant caller that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
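To make the contrast concrete, here is a deliberately naive, frequency-based variant caller for a single genomic position. This is an illustrative sketch only, not how DeepVariant works: DeepVariant instead encodes read pileups as image-like tensors and classifies genotypes with a deep neural network, which copes with sequencing noise far more robustly than fixed thresholds like the ones below.

```python
# Toy illustration (not DeepVariant): a naive frequency-based variant
# caller at a single genomic position. The thresholds are arbitrary.
from collections import Counter

def naive_call(ref_base, pileup_bases, min_alt_fraction=0.2):
    """Call a variant if a non-reference allele exceeds a support threshold."""
    counts = Counter(pileup_bases)
    total = sum(counts.values())
    alts = {b: n for b, n in counts.items() if b != ref_base}
    if not alts:
        return None  # all reads match the reference: no variant
    alt, n = max(alts.items(), key=lambda kv: kv[1])
    frac = n / total
    if frac >= min_alt_fraction:
        # Crude zygosity heuristic: mostly-alt reads suggest a homozygous call.
        genotype = "1/1" if frac > 0.8 else "0/1"
        return {"alt": alt, "fraction": round(frac, 2), "genotype": genotype}
    return None

# 10 reads covering a position whose reference base is 'A'
print(naive_call("A", list("AAAAAGGGGG")))  # heterozygous 'G' call
```

Real callers must additionally model base-quality scores, mapping errors, and local realignment, which is exactly the noise structure DeepVariant's learned approach absorbs from data.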


Healthcare research led by scientists, enhanced by Google

Google Health is providing secure technology to partners that helps doctors, nurses, and other healthcare professionals conduct research and help improve our understanding of health. If you are a researcher interested in working with Google Health to conduct health research, enter your details to be notified when Google Health is available for research partnerships.

Using AI to give doctors a 48-hour head start on life-threatening illness

In this research in Nature, we demonstrated how artificial intelligence could accurately predict acute kidney injury (AKI) in patients up to 48 hours earlier than it is currently diagnosed. Notoriously difficult to spot, AKI affects up to one in five hospitalized patients in the US and UK, and deterioration can happen quickly. Read the article


Protecting patients: deep learning for electronic health records

In a paper published in npj Digital Medicine, we used deep learning models to make a broad set of predictions relevant to hospitalized patients using de-identified electronic health records, and showed how that model could be used to render an accurate prediction 24 hours after a patient was admitted to the hospital. Read the article

Protecting patients from medication errors

Research shows that 2% of hospitalized patients experience serious preventable medication-related incidents that can be life-threatening, cause permanent harm, or result in death. As published in Clinical Pharmacology and Therapeutics, our best-performing AI model was able to anticipate physicians’ actual prescribing decisions 75% of the time, based on de-identified electronic health records and doctors’ prescribing records. This is an early step toward testing the hypothesis that machine learning can support clinicians in ways that prevent mistakes and help keep patients safe. Read the article
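The task framing can be sketched with a toy baseline. This is a hypothetical illustration, not Google's model: real systems learn from full de-identified EHR sequences with deep networks, whereas this simply "predicts" the most frequently co-prescribed medication for a diagnosis. All diagnosis and medication names are synthetic examples.

```python
# Hypothetical frequency baseline for next-prescription prediction.
# A deployed system would use far richer patient context than a single
# diagnosis code; this only shows the shape of the task.
from collections import Counter, defaultdict

history = [  # (diagnosis, prescribed medication) pairs - synthetic data
    ("hypertension", "lisinopril"),
    ("hypertension", "amlodipine"),
    ("hypertension", "lisinopril"),
    ("type2_diabetes", "metformin"),
]

by_dx = defaultdict(Counter)
for dx, med in history:
    by_dx[dx][med] += 1

def predict(dx):
    """Return the most frequently prescribed medication for a diagnosis."""
    return by_dx[dx].most_common(1)[0][0]

print(predict("hypertension"))  # lisinopril
```

A model's agreement with physicians' actual orders (75% in the study above) is then measured by comparing such predictions against the prescriptions that were really written.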

Discover the latest

Learn more about our most recent developments from Google’s health-related research and initiatives.

Detecting Signs of Disease from External Images of the Eye

Detecting Abnormal Chest X-rays Using Deep Learning

Improving Genomic Discovery with Machine Learning

How AI Is Advancing Science and Medicine

Google researchers have been exploring ways technologies could help advance the fields of medicine and science, working with scientists, doctors, and others in the field. In this video, we share a few research projects that have big potential.

We are continuously publishing new research in health

AI has the potential to help save lives by transforming healthcare and medicine through the creation of more personalized, accessible and effective solutions. This is particularly true in resource-challenged communities, where there is often a shortage of healthcare workers. In collaboration with healthcare providers, researchers and industry partners, we’ve published research, created open-source tools, and built AI systems that have the potential to positively impact health outcomes for people globally. With bold innovation that's responsibly developed, AI stands to be a powerful force for health equity, improving outcomes for everyone, everywhere.

Explore how teams at Google are catalyzing the adoption of human-centered AI in healthcare.


A suite of AI models designed for the medical domain

Building on innovations from Med-PaLM, the first large language model to reach expert performance on medical licensing exam-style questions, MedLM is our collection of medically tuned large models for commercial applications. MedLM can complete a wide range of complex tasks, ranging from answering medical questions, to summarizing dense medical information, to deriving insights from unstructured data. It is now available to Cloud customers through Vertex AI.


Improving breast cancer screening with AI

Breast cancer is the most common form of cancer globally, and early detection through breast cancer screening can lead to better chances of survival. Working with healthcare partners like Northwestern Medicine, we developed an AI system that integrates into breast cancer screening workflows to help radiologists identify breast cancer earlier and more consistently. Our published research shows that our technology can identify signs of breast cancer as well as trained radiologists. We are now bringing this research to reality by partnering with iCAD to embed this technology in clinical settings.


Expanding access to ultrasound with AI

Ultrasound is a versatile and increasingly accessible early disease detection tool, providing real-time dynamic views of major organ systems. We are developing AI models to make it easier to interpret important health information from ultrasound images. Notably, we are focusing on maternal ultrasound and partnering with Jacaranda Health in Kenya to improve our AI models. Our goal is to expand access to care in areas where trained sonographers are scarce.

Open Health Stack


Building blocks for next-generation healthcare apps

Digital mobile health apps are capable of lowering the barrier to equitable healthcare. However, it’s costly and difficult for developers to build tools that share health information across systems and work well in areas that often lack reliable internet connectivity. Open Health Stack is a suite of open-source building blocks built on an interoperable data standard. This suite of components makes it easier for developers to quickly build apps allowing healthcare workers to access the information and insights they need to make informed decisions.


Review Article
Published: 15 September 2022

Multimodal biomedical AI

Julián N. Acosta, Guido J. Falcone, Pranav Rajpurkar & Eric J. Topol

Nature Medicine volume 28, pages 1773–1784 (2022)


Subjects: Computational biology and bioinformatics; Health care

The increasing availability of biomedical data from large biobanks, electronic health records, medical imaging, wearable and ambient biosensors, and the lower cost of genome and microbiome sequencing have set the stage for the development of multimodal artificial intelligence solutions that capture the complexity of human health and disease. In this Review, we outline the key applications enabled, along with the technical and analytical challenges. We explore opportunities in personalized medicine, digital clinical trials, remote monitoring and care, pandemic surveillance, digital twin technology and virtual health assistants. Further, we survey the data, modeling and privacy challenges that must be overcome to realize the full potential of multimodal artificial intelligence in health.


While artificial intelligence (AI) tools have transformed several domains (for example, language translation, speech recognition and natural image recognition), medicine has lagged behind. This is partly due to complexity and high dimensionality—in other words, a large number of unique features or signals contained in the data—leading to technical challenges in developing and validating solutions that generalize to diverse populations. However, there is now widespread use of wearable sensors and improved capabilities for data capture, aggregation and analysis, along with decreasing costs of genome sequencing and related ‘omics’ technologies. Collectively, this sets the foundation and need for novel tools that can meaningfully process this wealth of data from multiple sources, and provide value across biomedical discovery, diagnosis, prognosis, treatment and prevention.

Most of the current applications of AI in medicine have addressed narrowly defined tasks using one data modality, such as a computed tomography (CT) scan or retinal photograph. In contrast, clinicians process data from multiple sources and modalities when diagnosing, making prognostic evaluations and deciding on treatment plans. Furthermore, current AI assessments are typically one-off snapshots, based on a moment of time when the assessment is performed, and therefore not ‘seeing’ health as a continuous state. In theory, however, AI models should be able to use all data sources typically available to clinicians, and even those unavailable to most of them (for example, most clinicians do not have a deep understanding of genomic medicine). The development of multimodal AI models that incorporate data across modalities—including biosensors, genetic, epigenetic, proteomic, microbiome, metabolomic, imaging, text, clinical, social determinants and environmental data—is poised to partially bridge this gap and enable broad applications that include individualized medicine, integrated, real-time pandemic surveillance, digital clinical trials and virtual health coaches (Fig. 1 ). In this Review, we explore the opportunities for such multimodal datasets in healthcare; we then discuss the key challenges and promising strategies for overcoming these. Basic concepts in AI and machine learning will not be discussed here but are reviewed in detail elsewhere 1 , 2 , 3 .

Figure 1. Created with BioRender.com.

Opportunities for leveraging multimodal data

Personalized ‘omics’ for precision health

With the remarkable progress in sequencing over the past two decades, there has been a revolution in the amount of fine-grained biological data that can be obtained using novel technical developments. These data layers are collectively referred to as the ‘omes’, and include the genome, proteome, transcriptome, immunome, epigenome, metabolome and microbiome 4 . They can be analyzed in bulk or at the single-cell level, which is relevant because many medical conditions, such as cancer, are quite heterogeneous at the tissue level, and much of biology shows cell and tissue specificity.

Each of the omics has shown value in different clinical and research settings individually. Genetic and molecular markers of malignant tumors have been integrated into clinical practice 5 , 6 , with the US Food and Drug Administration (FDA) providing approval for several companion diagnostic devices and nucleic acid-based tests 7 , 8 . As an example, Foundation Medicine and Oncotype IQ (Genomic Health) offer comprehensive genomic profiling tailored to the main classes of genomic alterations across a broad panel of genes, with the final goal of identifying potential therapeutic targets 9 , 10 . Beyond these molecular markers, liquid biopsy samples—easily accessible biological fluids such as blood and urine—are becoming a widely used tool for analysis in precision oncology, with some tests based on circulating tumor cells and circulating tumor DNA already approved by the FDA 11 . Beyond oncology, there has been a remarkable increase in the last 15 years in the availability and sharing of genetic data, which enabled genome-wide association studies 12 and characterization of the genetic architecture of complex human conditions and traits 13 . This has improved our understanding of biological pathways and produced tools such as polygenic risk scores 14 (which capture the overall genetic propensity to complex traits for each individual), and may be useful for risk stratification and individualized treatment, as well as in clinical research to enrich the recruitment of participants most likely to benefit from interventions 15 , 16 .
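At its simplest, a polygenic risk score is a weighted sum of risk-allele dosages, with weights taken from genome-wide association study effect estimates. The sketch below illustrates that arithmetic; the variant IDs and effect sizes are made up for illustration and do not correspond to real GWAS results.

```python
# Illustrative polygenic risk score (PRS): sum over variants of
# (GWAS effect size) x (number of risk alleles the person carries).

def polygenic_risk_score(dosages, effect_sizes):
    """dosages: variant -> 0/1/2 copies of the risk allele;
    effect_sizes: variant -> effect estimate (e.g. log odds ratio)."""
    return sum(effect_sizes[v] * d for v, d in dosages.items() if v in effect_sizes)

effect_sizes = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}  # hypothetical
person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}  # risk-allele counts
print(round(polygenic_risk_score(person, effect_sizes), 2))  # 0.19
```

Real PRS pipelines additionally prune correlated variants and calibrate scores against a reference population before any clinical interpretation.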

The integration of these very distinct types of data remains challenging. Yet, overcoming this problem is paramount, as the successful integration of omics data, in addition to other types such as electronic health record (EHR) and imaging data, is expected to increase our understanding of human health even further and allow for precise and individualized preventive, diagnostic and therapeutic strategies 4 . Several approaches have been proposed for multi-omics data integration in precision health contexts 17 . Graph neural networks are one example; 18 , 19 these are deep learning model architectures that process computational graphs—a well-known data structure comprising nodes (representing concepts or entities) and edges (representing connections or relationships between nodes)—thereby allowing scientists to account for the known interrelated structure of multiple types of omics data, which can improve performance of a model 20 . Another approach is dimensionality reduction, including novel methods such as PHATE and Multiscale PHATE, which can learn abstract representations of biological and clinical data at different levels of granularity, and have been shown to predict clinical outcomes, for example, in people with coronavirus disease 2019 (COVID-19) 21 , 22 .
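The simplest integration strategy, sometimes called early fusion, can be sketched in a few lines: standardize each omics block, concatenate features per sample, and reduce dimensionality. The sketch below uses plain PCA via SVD on synthetic data; graph neural networks and methods such as PHATE are far more sophisticated, so this shows only the basic pipeline shape.

```python
# Minimal early-fusion sketch for multi-omics integration:
# z-score each block, concatenate, then project with PCA (via SVD).
import numpy as np

rng = np.random.default_rng(0)
transcriptome = rng.normal(size=(20, 50))  # 20 samples x 50 genes (synthetic)
proteome = rng.normal(size=(20, 30))       # 20 samples x 30 proteins (synthetic)

def zscore(block):
    """Standardize each feature so no block dominates by scale."""
    return (block - block.mean(axis=0)) / block.std(axis=0)

fused = np.hstack([zscore(transcriptome), zscore(proteome)])  # (20, 80)

def pca(x, n_components=2):
    """Project centered data onto its top principal components."""
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T

embedding = pca(fused)
print(embedding.shape)  # (20, 2)
```

Methods like Multiscale PHATE replace the linear projection with manifold-learning steps that preserve structure at several levels of granularity, but the standardize-fuse-reduce skeleton is the same.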

In the context of cancer, overcoming challenges related to data access, sharing and accurate labeling could potentially lead to impactful tools that leverage the combination of personalized omics data with histopathology, imaging and clinical data to inform clinical trajectories and improve patient outcomes 23 . The integration of histopathological morphology data with transcriptomics data, resulting in spatially resolved transcriptomics 24 , constitutes a novel and promising methodological advancement that will enable finer-grained research into gene expression within a spatial context. Of note, researchers have utilized deep learning to leverage histopathology images to predict spatial gene expression from these images alone, pointing to morphological features in these images not captured by human experts that could potentially enhance the utility and lower the costs of this technology 25 , 26 .

Genetic data are increasingly cost effective, requiring only a once-in-a-lifetime ascertainment, but they also have limited predictive ability on their own 27 . Integrating genomic data with other omics data may capture more dynamic, real-time information on how each particular combination of genetic background and environmental exposures interacts to produce the quantifiable continuum of health status. As an example, Kellogg et al. 28 conducted an N-of-1 study performing whole-genome sequencing (WGS) and periodic measurements of other omics layers (transcriptome, proteome, metabolome, antibodies and clinical biomarkers); polygenic risk scoring showed an increased risk of type 2 diabetes mellitus, and comprehensive profiling of the other omics layers enabled early detection and dissection of signaling network changes during the transition from health to disease.

As the scientific field advances, the cost-effectiveness profile of WGS will become increasingly favorable, facilitating the combination of clinical and biomarker data with already available genetic data to arrive at a rapid diagnosis of conditions that were previously difficult to detect 29 . Ultimately, the capability to develop multimodal AI that includes many layers of omics data will get us to the desired goal of deep phenotyping of an individual; in other words, a true understanding of each person’s biological uniqueness and how that affects health.

Digital clinical trials

Randomized clinical trials are the gold standard study design to investigate causation and provide evidence to support the use of novel diagnostic, prognostic and therapeutic interventions in clinical medicine. Unfortunately, planning and executing a high-quality clinical trial is not only time consuming (usually taking many years to recruit enough participants and follow them over time) but also financially very costly 30 , 31 . In addition, geographic, sociocultural and economic disparities in access to enrollment have led to a remarkable underrepresentation of several groups in these studies. This limits the generalizability of results and creates a scenario whereby widespread underrepresentation in biomedical research further perpetuates existing disparities 32 . Digitizing clinical trials could provide an unprecedented opportunity to overcome these limitations by reducing barriers to participant enrollment and retention, promoting engagement and optimizing trial measurements and interventions. At the same time, the use of digital technologies can enhance the granularity of the information obtained from participants, thereby increasing the value of these studies 33 .

Data from wearable technology (including heart rate, sleep, physical activity, electrocardiography, oxygen saturation and glucose monitoring) and smartphone-enabled self-reported questionnaires can be useful for monitoring clinical trial patients, identifying adverse events or ascertaining trial outcomes 34 . Additionally, a recent study highlighted the potential of data from wearable sensors to predict laboratory results 35 . Consequently, the number of studies using digital products has been growing rapidly in the last few years, with a compound annual growth rate of around 34% 36 . Most of these studies utilize data from a single wearable device. One pioneering trial used a ‘band-aid’ patch sensor for detecting atrial fibrillation; the sensor was mailed to participants who were enrolled remotely, without the use of any clinical sites, and set the foundation for digitized clinical trials 37 . Many remote, site-less trials using wearables were conducted during the COVID-19 pandemic to detect SARS-CoV-2 infection 38 .

Effectively combining data from different wearable sensors with clinical data remains a challenge and an opportunity. Digital clinical trials could leverage multiple sources of participants’ data to enable automatic phenotyping and subgrouping 34 , which could be useful for adaptive clinical trial designs that use ongoing results to modify the trial in real time 39 , 40 . In the future, we expect that the increased availability of these data and novel multimodal learning techniques will improve our capabilities in digital clinical trials. Of note, recent work in a time-series analysis by Google has demonstrated the promise of attention-based model architectures to combine both static and time-dependent inputs to achieve interpretable time-series forecasting. As a hypothetical example, these models could understand whether to focus on static features such as genetic background, known time-varying features such as time of the day or observed features such as current glycemic levels, to make predictions on future risk of hypoglycemia or hyperglycemia 41 . Graph neural networks have been recently proposed to overcome the problem of missing or irregularly sampled data from multiple health sensors, by leveraging information from the interconnection between these 42 .
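The core mechanism behind such attention-based architectures can be sketched in miniature: the model assigns a score to each input, and a softmax turns those scores into weights that say where the model is "looking" when it forecasts. The feature names and scores below are hypothetical; in a real model such as the Temporal Fusion Transformer these weights are learned from data, not hand-set.

```python
# Toy attention-style feature weighting for a glycemia forecast.
# Scores are hypothetical stand-ins for what a trained model would produce.
import numpy as np

def softmax(x):
    """Convert raw scores into weights that are positive and sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

feature_scores = {
    "genetic_risk (static)": 0.4,
    "time_of_day (known future)": 1.1,
    "current_glucose (observed)": 2.0,
}
weights = softmax(np.array(list(feature_scores.values())))
for name, w in zip(feature_scores, weights):
    print(f"{name}: {w:.2f}")
```

Inspecting these weights is what makes such forecasts interpretable: one can see whether a prediction of impending hypoglycemia leaned on the static genetic profile or on the latest sensor reading.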

Patient recruitment and retention in clinical trials are essential but remain a challenge. In this setting, there is an increasing interest in the utilization of synthetic control methods (that is, using external data to create controls). Although synthetic control trials are still relatively novel 43 , the FDA has already approved medications based on historical controls 44 and has developed a framework for the utilization of real-world evidence 45 . AI models utilizing data from different modalities can potentially help identify or generate the most optimal synthetic controls 46 , 47 .

Remote monitoring: the ‘hospital-at-home’

Recent progress with biosensors, continuous monitoring and analytics raises the possibility of simulating the hospital setting in a person’s home. This offers the promise of marked reduction of cost, less requirement for healthcare workforce, avoidance of nosocomial infections and medical errors that occur in medical facilities, along with the comfort, convenience and emotional support of being with family members 48 .

In this context, wearable sensors have a crucial role in remote patient monitoring. The availability of relatively affordable noninvasive devices (smartwatches or bands) that can accurately measure several physiological metrics is increasing rapidly 49 , 50 . Combining these data with those derived from EHRs—using standards such as the Fast Healthcare Interoperability Resources, a global industry standard for exchanging healthcare data 51 —to query relevant information about a patient’s underlying disease risk could create a more personalized remote monitoring experience for patients and caregivers. Ambient wireless sensors offer an additional opportunity to collect valuable data. Ambient sensors are devices located within the environment (for example, a room, a wall or a mirror) ranging from video cameras and microphones to depth cameras and radio signals. These ambient sensors can potentially improve remote care systems at home and in healthcare institutions 52 .
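To make the interoperability point concrete, here is a minimal sketch of a FHIR R4 Observation resource, the kind of payload a remote-monitoring app might send to an EHR over a FHIR API. The LOINC code 8867-4 is the standard code for heart rate; the patient reference and timestamp are placeholders, and a production resource would carry more metadata (categories, device references, provenance).

```python
# Minimal FHIR R4 Observation (heart rate) serialized as JSON.
# Patient reference and timestamp are illustrative placeholders.
import json

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{"system": "http://loinc.org", "code": "8867-4",
                    "display": "Heart rate"}]
    },
    "subject": {"reference": "Patient/example"},
    "effectiveDateTime": "2024-01-01T08:30:00Z",
    "valueQuantity": {"value": 72, "unit": "beats/minute",
                      "system": "http://unitsofmeasure.org", "code": "/min"},
}
print(json.dumps(observation, indent=2)[:80])
```

Because every conformant system parses the same resource shapes, a wearable vendor, a hospital EHR and a research platform can exchange this reading without bespoke integration work.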

The integration of data from these multiple modalities and sensors represents a promising opportunity to improve remote patient monitoring, and some studies have already demonstrated the potential of multimodal data in these scenarios. For example, the combination of ambient sensors (such as depth cameras and microphones) with wearables data (for example, accelerometers, which measure physical activity) has the potential to improve the reliability of fall detection systems while keeping a low false alarm rate 53 , and to improve gait analysis performance 54 . Early detection of impairments in physical functional status via activities of daily living such as bathing, dressing and eating is remarkably important to provide timely clinical care, and the utilization of multimodal data from wearable devices and ambient sensors can potentially help with accurate detection and classification of difficulties in these activities 55 .

Beyond management of chronic or degenerative disorders, multimodal remote patient monitoring could also be useful in the setting of acute disease. A recent program conducted by the Mayo Clinic showcased the feasibility and safety of remote monitoring in people with COVID-19 (ref. 56 ). Remote patient monitoring for hospital-at-home applications—not yet validated—requires randomized trials of multimodal AI-based remote monitoring versus hospital admission to show no impairment of safety. We need to be able to predict impending deterioration and have a system to intervene, and this has not been achieved yet.

Pandemic surveillance and outbreak detection

The current COVID-19 pandemic has highlighted the need for effective infectious disease surveillance at national and state levels 57 , with some countries successfully integrating multimodal data from migration maps, mobile phone utilization and health delivery data to forecast the spread of the outbreak and identify potential cases 58 , 59 .

One study has also demonstrated the utilization of resting heart rate and sleep minutes tracked using wearable devices to improve surveillance of influenza-like illness in the USA 60 . This initial success evolved into the Digital Engagement and Tracking for Early Control and Treatment (DETECT) Health study, launched by the Scripps Research Translational Institute as an app-based research program aiming to analyze a diverse set of data from wearables to allow for rapid detection of the emergence of influenza, coronavirus and other fast-spreading viral illnesses. A follow-up study from this program showed that jointly considering participant self-reported symptoms and sensor metrics improved performance relative to either modality alone, reaching an area under the receiver operating characteristic curve of 0.80 (95% confidence interval 0.73–0.86) for classifying COVID-19-positive versus COVID-19-negative status 61 .
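For readers unfamiliar with the metric, the area under the ROC curve reported above equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition; the labels and scores are synthetic, standing in for the output of a fused symptom-plus-sensor classifier.

```python
# Pairwise (Mann-Whitney) computation of ROC AUC from synthetic scores.
# Ties between a positive and a negative score count as half a win.

def roc_auc(labels, scores):
    wins = 0.0
    total = 0
    for li, si in zip(labels, scores):
        if li != 1:
            continue
        for lj, sj in zip(labels, scores):
            if lj != 0:
                continue
            total += 1
            if si > sj:
                wins += 1
            elif si == sj:
                wins += 0.5
    return wins / total

labels = [1, 1, 1, 0, 0, 0]                # 1 = COVID-19-positive (synthetic)
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]    # hypothetical fused-model scores
print(roc_auc(labels, scores))
```

An AUC of 0.5 means the scores are no better than chance at ranking positives above negatives, while 1.0 means perfect separation; the DETECT result of 0.80 sits usefully in between.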

Several other use cases for multimodal AI models in pandemic preparedness and response have been tested with promising results, but further validation and replication of these results are needed 62 , 63 .

Digital twins

We currently rely on clinical trials as the best evidence to identify successful interventions. Interventions that help 10 of 100 people may be considered successful, but these are applied to the other 90 without proven or likely benefit. A complementary approach known as ‘digital twins’ can fill the knowledge gaps by leveraging large amounts of data to model and predict with high precision how a certain therapeutic intervention would benefit or harm a particular patient.

Digital twin technology is a concept borrowed from engineering that uses computational models of complex systems (for example, cities, airplanes or patients) to develop and test different strategies or approaches more quickly and economically than in real-life scenarios 64 . In healthcare, digital twins are a promising tool for drug target discovery 65 , 66 .

Integrating data from multiple sources to develop digital twin models using AI tools has already been proposed in precision oncology and cardiovascular health 67 , 68 . An open-source modular framework has also been proposed for the development of medical digital twin models 69 . From a commercial point of view, Unlearn.AI has developed and tested digital twin models that leverage diverse sets of clinical data to enhance clinical trials for Alzheimer’s disease and multiple sclerosis 70 , 71 .

Considering the complexity of human organisms, the development of accurate and useful digital twin technology in medicine will depend on the ability to collect large and diverse multimodal data ranging from omics data and physiological sensors to clinical and sociodemographic data. This will likely require large collaborations across health systems, research groups and industry, such as the Swedish Digital Twins Consortium 65 , 72 . The American Society of Clinical Oncology, through its subsidiary called CancerLinQ, developed a platform that enables researchers to utilize a wealth of data from patients with cancer to help guide optimal treatment and improve outcomes 73 . The development of AI models capable of effectively learning from all these data modalities together, to make real-time predictions, is paramount.

Virtual health assistant

More than one-third of US consumers have acquired a smart speaker in the last few years. However, virtual health assistants—digital AI-enabled coaches that can advise people on their health needs—have not been widely developed to date, and those currently on the market often target a particular condition or use case. In addition, a recent review of health-focused conversational agent apps found that most rely on rule-based approaches and predefined app-led dialog 74 .

One of the most popular current applications of these narrowly focused virtual health assistants, although not multimodal AI-based, is in diabetes care. Virta Health, Accolade and Onduo by Verily (Alphabet) have all developed applications that aim to improve diabetes control, with some demonstrating improvement in hemoglobin A1c levels in individuals who followed the programs 75 . Many of these companies have expanded or are expanding to other use cases such as hypertension control and weight loss. Other examples of virtual health coaches have tackled common conditions such as migraine, asthma and chronic obstructive pulmonary disease, among others 76 . Unfortunately, most of these applications have been tested only in small observational studies, and much more research, including randomized clinical trials, is needed to evaluate their benefits.

Looking into the future, the successful integration of multiple data sources in AI models will facilitate the development of broadly focused personalized virtual health assistants 77 . These virtual health assistants can leverage individualized profiles based on genome sequencing, other omics layers, continuous monitoring of blood biomarkers and metabolites, biosensors and other relevant biomedical data to promote behavior change, answer health-related questions, triage symptoms or communicate with healthcare providers when appropriate. Importantly, these AI-enabled medical coaches will need to demonstrate beneficial effects on clinical outcomes via randomized trials to achieve widespread acceptance in the medical field. As most of these applications are focused on improving health choices, they will need to provide evidence of influencing health behavior, which represents the ultimate pathway for the successful translation of most interventions 78 .

We still have a long way to go to achieve the full potential of AI and multimodal data integration into virtual health assistants, including the technical challenges, data-related challenges and privacy challenges discussed below. Given the rapid advances in conversational AI 79 , coupled with the development of increasingly sophisticated multimodal learning approaches, we expect future digital health applications to embrace the potential of AI to deliver accurate and personalized health coaching.

Multimodal data collection

The first requirement for the successful development of multimodal data-enabled applications is the collection, curation and harmonization of well-phenotyped and large annotated datasets, as no amount of technical sophistication can derive information not present in the data 80 . In the last 20 years, many national and international studies have collected multimodal data with the ultimate goal of accelerating precision health (Table 1 ). In the UK, the UK Biobank initiated enrollment in 2006, reaching a final participant count of over 500,000, and plans to follow participants for at least 30 years after enrollment 81 . This large biobank has collected multiple layers of data from participants, including sociodemographic and lifestyle information, physical measurements, biological samples, 12-lead electrocardiograms and EHR data 82 . Further, almost all participants underwent genome-wide array genotyping and, more recently, proteomic profiling, whole-exome sequencing 83 and whole-genome sequencing (WGS) 84 . A subset of individuals also underwent brain magnetic resonance imaging (MRI), cardiac MRI, abdominal MRI, carotid ultrasound and dual-energy X-ray absorptiometry, including repeat imaging across at least two time points 85 .

Similar initiatives have been conducted in other countries, such as the China Kadoorie Biobank 86 and Biobank Japan 87 . In the USA, the Department of Veterans Affairs launched the Million Veteran Program 88 in 2011, aiming to enroll 1 million veterans to contribute to scientific discovery. Two important efforts funded by the National Institutes of Health (NIH) include the Trans-Omics for Precision Medicine (TOPMed) program and the All of Us Research Program. TOPMed collects WGS with the aim to integrate this genetic information with other omics data 89 . The All of Us Research Program 90 constitutes another novel and ambitious NIH initiative that has enrolled about 400,000 of the 1 million diverse participants planned across the USA, and is focused on enrolling individuals from broadly defined groups underrepresented in biomedical research, which is especially needed in medical AI 91 , 92 .

Besides these large national initiatives, independent institutional and multi-institutional efforts are also building deep, multimodal data resources in smaller numbers of people. The Project Baseline Health Study, funded by Verily and managed in collaboration with Stanford University, Duke University and the California Health and Longevity Institute, aims to enroll at least 10,000 individuals, starting with an initial 2,500 participants from whom a broad range of multimodal data are collected, with the aim of evolving into a combined virtual-in-person research effort 93 . As another example, the American Gut Project collects microbiome data from self-selected participants across several countries 94 . These participants also complete surveys about general health status, disease history, lifestyle data and food frequency. The Medical Information Mart for Intensive Care (MIMIC) database 95 , organized by the Massachusetts Institute of Technology, represents another example of multidimensional data collection and harmonization. Currently in its fourth version, MIMIC is an open-source database that contains de-identified data from thousands of patients who were admitted to the critical care units of the Beth Israel Deaconess Medical Center, including demographic information, EHR data (for example, diagnosis codes, medications ordered and administered, laboratory data and physiological data such as blood pressure or intracranial pressure values), imaging data (for example, chest radiographs) 96 and, in some versions, natural language text such as radiology reports and medical notes. This granularity of data is particularly useful for the data science and machine learning community, and MIMIC has become one of the benchmark datasets for AI models aiming to predict the development of clinical events such as kidney failure, or outcomes such as survival or readmissions 97 , 98 .

The availability of multimodal data in these datasets may help achieve better diagnostic performance across a range of different tasks. As an example, recent work has demonstrated that the combination of imaging and EHR data outperforms each of these modalities alone to identify pulmonary embolism 99 , and to differentiate between common causes of acute respiratory failure, such as heart failure, pneumonia or chronic obstructive pulmonary disease 100 . The Michigan Predictive Activity & Clinical Trajectories in Health (MIPACT) study constitutes another example, with participants contributing wearable data, physiological data (blood pressure), clinical information (EHR and surveys) and laboratory data 101 . The North American Prodrome Longitudinal Study is yet another example. This multisite program recruited individuals and collected demographic, clinical and blood biomarker data with the goal of understanding the prodromal stages of psychosis 102 , 103 . Other studies focusing on psychiatric disorders such as the Personalised Prognostic Tools for Early Psychosis Management also collected several types of data and have already empowered the development of multimodal machine learning workflows 104 .

Technical challenges

Implementation and modeling challenges.

Health data are inherently multimodal. Our health status encompasses many domains (social, biological and environmental) that influence well-being in complex ways. Additionally, each of these domains is hierarchically organized, with data being abstracted from the big picture macro level (for example, disease presence or absence) to the in-depth micro level (for example, biomarkers, proteomics and genomics). Furthermore, current healthcare systems add to this multimodal approach by generating data in multiple ways: radiology and pathology images are, for example, paired with natural language data from their respective reports, while disease states are also documented in natural language and tabular data in the EHR.

Multimodal machine learning (also referred to as multimodal learning) is a subfield of machine learning that aims to develop and train models that can leverage multiple different types of data and learn to relate these multiple modalities or combine them, with the goal of improving prediction performance 105 . A promising approach is to learn accurate representations that are similar for different modalities (for example, a picture of an apple should be represented similarly to the word ‘apple’). In early 2021, OpenAI released an architecture termed Contrastive Language Image Pretraining (CLIP), which, when trained on millions of image–text pairs, matched the performance of competitive, fully supervised models without fine-tuning 106 . CLIP was inspired by a similar approach developed in the medical imaging domain termed Contrastive Visual Representation Learning from Text (ConVIRT) 107 . With ConVIRT, an image encoder and a text encoder are trained to generate image and text representations by maximizing the similarity of correctly paired image and text examples and minimizing the similarity of incorrectly paired examples—this is called contrastive learning. This approach for paired image–text co-learning has been used recently to learn from chest X-rays and their associated text reports, outperforming other self-supervised and fully supervised methods 108 . Other architectures have also been developed to integrate multimodal data from images, audio and text, such as the Video-Audio-Text Transformer, which uses videos to obtain paired multimodal image, text and audio and to train accurate multimodal representations able to generalize with good performance on many tasks—such as recognizing actions in videos, classifying audio events, classifying images, and selecting the most adequate video for an input text 109 .
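To make the contrastive objective concrete, the following is a minimal, illustrative sketch in pure Python—not the actual CLIP or ConVIRT implementation, which trains deep image and text encoders on millions of pairs. The function name `contrastive_loss` and the toy two-dimensional embeddings are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """InfoNCE-style loss: for each pair i, the i-th image embedding
    should be more similar to the i-th text embedding than to any other."""
    n = len(image_embs)
    loss = 0.0
    for i in range(n):
        # score image i against every text in the batch
        logits = [cosine(image_embs[i], text_embs[j]) / temperature
                  for j in range(n)]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        # cross-entropy with the correctly paired text as the label
        loss += -(logits[i] - log_denom)
    return loss / n
```

Correctly paired batches yield a lower loss than mismatched ones, which is exactly the signal used to pull matching image and text representations together.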

Another desirable feature for multimodal learning frameworks is the ability to learn from different modalities without the need for different model architectures. Ideally, a unified multimodal model would incorporate different types of data (images, physiological sensor data and structured and unstructured text data, among others), codify concepts contained in these different types of data in a flexible and sparse way (that is, a unique task activates only a small part of the network, with the model learning which parts of the network should handle each unique task) 110 , produce aligned representations for similar concepts across modalities (for example, the picture of a dog, and the word ‘dog’ should produce similar internal representations), and provide any arbitrary type of output as required by the task 111 .

In the last few years, there has been a transition from architectures with strong modality-specific biases—such as convolutional neural networks for images, or recurrent neural networks for text and physiological signals—to a relatively novel architecture called the Transformer, which has demonstrated good performance across a wide variety of input and output modalities and tasks 112 . The key strategy behind transformers is to allow neural networks—which are artificial learning models that loosely mimic the behavior of the human brain—to dynamically pay attention to different parts of the input when processing and ultimately making decisions. Originally proposed for natural language processing, thus providing a way to capture the context of each word by attending to other words of the input sentence, this architecture has been successfully extended to other modalities 113 .
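The attention mechanism at the heart of the Transformer can be sketched in a few lines. This is a simplified single-head version without the learned projection matrices of a real implementation; the function names are illustrative:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key,
    and the output is the similarity-weighted average of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # how much each position is attended to
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

A query that matches one key strongly pulls its output toward that key's value; a neutral query averages over all positions—the dynamic "paying attention" described above.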

While each input token (that is, the smallest unit for processing) in natural language processing corresponds to a specific word, other modalities have generally used segments of images or video clips as tokens 114 . Transformer architectures allow us to unify the framework for learning across modalities but may still need modality-specific tokenization and encoding. A recent study by Meta AI (Meta Platforms) proposed a unified framework for self-supervised learning that is independent of the modality of interest, but still requires modality-specific preprocessing and training 115 . Benchmarks for self-supervised multimodal learning allow us to measure the progress of methods across modalities: for instance, the Domain-Agnostic Benchmark for Self-supervised learning (DABS) is a recently proposed benchmark that includes chest X-rays, sensor data and natural image and text data 116 .
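As a toy illustration of modality-specific tokenization, the sketch below splits a two-dimensional image into flattened patches—the analogue of words for a vision model. `patch_tokens` is a hypothetical helper, not a library function:

```python
def patch_tokens(image, patch):
    """Split a 2D image (list of rows) into non-overlapping
    patch x patch tiles, each flattened into one token vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch)
                           for j in range(patch)])
    return tokens
```

A 4 x 4 image with 2 x 2 patches yields four tokens, each of length four; these token vectors are what a transformer encoder would then embed and attend over.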

Recent advances proposed by DeepMind (Alphabet), including Perceiver 117 and Perceiver IO 118 , propose a framework for learning across modalities with the same backbone architecture. Importantly, the input to the Perceiver architectures are modality-agnostic byte arrays, which are condensed through an attention bottleneck (that is, an architecture feature that restricts the flow of information, forcing models to condense the most relevant information) to avoid size-dependent large memory costs (Fig. 2a ). After processing these inputs, the Perceiver can then feed the representations to a final classification layer to obtain the probability of each output category, while the Perceiver IO can decode these representations directly into arbitrary outputs such as pixels, raw audio and classification labels, through a query vector that specifies the task of interest; for example, the model could output the predicted imaging appearance of an evolving brain tumor, in addition to the probability of successful treatment response.

Figure 2

a , Simplified schematic of the Perceiver-like architecture: images, text and other inputs are converted agnostically into byte arrays that are concatenated (that is, fused) and passed through cross-attention mechanisms (that is, a mechanism to project or condense information into a fixed-dimensional representation) to feed information into the network. b , Simplified illustration of the conceptual framework behind the multimodal multitask architectures (for example, Gato), within a hypothetical medical example: distinct input modalities ranging from images, text and actions are tokenized and fed to the network as input sequences, with masked shifted versions of these sequences fed as targets (that is, the network only sees information from previous time points to predict future actions, only previous words to predict the next or only the image to predict text); the network then learns to handle multiple modalities and tasks.

A promising aspect of transformers is the ability to learn meaningful representations with unlabeled data, which is paramount in biomedical AI given the limited and expensive resources needed to obtain high-quality labels. Many of the approaches mentioned above require aligned data from different modalities (for example, image–text pairs). A study from DeepMind, in fact, suggested that curating higher-quality image–text datasets may matter more than generating larger single-modality datasets or than other aspects of algorithm development and training 119 . However, these data may not be readily available in the setting of biomedical AI. One possible solution to this problem is to leverage available data from one modality to help learning with another—a multimodal learning task termed ‘co-learning’ 105 . As an example, some studies suggest that transformers pretrained on unlabeled language data might be able to generalize well to a broad range of other tasks 120 . In medicine, a model architecture called ‘CycleGANs’, trained on unpaired contrast and non-contrast CT scans, has been used to generate synthetic non-contrast or contrast CT scans 121 , with this approach showing improvements, for instance, in COVID-19 diagnosis 122 . While promising, this approach has not been tested widely in the biomedical setting and requires further exploration.

Another important modeling challenge relates to the exceedingly high number of dimensions contained in multimodal health data, collectively termed ‘the curse of dimensionality’. As the number of dimensions (that is, variables or features contained in a dataset) increases, the number of people carrying some specific combinations of these features decreases (or for some combinations, even disappears), leading to ‘dataset blind spots’, that is, portions of the feature space (the set of all possible combinations of features or variables) that do not have any observation. These dataset blind spots can hurt model performance in terms of real-life prediction and should therefore be considered early in the model development and evaluation process 123 . Several strategies can be used to mitigate this issue, and have been described in detail elsewhere 123 . In brief, these include collecting data using maximum performance tasks (for example, rapid finger tapping for motor control, as opposed to passively collected data during everyday movement), ensuring large and diverse sample sizes (that is, with the conditions matching those expected at clinical deployment of the model), using domain knowledge to guide feature engineering and selection (with a focus on feature repeatability), appropriate model training and regularization, rigorous model validation and comprehensive model monitoring (including monitoring the difference between the distributions of training data and data found after deployment). Looking to the future, developing models able to incorporate previous knowledge (for example, known gene regulatory pathways and protein interactions) might be another promising approach to overcome the curse of dimensionality. 
Along these lines, recent studies demonstrated that models augmented by retrieving information from large databases outperform larger models trained on larger datasets, effectively leveraging available information and also providing added benefits such as interpretability 124 , 125 .
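The ‘dataset blind spots’ intuition can be demonstrated with a small simulation: with a fixed sample size, the fraction of the feature space that is actually observed collapses as features are added. This is an illustrative sketch only; the function name and parameters are hypothetical:

```python
import random

def occupied_fraction(n_samples, n_features, levels=2, seed=0):
    """Fraction of all possible feature combinations that appear in a
    random sample; each feature takes one of `levels` discrete values."""
    rng = random.Random(seed)
    seen = {tuple(rng.randrange(levels) for _ in range(n_features))
            for _ in range(n_samples)}
    return len(seen) / levels ** n_features
```

With 1,000 observations and only 3 binary features, all 8 combinations are covered; with 20 binary features (over a million combinations) the same sample covers under 0.1% of the space, leaving the rest as blind spots a model has never seen.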

An increasingly used approach in multimodal learning is to combine the data from different modalities, as opposed to simply inputting several modalities separately into a model, to increase prediction performance—a process termed ‘multimodal fusion’ 126 , 127 . Fusion of different data modalities can be performed at different stages of the process. The simplest approach involves concatenating input modalities or features before any processing (early fusion). While simple, this approach is not suitable for many complex data modalities. A more sophisticated approach is to combine and co-learn representations of these different modalities during the training process (joint fusion), allowing for modality-specific preprocessing while still capturing the interaction between data modalities. Finally, an alternative approach is to train separate models for each modality and combine the output probabilities (late fusion), a simple and robust approach, but at the cost of missing any information that could be abstracted from the interaction between modalities. Early work on fusion focused on allowing time-series models to leverage information from structured covariates for tasks such as forecasting osteoarthritis progression and predicting surgical outcomes in patients with cerebral palsy 128 . As another example of fusion, a group from DeepMind used a high-dimensional EHR-based dataset comprising 620,000 dimensions that were projected into a continuous embedding space with only 800 dimensions, capturing a wide array of information in a 6-h time frame for each patient, and built a recurrent neural network to predict acute kidney injury over time 129 . Many studies have used fusion of two modalities (bimodal fusion) to improve predictive performance. Imaging and EHR-based data have been fused to improve detection of pulmonary embolism, outperforming single-modality models 99 . 
Another bimodal study fused imaging features from chest X-rays with clinical covariates, improving the diagnosis of tuberculosis in individuals with HIV 130 . Optical coherence tomography and infrared reflectance optic disc imaging have been combined to better predict visual field maps compared to using either of those modalities alone 131 .
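The early- and late-fusion strategies described above can be sketched in a few lines; joint fusion is omitted because it requires co-training encoders. These helpers are illustrative only and not drawn from any cited study:

```python
def early_fusion(modality_a, modality_b):
    """Early fusion: concatenate raw feature vectors from two
    modalities into a single input before any modeling."""
    return modality_a + modality_b

def late_fusion(prob_a, prob_b, weight_a=0.5):
    """Late fusion: combine the output probabilities of two
    independently trained unimodal models by weighted averaging."""
    return weight_a * prob_a + (1 - weight_a) * prob_b
```

Early fusion lets one model see cross-modality interactions but forces a shared input format; late fusion keeps each pipeline independent at the cost of discarding those interactions, which is why joint fusion sits between the two.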

Multimodal fusion is a general concept that can be tackled using any architectural choice. Although not biomedical, AI imaging work offers useful lessons: modern guided image generation models such as DALL-E 132 and GLIDE 133 often concatenate information from different modalities into the same encoder. This approach has demonstrated success in a recent study conducted by DeepMind (using Gato, a generalist agent) showing that concatenating a wide variety of tokens created from text, images and button presses, among others, can be used to teach a model to perform several distinct tasks ranging from captioning images and playing Atari games to stacking blocks with a robot arm (Fig. 2b ) 134 . Importantly, a recent study titled Align Before Fuse suggested that aligning representations across modalities before fusing them might result in better performance in downstream tasks, such as for creating text captions for images 135 . A recent study from Google Research proposed using attention bottlenecks for multimodal fusion, thereby restricting the flow of cross-modality information to force models to share the most relevant information across modalities and hence improving computational performance 136 .

Another paradigm of using two modalities together is to ‘translate’ from one to the other. In many cases, one data modality may be strongly associated with clinical outcomes but be less affordable, less accessible or require specialized equipment or invasive procedures. Deep learning-enabled computer vision has been shown to capture information typically requiring a higher-fidelity modality for human interpretation. As an example, one study developed a convolutional neural network that uses echocardiogram videos to predict laboratory values of interest such as cardiac biomarkers (troponin I and brain natriuretic peptide) and other commonly obtained biomarkers, and found that predictions from the model were accurate, with some even showing greater prognostic value for heart failure admissions than conventional laboratory testing 137 . Deep learning has also been widely studied in cancer pathology to make predictions beyond typical pathologist interpretation tasks with H&E stains, with several applications including prediction of genotype and gene expression, response to treatment and survival using only pathology images as inputs 138 .

Many other important challenges relating to multimodal model architectures remain. For some modalities (for example, three-dimensional imaging), even models using only a single time point require large computing capabilities, and the prospect of implementing a model that also processes large-scale omics or text data represents an important infrastructural challenge.

While multimodal learning has advanced rapidly over the past few years, current methods are unlikely to be sufficient to overcome all the major challenges mentioned above. Further innovation will therefore be required to fully enable effective multimodal AI models.

Data challenges

The multidimensional data underpinning health leads to a broad range of challenges in terms of collecting, linking and annotating these data. Medical datasets can be described along several axes 139 , including the sample size, depth of phenotyping, the length and intervals of follow-up, the degree of interaction between participants, the heterogeneity and diversity of the participants, the level of standardization and harmonization of the data and the amount of linkage between data sources. While science and technology have advanced remarkably to facilitate data collection and phenotyping, there are inevitable trade-offs among these features of biomedical datasets. For example, although large sample sizes (in the range of hundreds of thousands to millions) are desirable in most cases for the training of AI models (especially multimodal AI models), the costs of achieving deep phenotyping and good longitudinal follow-up scale rapidly with larger numbers of participants, becoming financially unsustainable unless automated methods of data collection are put in place.

There are large-scale efforts to provide meaningful harmonization to biomedical datasets, such as the Observational Medical Outcomes Partnership Common Data Model developed by the Observational Health Data Sciences and Informatics collaboration 140 . Harmonization enormously facilitates research efforts and enhances reproducibility and translation into clinical practice. However, harmonization may obscure some relevant pathophysiological processes underlying certain diseases. As an example, ischemic stroke subtypes tend not to be accurately captured by existing ontologies 141 , but utilizing raw data from EHRs or radiology reports could allow for the use of natural language processing for phenotyping 142 . Similarly, the Diagnostic and Statistical Manual of Mental Disorders categorizes diagnoses based on clinical manifestations, which might not fully represent underlying pathophysiological processes 143 .

Achieving diversity across race/ethnicity, ancestry, income level, education level, healthcare access, age, disability status, geographic locations, gender and sexual orientation has proven difficult in practice. Genomics research is a prominent example, with the vast majority of studies focusing on individuals of European ancestry 144 . However, diversity of biomedical datasets is paramount as it constitutes the first step to ensure generalizability to the broader population 145 . Beyond these considerations, a required step for multimodal AI is the appropriate linking of all data types available in the datasets, which represents another challenge owing to the increasing risk of identification of individuals and regulatory constraints 146 .

Another frequent problem with biomedical data is the usually high proportion of missing data. While simply excluding patients with missing data before training is an option in some cases, selection bias can arise when other factors influence missing data 147 , and it is often more appropriate to address these gaps with statistical tools, such as multiple imputation 148 . As a result, imputation is a pervasive preprocessing step in many biomedical scientific fields, ranging from genomics to clinical data. Imputation has remarkably improved the statistical power of genome-wide association studies to identify novel genetic risk loci, and is facilitated by large reference datasets with deep genotypic coverage such as 1000 Genomes 149 , the UK10K 150 , the Haplotype reference consortium 151 and, recently, TOPMed 89 . Beyond genomics, imputation has also demonstrated utility for other types of medical data 152 . Different strategies have been suggested to make fewer assumptions. These include carry-forward imputation, with imputed values flagged and information added on when they were last measured 153 , and more complex strategies such as capturing the presence of missing data and time intervals using learnable decay terms 154 .
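The carry-forward strategy with imputation flags and time-since-measurement information can be sketched as follows. This is a minimal illustrative reading of that strategy, not any published implementation; the function name is hypothetical:

```python
def impute_carry_forward(series, fallback=0.0):
    """Last-observation-carried-forward imputation with indicators.

    Returns (imputed, flags, gaps): `flags` marks which entries were
    imputed, and `gaps` counts time steps since the value was last
    actually measured, so a downstream model can discount stale values.
    """
    imputed, flags, gaps = [], [], []
    last, since = fallback, 0
    for x in series:
        if x is None:                 # missing: carry the last value forward
            since += 1
            imputed.append(last)
            flags.append(1)
        else:                         # observed: reset the staleness counter
            last, since = x, 0
            imputed.append(x)
            flags.append(0)
        gaps.append(since)
    return imputed, flags, gaps
```

Feeding the flags and gaps to the model alongside the imputed values is what lets approaches with learnable decay terms down-weight measurements that are long out of date.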

Studies that collect health data are prone to several biases, and multiple approaches are necessary to monitor and mitigate them 155 . The risk of these biases is amplified when combining data from multiple sources, as the bias toward individuals more likely to consent to each data modality could be amplified when considering the intersection between these potentially biased populations. This complex and unsolved problem is more important in the setting of multimodal health data (compared to unimodal data) and would warrant its own in-depth review. Medical AI algorithms using demographic features such as race as inputs can learn to perpetuate historical human biases, thereby resulting in harm when deployed 156 . Importantly, recent work has demonstrated that AI models can identify such features solely from imaging data, which highlights the need for deliberate efforts to detect racial bias and equalize racial outcomes during data quality control and model development 157 . In particular, selection bias is a common type of bias in large biobank studies, and has been reported as a problem, for example, in the UK Biobank 158 . This problem has also been pervasive in the scientific literature regarding COVID-19 (ref. 159 ). For example, patients using allergy medications were more likely to be tested for COVID-19, leading to an artificially lower rate of positive tests and an apparent protective effect among those tested—probably due to selection bias 160 . Importantly, selection bias can result in AI models trained on a sample that differs considerably from the general population 161 , thus hurting these models at inference time 162 .
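A toy simulation illustrates how differential testing alone can manufacture an apparent protective effect like the one described above. All of the rates and the medication scenario below are invented purely for illustration:

```python
import random

def apparent_positive_rate(n=100_000, base_rate=0.10, seed=0):
    """Simulate biased testing: medication users are tested regardless
    of symptoms, so their tested subgroup shows an artificially low
    positive rate even though infection is independent of medication."""
    rng = random.Random(seed)
    results = {"med": [0, 0], "no_med": [0, 0]}  # [tested, positive]
    for _ in range(n):
        on_med = rng.random() < 0.2
        infected = rng.random() < base_rate        # independent of medication
        symptomatic = infected and rng.random() < 0.7
        tested = on_med or symptomatic             # med users always tested
        if tested:
            key = "med" if on_med else "no_med"
            results[key][0] += 1
            results[key][1] += infected
    return {k: pos / tested for k, (tested, pos) in results.items()}
```

Among medication users the positive rate among those tested is near the true 10%, while untreated individuals are tested only when symptomatic, so their tested positive rate is far higher—creating an illusory protective association with no causal basis.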

Privacy challenges

The successful development of multimodal AI in health requires breadth and depth of data, which entails greater privacy challenges than single-modality AI models. For example, previous studies have demonstrated that by utilizing only a small amount of background information about participants, an adversary could re-identify individuals in large datasets (for example, the Netflix prize dataset), uncovering sensitive information about them 163 .

In the USA, the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule is the fundamental legislation to protect privacy of health data. However, some types of health data—such as user-generated and de-identified health data—are not covered by this regulation, which poses a risk of reidentification by combining information from multiple sources. In contrast, the more recent General Data Protection Regulation (GDPR) from the European Union has a much broader scope regarding the definition of health data, and even goes beyond data protection to also require the release of information about automated decision-making using these data 164 .

Given the challenges, multiple technical solutions have been proposed and explored to ensure security and privacy while training multimodal AI models, including differential privacy, federated learning, homomorphic encryption and swarm learning 165 , 166 . Differential privacy proposes a systematic random perturbation of the data with the ultimate goal of obscuring individual-level information while maintaining the global distribution of the dataset 167 . As expected, this approach constitutes a trade-off between the level of privacy obtained and the expected performance of the models. Federated learning, on the other hand, allows several individuals or health systems to collectively train a model without transferring raw data. In this approach, a trusted central server distributes a model to each of the individuals/organizations; each individual or organization then trains the model for a certain number of iterations and shares the model updates back to the trusted central server 165 . Finally, the trusted central server aggregates the model updates from all individuals/organizations and starts another round. Federated multimodal learning has been implemented in a multi-institutional collaboration for predicting clinical outcomes in people with COVID-19 (ref. 168 ). Homomorphic encryption is a cryptographic technique that allows mathematical operations on encrypted input data, therefore providing the possibility of sharing model weights without leaking information 169 . Finally, swarm learning is a relatively novel approach that, similarly to federated learning, is also based on several individuals or organizations training a model on local data, but does not require a trusted central server because it replaces it with the use of blockchain smart contracts 170 .
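The aggregation loop at the core of federated learning can be sketched minimally: clients train locally and the server averages the resulting models, weighted by local dataset size. This is an illustrative sketch of weighted model averaging under simplified assumptions (a single round, plain gradient steps), not a production framework:

```python
def local_update(weights, gradients, lr=0.1):
    """One local training step: a client updates its copy of the
    global model without sharing any raw data."""
    return [w - lr * g for w, g in zip(weights, gradients)]

def federated_average(client_weights, client_sizes):
    """Server-side aggregation: average client models, weighted by
    the size of each client's local dataset."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[j] * s for w, s in zip(client_weights, client_sizes)) / total
            for j in range(dim)]
```

Only model parameters cross institutional boundaries; combining this scheme with differential privacy or homomorphic encryption, as discussed above, further limits what those parameters can leak.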

Importantly, these approaches are often complementary and they can and should be used together. A recent study demonstrated the potential of coupling federated learning with homomorphic encryption to train a model to predict a COVID-19 diagnosis from chest CT scans, with the aggregate model outperforming all of the locally trained models 122 . While these methods are promising, multimodal health data are usually spread across several distinct organizations, ranging from healthcare institutions and academic centers to pharmaceutical companies. Therefore, the development of new methods to incentivize data sharing across sectors while preserving patient privacy is crucial.

An additional layer of safety can be obtained by leveraging novel developments in edge computing 171 . Edge computing, as opposed to cloud computing, refers to the idea of bringing computation closer to the sources of data (for example, close to ambient sensors or wearable devices). In combination with other methods such as federated learning, edge computing provides more security by avoiding the transmission of sensitive data to centralized servers. Furthermore, edge computing provides other benefits, such as reducing storage costs, latency and bandwidth usage. For example, some X-ray systems now run optimized versions of deep learning models directly in their hardware, instead of transferring images to cloud servers for identification of life-threatening conditions 172 .

As the healthcare AI market expands, biomedical data are becoming increasingly valuable, raising a further challenge: data ownership. This remains an open debate. Some advocate for private patient ownership of data, arguing that this would secure patients’ right to self-determination, support health data transactions and maximize patients’ benefit from data markets, while others suggest that a non-property, regulatory model would better ensure secure and transparent data use 173 , 174 . Whatever the framework, appropriate incentives should be put in place to facilitate data sharing while ensuring security and privacy 175 , 176 .

Multimodal medical AI unlocks key applications in healthcare, and many opportunities exist beyond those described here. The field of drug discovery is a pertinent example, with many tasks that could leverage multidimensional data, including target identification and validation, prediction of drug interactions and prediction of side effects 177 . While we addressed many important challenges to the use of multimodal AI, others that were outside the scope of this review are just as important, including the potential for false positives and how clinicians should interpret and explain the risks to patients.

With the ability to capture multidimensional biomedical data comes the challenge of deep phenotyping: understanding each individual’s uniqueness. Collaboration across industries and sectors is needed to collect and link large and diverse multimodal health data (Box 1 ). Yet, at this juncture, we are far better at collating and storing such data than we are at analyzing them. To meaningfully process such high-dimensional data and realize the many exciting use cases, it will take a concerted joint effort of the medical community and AI researchers to build and validate new models, and ultimately to demonstrate their utility in improving health outcomes.

Box 1 Priorities for future development of multimodal biomedical AI

Discover and formulate key medical AI tasks for which multimodal data will add value over single modalities.

Develop approaches that can pretrain models using large amounts of unlabeled data across modalities and only require fine-tuning on limited labeled data.

Benchmark the effect of model architectures and multimodal approaches when working with previously underexplored high-dimensional data, such as omics data.

Collect paired (for example, image–text) multimodal data that could be used to train and test the generalizability of multimodal medical AI algorithms.

Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25 , 24–29 (2019).


Esteva, A. et al. Deep learning-enabled medical computer vision. NPJ Digit. Med. 4 , 5 (2021).


Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat. Med. 28 , 31–38 (2022).

Karczewski, K. J. & Snyder, M. P. Integrative omics for health and disease. Nat. Rev. Genet. 19 , 299–310 (2018).


Sidransky, D. Emerging molecular markers of cancer. Nat. Rev. Cancer 2 , 210–219 (2002).

Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321 , 1807–1812 (2008).

Food and Drug Administration. List of cleared or approved companion diagnostic devices (in vitro and imaging tools) https://www.fda.gov/medical-devices/in-vitro-diagnostics/list-cleared-or-approved-companion-diagnostic-devices-in-vitro-and-imaging-tools (2021).

Food and Drug Administration. Nucleic acid-based tests https://www.fda.gov/medical-devices/in-vitro-diagnostics/nucleic-acid-based-tests (2020).

Foundation Medicine. Why comprehensive genomic profiling? https://www.foundationmedicine.com/resource/why-comprehensive-genomic-profiling (2018).

Oncotype IQ. Oncotype MAP pan-cancer tissue test https://www.oncotypeiq.com/en-US/pan-cancer/healthcare-professionals/oncotype-map-pan-cancer-tissue-test/about-the-test-oncology (2020).

Heitzer, E., Haque, I. S., Roberts, C. E. S. & Speicher, M. R. Current and future perspectives of liquid biopsies in genomics-driven oncology. Nat. Rev. Genet. 20 , 71–88 (2018).


Uffelmann, E. et al. Genome-wide association studies. Nat. Rev. Methods Primers 1 , 1–21 (2021).

Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51 , 1339–1348 (2019).

Choi, S. W., Mak, T. S. -H. & O’Reilly, P. F. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15 , 2759–2772 (2020).

Damask, A. et al. Patients with high genome-wide polygenic risk scores for coronary artery disease may receive greater clinical benefit from alirocumab treatment in the ODYSSEY OUTCOMES trial. Circulation 141 , 624–636 (2020).


Marston, N. A. et al. Predicting benefit from evolocumab therapy in patients with atherosclerotic disease using a genetic risk score: results from the FOURIER trial. Circulation 141 , 616–623 (2020).

Duan, R. et al. Evaluation and comparison of multi-omics data integration methods for cancer subtyping. PLoS Comput. Biol. 17 , e1009224 (2021).

Kang, M., Ko, E. & Mersha, T. B. A roadmap for multi-omics data integration using deep learning. Brief. Bioinform . 23 , bbab454 (2022).

Wang, T. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12 , 3445 (2021).

Zhang, X.-M., Liang, L., Liu, L. & Tang, M.-J. Graph neural networks and their current applications in bioinformatics. Front. Genet. 12 , 690049 (2021).

Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37 , 1482–1492 (2019).

Kuchroo, M. et al. Multiscale PHATE identifies multimodal signatures of COVID-19. Nat. Biotechnol . https://doi.org/10.1038/s41587-021-01186-x (2022).

Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22 , 114–126 (2021).

Marx, V. Method of the year: spatially resolved transcriptomics. Nat. Methods 18 , 9–14 (2021).

He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4 , 827–834 (2020).

Bergenstråhle, L. et al. Super-resolved spatial transcriptomics by deep data fusion. Nat. Biotechnol . https://doi.org/10.1038/s41587-021-01075-3 (2021).

Janssens, A. C. J. W. Validity of polygenic risk scores: are we measuring what we think we are? Hum. Mol. Genet 28 , R143–R150 (2019).

Kellogg, R. A., Dunn, J. & Snyder, M. P. Personal omics for precision health. Circ. Res. 122 , 1169–1171 (2018).

Owen, M. J. et al. Rapid sequencing-based diagnosis of thiamine metabolism dysfunction syndrome. N. Engl. J. Med. 384 , 2159–2161 (2021).

Moore, T. J., Zhang, H., Anderson, G. & Alexander, G. C. Estimated costs of pivotal trials for novel therapeutic agents approved by the US food and drug administration, 2015–2016. JAMA Intern. Med. 178 , 1451–1457 (2018).

Sertkaya, A., Wong, H. -H., Jessup, A. & Beleche, T. Key cost drivers of pharmaceutical clinical trials in the United States. Clin. Trials 13 , 117–126 (2016).

Loree, J. M. et al. Disparity of race reporting and representation in clinical trials leading to cancer drug approvals from 2008 to 2018. JAMA Oncol. 5 , e191870 (2019).

Steinhubl, S. R., Wolff-Hughes, D. L., Nilsen, W., Iturriaga, E. & Califf, R. M. Digital clinical trials: creating a vision for the future. NPJ Digit. Med. 2 , 126 (2019).

Inan, O. T. et al. Digitizing clinical trials. NPJ Digit. Med. 3 , 101 (2020).

Dunn, J. et al. Wearable sensors enable personalized predictions of clinical laboratory measurements. Nat. Med. 27 , 1105–1112 (2021).

Marra, C., Chen, J. L., Coravos, A. & Stern, A. D. Quantifying the use of connected digital products in clinical research. NPJ Digit. Med . 3 , 50 (2020).

Steinhubl, S. R. et al. Effect of a home-based wearable continuous ECG monitoring patch on detection of undiagnosed atrial fibrillation: the mSToPS randomized clinical trial. JAMA 320 , 146–155 (2018).

Pandit, J. A., Radin, J. M., Quer, G. & Topol, E. J. Smartphone apps in the COVID-19 pandemic. Nat. Biotechnol . 40 , 1013–1022 (2022).

Pallmann, P. et al. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Med. 16 , 29 (2018).

Klarin, D. & Natarajan, P. Clinical utility of polygenic risk scores for coronary artery disease. Nat. Rev. Cardiol . https://doi.org/10.1038/s41569-021-00638-w (2021).

Lim, B., Arık, S. Ö., Loeff, N. & Pfister, T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 37 , 1748–1764 (2021).

Zhang, X., Zeman, M., Tsiligkaridis, T. & Zitnik, M. Graph-guided network for irregularly sampled multivariate time series. In International Conference on Learning Representation (ICLR, 2022).

Thorlund, K., Dron, L., Park, J. J. H. & Mills, E. J. Synthetic and external controls in clinical trials—a primer for researchers. Clin. Epidemiol. 12 , 457–467 (2020).

Food and Drug Administration. FDA approves first treatment for a form of Batten disease https://www.fda.gov/news-events/press-announcements/fda-approves-first-treatment-form-batten-disease#:~:text=The%20U.S.%20Food%20and%20Drug,specific%20form%20of%20Batten%20disease (2017).

Food and Drug Administration. Real-world evidence https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence (2022).

AbbVie. Synthetic control arm: the end of placebos? https://stories.abbvie.com/stories/synthetic-control-arm-end-placebos.htm (2019).

Unlearn.AI. Generating synthetic control subjects using machine learning for clinical trials in Alzheimer’s disease (DIA 2019) https://www.unlearn.ai/post/generating-synthetic-control-subjects-alzheimers (2019).

Noah, B. et al. Impact of remote patient monitoring on clinical outcomes: an updated meta-analysis of randomized controlled trials. NPJ Digit. Med . 1 , 20172 (2018).

Strain, T. et al. Wearable-device-measured physical activity and future health risk. Nat. Med. 26 , 1385–1391 (2020).

Iqbal, S. M. A., Mahgoub, I., Du, E., Leavitt, M. A. & Asghar, W. Advances in healthcare wearable devices. NPJ Flex. Electron. 5 , 9 (2021).

Mandel, J. C., Kreda, D. A., Mandl, K. D., Kohane, I. S. & Ramoni, R. B. SMART on FHIR: a standards-based, interoperable apps platform for electronic health records. J. Am. Med. Inform. Assoc. 23 , 899–908 (2016).

Haque, A., Milstein, A. & Fei-Fei, L. Illuminating the dark spaces of healthcare with ambient intelligence. Nature 585 , 193–202 (2020).

Kwolek, B. & Kepski, M. Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput. Methods Prog. Biomed. 117 , 489–501 (2014).

Wang, C. et al. Multimodal gait analysis based on wearable inertial and microphone sensors. In 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI) 1–8 (2017).

Luo, Z. et al. Computer vision-based descriptive analytics of seniors’ daily activities for long-term health monitoring. In Proc. Machine Learning Research Vol. 85, 1–18 (PMLR, 2018).

Coffey, J. D. et al. Implementation of a multisite, interdisciplinary remote patient monitoring program for ambulatory management of patients with COVID-19. NPJ Digit. Med. 4 , 123 (2021).

Whitelaw, S., Mamas, M. A., Topol, E. & Van Spall, H. G. C. Applications of digital technology in COVID-19 pandemic planning and response. Lancet Digit. Health 2 , e435–e440 (2020).

Wu, J. T., Leung, K. & Leung, G. M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet 395 , 689–697 (2020).

Jason Wang, C., Ng, C. Y. & Brook, R. H. Response to COVID-19 in Taiwan: big data analytics, new technology, and proactive testing. JAMA 323 , 1341–1342 (2020).

Radin, J. M., Wineinger, N. E., Topol, E. J. & Steinhubl, S. R. Harnessing wearable device data to improve state-level real-time surveillance of influenza-like illness in the USA: a population-based study. Lancet Digit. Health 2 , e85–e93 (2020).

Quer, G. et al. Wearable sensor data and self-reported symptoms for COVID-19 detection. Nat. Med. 27 , 73–77 (2020).

Syrowatka, A. et al. Leveraging artificial intelligence for pandemic preparedness and response: a scoping review to identify key use cases. NPJ Digit. Med. 4 , 96 (2021).

Varghese, E. B. & Thampi, S. M. A multimodal deep fusion graph framework to detect social distancing violations and FCGs in pandemic surveillance. Eng. Appl. Artif. Intell. 103 , 104305 (2021).

San, O. The digital twin revolution. Nat. Comput. Sci. 1 , 307–308 (2021).

Björnsson, B. et al. Digital twins to personalize medicine. Genome Med. 12 , 4 (2019).

Kamel Boulos, M. N. & Zhang, P. Digital twins: from personalised medicine to precision public health. J. Pers. Med 11 , 745 (2021).

Hernandez-Boussard, T. et al. Digital twins for predictive oncology will be a paradigm shift for precision cancer care. Nat. Med. 27 , 2065–2066 (2021).

Coorey, G., Figtree, G. A., Fletcher, D. F. & Redfern, J. The health digital twin: advancing precision cardiovascular medicine. Nat. Rev. Cardiol. 18 , 803–804 (2021).

Masison, J. et al. A modular computational framework for medical digital twins. Proc. Natl Acad. Sci. USA 118 , e2024287118 (2021).

Fisher, C. K., Smith, A. M. & Walsh, J. R. Machine learning for comprehensive forecasting of Alzheimer’s disease progression. Sci. Rep. 9 , 13622 (2019).

Walsh, J. R. et al. Generating digital twins with multiple sclerosis using probabilistic neural networks. Preprint at https://arxiv.org/abs/2002.02779 (2020).

Swedish Digital Twin Consortium. https://www.sdtc.se/ (accessed 1 February 2022).

Potter, D. et al. Development of CancerLinQ, a health information learning platform from multiple electronic health record systems to support improved quality of care. JCO Clin. Cancer Inform. 4 , 929–937 (2020).

Parmar, P., Ryu, J., Pandya, S., Sedoc, J. & Agarwal, S. Health-focused conversational agents in person-centered care: a review of apps. NPJ Digit. Med. 5 , 21 (2022).

Dixon, R. F. et al. A virtual type 2 diabetes clinic using continuous glucose monitoring and endocrinology visits. J. Diabetes Sci. Technol. 14 , 908–911 (2020).

Claxton, S. et al. Identifying acute exacerbations of chronic obstructive pulmonary disease using patient-reported symptoms and cough feature analysis. NPJ Digit. Med. 4 , 107 (2021).

Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25 , 44–56 (2019).

Patel, M. S., Volpp, K. G. & Asch, D. A. Nudge units to improve the delivery of health care. N. Engl. J. Med. 378 , 214–216 (2018).

Roller, S. et al. Recipes for building an open-domain Chatbot. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics 300–325 (Association for Computational Linguistics, 2021).

Chen, J. H. & Asch, S. M. Machine learning and prediction in medicine - beyond the peak of inflated expectations. N. Engl. J. Med. 376 , 2507–2509 (2017).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562 , 203–209 (2018).

Woodfield, R., Grant, I., UK Biobank Stroke Outcomes Group, UK Biobank Follow-Up and Outcomes Working Group & Sudlow, C. L. M. Accuracy of electronic health record data for identifying stroke cases in large-scale epidemiological studies: a systematic review from the UK biobank stroke outcomes group. PLoS ONE 10 , e0140533 (2015).

Szustakowski, J. et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat. Genet. 53 , 942–948 (2021).

Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607 , 732–740 (2022).

Littlejohns, T. J. et al. The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nat. Commun . 11 , 2624 (2020).

Chen, Z. et al. China Kadoorie Biobank of 0.5 million people: survey methods, baseline characteristics and long-term follow-up. Int. J. Epidemiol. 40 , 1652–1666 (2011).

Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27 , S2–S8 (2017).

Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J. Clin. Epidemiol. 70 , 214–223 (2016).

Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590 , 290–299 (2021).

All of Us Research Program Investigators et al. The ‘All of Us’ Research Program. N. Engl. J. Med. 381 , 668–676 (2019).

Mapes, B. M. et al. Diversity and inclusion for the All of Us research program: a scoping review. PLoS ONE 15 , e0234962 (2020).

Kaushal, A., Altman, R. & Langlotz, C. Geographic distribution of US cohorts used to train deep learning algorithms. JAMA 324 , 1212–1213 (2020).

Arges, K. et al. The Project Baseline Health Study: a step towards a broader mission to map human health. NPJ Digit. Med . 3 , 84 (2020).

McDonald, D. et al. American Gut: an open platform for citizen science microbiome research. mSystems 3 , e00031–18 (2018).

Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3 , 160035 (2016).

Johnson, A. E. W. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6 , 317 (2019).

Deasy, J., Liò, P. & Ercole, A. Dynamic survival prediction in intensive care units from heterogeneous time series without the need for variable selection or curation. Sci. Rep . 10 , 22129 (2020).

Barbieri, S. et al. Benchmarking deep learning architectures for predicting readmission to the ICU and describing patients-at-risk. Sci. Rep. 10 , 1111 (2020).

Huang, S.-C., Pareek, A., Zamanian, R., Banerjee, I. & Lungren, M. P. Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection. Sci. Rep. 10 , 22147 (2020).

Jabbour, S., Fouhey, D., Kazerooni, E., Wiens, J. & Sjoding, M. W. Combining chest X-rays and electronic health record data using machine learning to diagnose acute respiratory failure. J. Am. Med. Inform. Assoc. 29 , 1060–1068 (2022).

Golbus, J. R., Pescatore, N. A., Nallamothu, B. K., Shah, N. & Kheterpal, S. Wearable device signals and home blood pressure data across age, sex, race, ethnicity, and clinical phenotypes in the Michigan Predictive Activity & Clinical Trajectories in Health (MIPACT) study: a prospective, community-based observational study. Lancet Digit. Health 3 , e707–e715 (2021).

Addington, J. et al. North American Prodrome Longitudinal Study (NAPLS 2): overview and recruitment. Schizophr. Res. 142 , 77–82 (2012).

Perkins, D. O. et al. Towards a psychosis risk blood diagnostic for persons experiencing high-risk symptoms: preliminary results from the NAPLS project. Schizophr. Bull. 41 , 419–428 (2015).

Koutsouleris, N. et al. Multimodal machine learning workflows for prediction of psychosis in patients with clinical high-risk syndromes and recent-onset depression. JAMA Psychiatry 78 , 195–209 (2021).

Baltrusaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41 , 423–443 (2019).

Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) vol. 139, 8748–8763 (PMLR, 18–24 July 2021).

Zhang, Y., Jiang, H., Miura, Y., Manning, C. D. & Langlotz, C. P. Contrastive learning of medical visual representations from paired images and text. Preprint at https://arxiv.org/abs/2010.00747 (2020).

Zhou, H. -Y. et al. Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nat. Mach. Intell. 4 , 32–40 (2022).

Akbari, H. et al. VATT: transformers for multimodal self-supervised learning from raw video, audio and text. In Advances in Neural Information Processing Systems (eds. Ranzato, M. et al.) vol. 34, 24206–24221 (Curran Associates, Inc., 2021).

Bao, H. et al. VLMo: unified vision-language pre-training with mixture-of-modality-experts. Preprint at https://arxiv.org/abs/2111.02358 (2022).

Dean, J. Introducing Pathways: a next-generation AI architecture https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ (10 November 2021).

Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems (eds. Guyon, I. et al.) vol. 30 (Curran Associates, Inc., 2017).

Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR, 2021).

Li et al. Oscar: Object-semantics aligned pre-training for vision-language tasks. Preprint at https://doi.org/10.48550/arXiv.2004.06165 (2020).

Baevski, A. et al. data2vec: a general framework for self-supervised learning in speech, vision and language. Preprint at https://arxiv.org/abs/2202.03555 (2022).

Tamkin, A. et al. DABS: a Domain-Agnostic Benchmark for Self-Supervised Learning. In 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2021).

Jaegle, A. et al. Perceiver: general perception with iterative attention. In Proc. 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) vol. 139, 4651–4664 (PMLR, 18–24 July 2021).

Jaegle, A. et al. Perceiver IO: a general architecture for structured inputs & outputs. In International Conference on Learning Representations (ICLR, 2022).

Hendricks, L. A., Mellor, J., Schneider, R., Alayrac, J.-B. & Nematzadeh, A. Decoupling the role of data, attention, and losses in multimodal transformers. Trans. Assoc. Comput. Linguist. 9 , 570–585 (2021).

Lu, K., Grover, A., Abbeel, P. & Mordatch, I. Pretrained transformers as universal computation engines. Preprint at https://arxiv.org/abs/2103.05247 (2021).

Sandfort, V., Yan, K., Pickhardt, P. J. & Summers, R. M. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci. Rep. 9 , 16884 (2019).

Bai, X. et al. Advancing COVID-19 diagnosis with privacy-preserving collaboration in artificial intelligence. Nat. Mach. Intell. 3 , 1081–1089 (2021).

Berisha, V. et al. Digital medicine and the curse of dimensionality. NPJ Digit. Med. 4 , 153 (2021).

Guu, K., Lee, K., Tung, Z., Pasupat, P. & Chang, M. Retrieval augmented language model pre-training. In Proc. 37th International Conference on Machine Learning (eds. Iii, H. D. & Singh, A.) vol. 119, 3929–3938 (PMLR, 13–18 July 2020).

Borgeaud, S. et al. Improving language models by retrieving from trillions of tokens. In Proc. 39th International Conference on Machine Learning (eds. Chaudhuri, K. et al.) vol. 162, 2206–2240 (PMLR, 17–23 July 2022).

Huang, S. -C., Pareek, A., Seyyedi, S., Banerjee, I. & Lungren, M. P. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. NPJ Digit. Med. 3 , 136 (2020).

Muhammad, G. et al. A comprehensive survey on multimodal medical signals fusion for smart healthcare systems. Inf. Fusion 76 , 355–375 (2021).

Fiterau, M. et al. ShortFuse: Biomedical time series representations in the presence of structured information. In Proc. 2nd Machine Learning for Healthcare Conference (eds. Doshi-Velez, F. et al.) vol. 68, 59–74 (PMLR, 18–19 August 2017).

Tomašev, N. et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572 , 116–119 (2019).

Rajpurkar, P. et al. CheXaid: deep learning assistance for physician diagnosis of tuberculosis using chest X-rays in patients with HIV. NPJ Digit. Med. 3 , 115 (2020).

Kihara, Y. et al. Policy-driven, multimodal deep learning for predicting visual fields from the optic disc and optical coherence tomography imaging. Ophthalmology https://doi.org/10.1016/j.ophtha.2022.02.017 (2022).

Ramesh, A. et al. Zero-shot text-to-image generation. In Proc. 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) vol. 139, 8821–8831 (PMLR, 18–24 July 2021).

Nichol, A. Q. et al. GLIDE: towards photorealistic image generation and editing with text-guided diffusion models. In Proc. 39th International Conference on Machine Learning (eds. Chaudhuri, K. et al.) vol. 162, 16784–16804 (PMLR, 17–23 July 2022).

Reed, S. et al. A generalist agent. Preprint at https://arxiv.org/abs/2205.06175 (2022).

Li, J. et al. Align before fuse: vision and language representation learning with momentum distillation. Preprint at https://arxiv.org/abs/2107.07651 (2021).

Nagrani, A. et al. Attention bottlenecks for multimodal fusion. In Advances in Neural Information Processing Systems (eds. Ranzato, M. et al.) vol. 34, 14200–14213 (Curran Associates, Inc., 2021).

Hughes, J. W. et al. Deep learning evaluation of biomarkers from echocardiogram videos. EBioMedicine 73 , 103613 (2021).

Echle, A. et al. Deep learning in cancer pathology: a new generation of clinical biomarkers. Br. J. Cancer 124 , 686–696 (2020).

Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26 , 29–38 (2020).

Hripcsak, G. et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud. Health Technol. Inform. 216 , 574–578 (2015).


Rannikmäe, K. et al. Accuracy of identifying incident stroke cases from linked health care data in UK Biobank. Neurology 95 , e697–e707 (2020).

Garg, R., Oh, E., Naidech, A., Kording, K. & Prabhakaran, S. Automating ischemic stroke subtype classification using machine learning and natural language processing. J. Stroke Cerebrovasc. Dis. 28 , 2045–2051 (2019).

Casey, B. J. et al. DSM-5 and RDoC: progress in psychiatry research? Nat. Rev. Neurosci. 14 , 810–814 (2013).

Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing diversity in human genetic studies. Cell 177 , 26–31 (2019).

Zou, J. & Schiebinger, L. Ensuring that biomedical AI benefits diverse populations. EBioMedicine 67 , 103358 (2021).

Rocher, L., Hendrickx, J. M. & de Montjoye, Y. -A. Estimating the success of re-identifications in incomplete datasets using generative models. Nat. Commun. 10 , 3069 (2019).

Haneuse, S., Arterburn, D. & Daniels, M. J. Assessing missing data assumptions in EHR-based studies: a complex and underappreciated task. JAMA Netw. Open 4 , e210184–e210184 (2021).

van Smeden, M., Penning de Vries, B. B. L., Nab, L. & Groenwold, R. H. H. Approaches to addressing missing values, measurement error, and confounding in epidemiologic studies. J. Clin. Epidemiol. 131 , 89–100 (2021).

1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526 , 68–74 (2015).

UK10K Consortium et al. The UK10K project identifies rare variants in health and disease. Nature 526 , 82–90 (2015).

McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48 , 1279–1283 (2016).

Li, J. et al. Imputation of missing values for electronic health record laboratory data. NPJ Digit. Med. 4 , 147 (2021).

Tang, S. et al. Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data. J. Am. Med. Inform. Assoc. 27 , 1921–1934 (2020).

Che, Z. et al. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 8 , 6085 (2018).

Vokinger, K. N., Feuerriegel, S. & Kesselheim, A. S. Mitigating bias in machine learning for medicine. Commun. Med. 1 , 25 (2021).

Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366 , 447–453 (2019).

Gichoya, J. W. et al. AI recognition of patient race in medical imaging: a modelling study. Lancet Digit Health 4 , e406–e414 (2022).

Swanson, J. M. The UK Biobank and selection bias. Lancet 380 , 110 (2012).

Griffith, G. J. et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat. Commun. 11 , 5749 (2020).

Thompson, L. A. et al. The influence of selection bias on identifying an association between allergy medication use and SARS-CoV-2 infection. EClinicalMedicine 37 , 100936 (2021).

Fry, A. et al. Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. Am. J. Epidemiol. 186 , 1026–1034 (2017).

Keyes, K. M. & Westreich, D. UK Biobank, big data, and the consequences of non-representativeness. Lancet 393 , 1297 (2019).

Narayanan, A. & Shmatikov, V. Robust de-anonymization of large sparse datasets. In IEEE Symposium on Security and Privacy 111–125 (2008).

Gerke, S., Minssen, T. & Cohen, G. Ethical and legal challenges of artificial intelligence-driven healthcare. Artif. Intell. Health. 11326, 213–227 (2020).

Kaissis, G. A., Makowski, M. R., Rückert, D. & Braren, R. F. Secure, privacy-preserving and federated machine learning in medical imaging. Nat. Mach. Intell. 2 , 305–311 (2020).

Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3 , 119 (2020).

Ziller, A. et al. Medical imaging deep learning with differential privacy. Sci. Rep. 11 , 13524 (2021).

Dayan, I. et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nat. Med. 27 , 1735–1743 (2021).

Wood, A., Najarian, K. & Kahrobaei, D. Homomorphic encryption for machine learning in medicine and bioinformatics. ACM Comput. Surv. 53 , 1–35 (2020).

Warnat-Herresthal, S. et al. Swarm learning for decentralized and confidential clinical machine learning. Nature 594 , 265–270 (2021).

Zhou, Z. et al. Edge intelligence: paving the last mile of artificial intelligence with edge computing. Proc. IEEE 107 , 1738–1762 (2019).

Intel. How edge computing is driving advancements in healthcare analytics; https://www.intel.com/content/www/us/en/healthcare-it/edge-analytics.html (11 March 2022).

Ballantyne, A. How should we think about clinical data ownership? J. Med. Ethics 46 , 289–294 (2020).

Liddell, K., Simon, D. A. & Lucassen, A. Patient data ownership: who owns your health? J. Law Biosci. 8 , lsab023 (2021).

Bierer, B. E., Crosas, M. & Pierce, H. H. Data authorship as an incentive to data sharing. N. Engl. J. Med. 376 , 1684–1687 (2017).

Scheibner, J. et al. Revolutionizing medical data sharing using advanced privacy-enhancing technologies: technical, legal, and ethical synthesis. J. Med. Internet Res. 23 , e25120 (2021).

Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18 , 463–477 (2019).

Download references

Acknowledgements

We thank A. Tamkin for invaluable feedback. NIH grant UL1TR002550 (to E.J.T.) supported this work.

Author information

These authors jointly supervised this work: Pranav Rajpurkar, Eric J. Topol.

Authors and Affiliations

Department of Neurology, Yale School of Medicine, New Haven, CT, USA

Julián N. Acosta & Guido J. Falcone

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA

Pranav Rajpurkar

Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA

Eric J. Topol


Corresponding authors

Correspondence to Pranav Rajpurkar or Eric J. Topol.

Ethics declarations

Competing interests.

Since completing this Review, J.N.A. became an employee of Rad AI. All the other authors declare no competing interests.

Peer review

Peer review information.

Nature Medicine thanks Joseph Ledsam, Leo Anthony Celi and Jenna Wiens for their contribution to the peer review of this work. Primary Handling Editor: Karen O’Leary, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article.

Acosta, J.N., Falcone, G.J., Rajpurkar, P. et al. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022). https://doi.org/10.1038/s41591-022-01981-2


Received: 21 March 2022

Accepted: 01 August 2022

Published: 15 September 2022

Issue Date: September 2022

DOI: https://doi.org/10.1038/s41591-022-01981-2



AMIE: A research AI system for diagnostic medical reasoning and conversations

January 12, 2024

Posted by Alan Karthikesalingam and Vivek Natarajan, Research Leads, Google Research

The physician-patient conversation is a cornerstone of medicine, in which skilled and intentional communication drives diagnosis, management, empathy and trust. AI systems capable of such diagnostic dialogues could increase availability, accessibility, quality and consistency of care by being useful conversational partners to clinicians and patients alike. But approximating clinicians’ considerable expertise is a significant challenge.

Recent progress in large language models (LLMs) outside the medical domain has shown that they can plan, reason, and use relevant context to hold rich conversations. However, there are many aspects of good diagnostic dialogue that are unique to the medical domain. An effective clinician takes a complete “clinical history” and asks intelligent questions that help to derive a differential diagnosis. They wield considerable skill to foster an effective relationship, provide information clearly, make joint and informed decisions with the patient, respond empathically to their emotions, and support them in the next steps of care. While LLMs can accurately perform tasks such as medical summarization or answering medical questions, there has been little work specifically aimed towards developing these kinds of conversational diagnostic capabilities.

Inspired by this challenge, we developed Articulate Medical Intelligence Explorer (AMIE), a research AI system based on an LLM and optimized for diagnostic reasoning and conversations. We trained and evaluated AMIE along many dimensions that reflect quality in real-world clinical consultations from the perspective of both clinicians and patients. To scale AMIE across a multitude of disease conditions, specialties and scenarios, we developed a novel self-play-based simulated diagnostic dialogue environment with automated feedback mechanisms to enrich and accelerate its learning process. We also introduced an inference-time chain-of-reasoning strategy to improve AMIE’s diagnostic accuracy and conversation quality. Finally, we tested AMIE prospectively in real examples of multi-turn dialogue by simulating consultations with trained actors.

Evaluation of conversational diagnostic AI

Besides developing and optimizing AI systems themselves for diagnostic conversations, how to assess such systems is also an open question. Inspired by accepted tools used to measure consultation quality and clinical communication skills in real-world settings, we constructed a pilot evaluation rubric to assess diagnostic conversations along axes pertaining to history-taking, diagnostic accuracy, clinical management, clinical communication skills, relationship fostering and empathy.

We then designed a randomized, double-blind crossover study of text-based consultations with validated patient actors interacting either with board-certified primary care physicians (PCPs) or the AI system optimized for diagnostic dialogue. We set up our consultations in the style of an objective structured clinical examination (OSCE), a practical assessment commonly used in the real world to examine clinicians’ skills and competencies in a standardized and objective way. In a typical OSCE, clinicians might rotate through multiple stations, each simulating a real-life clinical scenario where they perform tasks such as conducting a consultation with a standardized patient actor (trained carefully to emulate a patient with a particular condition). Consultations were performed using a synchronous text-chat tool, mimicking the interface familiar to most consumers using LLMs today.
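The randomized crossover setup above can be sketched in a few lines. The arm codes, seed, and scenario count below are illustrative stand-ins, not the study's actual protocol:

```python
import random

def crossover_assignments(scenario_ids, seed=0):
    """For each OSCE scenario, run both arms in random order.

    'ARM-A'/'ARM-B' are opaque codes (hypothetical here) so that raters
    scoring the transcripts stay blinded to whether a PCP or the AI
    conducted the consultation; randomizing the order per scenario
    guards against order effects.
    """
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible
    schedule = {}
    for sid in scenario_ids:
        order = ["ARM-A", "ARM-B"]
        rng.shuffle(order)
        schedule[sid] = order
    return schedule

# 149 case scenarios, as in the study; the seed value is arbitrary.
schedule = crossover_assignments(range(1, 150), seed=42)
```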

AMIE: an LLM-based conversational diagnostic research AI system

We trained AMIE on real-world datasets comprising medical reasoning, medical summarization and real-world clinical conversations.

It is feasible to train LLMs on real-world dialogues gathered by passively collecting and transcribing in-person clinical visits; however, two substantial challenges limit the effectiveness of such data for training LLMs for medical conversations. First, existing real-world data often fails to capture the vast range of medical conditions and scenarios, hindering scalability and comprehensiveness. Second, data derived from real-world dialogue transcripts tends to be noisy, containing ambiguous language (including slang, jargon, humor and sarcasm), interruptions, ungrammatical utterances, and implicit references.

To address these limitations, we designed a self-play based simulated learning environment with automated feedback mechanisms for diagnostic medical dialogue in a virtual care setting, enabling us to scale AMIE’s knowledge and capabilities across many medical conditions and contexts. We used this environment to iteratively fine-tune AMIE with an evolving set of simulated dialogues in addition to the static corpus of real-world data described.

This process consisted of two self-play loops: (1) an “inner” self-play loop, where AMIE leveraged in-context critic feedback to refine its behavior on simulated conversations with an AI patient simulator; and (2) an “outer” self-play loop where the set of refined simulated dialogues were incorporated into subsequent fine-tuning iterations. The resulting new version of AMIE could then participate in the inner loop again, creating a virtuous continuous learning cycle.
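A minimal sketch of the two self-play loops, with `model`, `patient_simulator`, `critic`, and `fine_tune` as hypothetical stand-ins (none of these interfaces are public); it illustrates only the control flow, not the actual training recipe:

```python
def self_play_fine_tuning(model, patient_simulator, critic,
                          n_outer=3, n_inner=5, fine_tune=None):
    """Two-loop setup: the inner loop refines simulated dialogues with
    in-context critic feedback; the outer loop folds the refined dialogues
    into the next fine-tuning iteration, yielding a new model version."""
    corpus = []
    for _ in range(n_outer):                       # outer self-play loop
        refined = []
        for _ in range(n_inner):                   # inner self-play loop
            dialogue = model(patient_simulator)    # simulate a consultation
            feedback = critic(dialogue)            # automated critique
            dialogue = model(patient_simulator, feedback)  # refined attempt
            refined.append(dialogue)
        corpus.extend(refined)
        if fine_tune is not None:                  # next model participates again
            model = fine_tune(model, corpus)
    return model, corpus
```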

Further, we also employed an inference time chain-of-reasoning strategy which enabled AMIE to progressively refine its response conditioned on the current conversation to arrive at an informed and grounded reply.
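The inference-time refinement can likewise be sketched as a loop in which the model re-reads the conversation together with its own previous draft; `generate` is a hypothetical stand-in for the underlying LLM call:

```python
def chain_of_reasoning_reply(generate, conversation, n_steps=3):
    """Draft a reply, then repeatedly condition on the conversation plus
    the current draft to produce a progressively more grounded reply."""
    draft = generate(conversation, draft=None)
    for _ in range(n_steps - 1):
        draft = generate(conversation, draft=draft)
    return draft
```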

We tested performance in consultations with simulated patients (played by trained actors), compared to those performed by 20 real PCPs using the randomized approach described above. AMIE and PCPs were assessed from the perspectives of both specialist attending physicians and our simulated patients in a randomized, blinded crossover study that included 149 case scenarios from OSCE providers in Canada, the UK and India in a diverse range of specialties and diseases.

Notably, our study was not designed to emulate either traditional in-person OSCE evaluations or the ways clinicians usually use text, email, chat or telemedicine. Instead, our experiment mirrored the most common way consumers interact with LLMs today, a potentially scalable and familiar mechanism for AI systems to engage in remote diagnostic dialogue.

Performance of AMIE

In this setting, we observed that AMIE performed simulated diagnostic conversations at least as well as PCPs when both were evaluated along multiple clinically-meaningful axes of consultation quality. AMIE had greater diagnostic accuracy and superior performance for 28 of 32 axes from the perspective of specialist physicians, and 24 of 26 axes from the perspective of patient actors.

Limitations

Our research has several limitations and should be interpreted with appropriate caution. Firstly, our evaluation technique likely underestimates the real-world value of human conversations, as the clinicians in our study were limited to an unfamiliar text-chat interface, which permits large-scale LLM–patient interactions but is not representative of usual clinical practice. Secondly, any research of this type must be seen as only a first exploratory step on a long journey. Transitioning from the LLM research prototype evaluated in this study to a safe and robust tool that could be used by people and those who provide care for them will require significant additional research. Many important limitations remain to be addressed, including experimental performance under real-world constraints and dedicated exploration of topics such as health equity and fairness, privacy, and robustness, to ensure the safety and reliability of the technology.

AMIE as an aid to clinicians

In a recently released preprint, we evaluated the ability of an earlier iteration of the AMIE system to generate a differential diagnosis (DDx) alone or as an aid to clinicians. Twenty generalist clinicians evaluated 303 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) ClinicoPathologic Conferences (CPCs). Each case report was read by two clinicians randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or AMIE assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools.

AMIE exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs. 33.6%, p = 0.04). Comparing the two assisted study arms, top-10 accuracy was higher for clinicians assisted by AMIE than for clinicians without AMIE assistance (24.6%, p < 0.01) and clinicians assisted by search (5.45%, p = 0.02). Further, clinicians assisted by AMIE arrived at more comprehensive differential lists than those without AMIE assistance.
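Top-10 accuracy, the metric quoted above, is simply the fraction of cases whose true diagnosis appears among the first ten entries of the ranked differential. A toy version, in which exact string matching stands in for the expert adjudication used in the study:

```python
def top_k_accuracy(ranked_ddx_lists, true_diagnoses, k=10):
    """Fraction of cases where the true diagnosis appears in the top-k
    entries of the ranked differential diagnosis list."""
    hits = sum(
        truth in ddx[:k]
        for ddx, truth in zip(ranked_ddx_lists, true_diagnoses)
    )
    return hits / len(true_diagnoses)

# Invented example: one of two cases has its true diagnosis in the top 10.
acc = top_k_accuracy(
    [["influenza", "COVID-19", "RSV"], ["asthma", "GERD"]],
    ["COVID-19", "COPD"],
)
```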

It's worth noting that NEJM CPCs are not representative of everyday clinical practice: they are unusual case reports involving only a few hundred individuals, so they offer limited scope for probing important issues such as equity or fairness.

Bold and responsible research in healthcare — the art of the possible

Access to clinical expertise remains scarce around the world. While AI has shown great promise in specific clinical applications, engagement in the dynamic, conversational diagnostic journeys of clinical practice requires many capabilities not yet demonstrated by AI systems. Doctors wield not only knowledge and skill but a dedication to myriad principles, including safety and quality, communication, partnership and teamwork, trust, and professionalism. Realizing these attributes in AI systems is an inspiring challenge that should be approached responsibly and with care. AMIE is our exploration of the “art of the possible”, a research-only system for safely exploring a vision of the future where AI systems might be better aligned with attributes of the skilled clinicians entrusted with our care. It is early experimental-only work, not a product, and has several limitations that we believe merit rigorous and extensive further scientific studies in order to envision a future in which conversational, empathic and diagnostic AI systems might become safe, helpful and accessible.

Acknowledgements

The research described here is joint work across many teams at Google Research and Google DeepMind. We are grateful to all our co-authors: Tao Tu, Mike Schaekermann, Anil Palepu, Daniel McDuff, Jake Sunshine, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Sara Mahdavi, Karan Singhal, Shekoofeh Azizi, Nenad Tomasev, Yun Liu, Yong Cheng, Le Hou, Albert Webson, Jake Garrison, Yash Sharma, Anupam Pathak, Sushant Prakash, Philip Mansfield, Shwetak Patel, Bradley Green, Ewa Dominowska, Renee Wong, Juraj Gottweis, Dale Webster, Katherine Chou, Christopher Semturs, Joelle Barral, Greg Corrado and Yossi Matias. We also thank Sami Lachgar, Lauren Winer and John Guilyard for their support with narratives and the visuals. Finally, we are grateful to Michael Howell, James Manyika, Jeff Dean, Karen DeSalvo, Zoubin Ghahramani and Demis Hassabis for their support during the course of this project.


How artificial intelligence can power clinical development

Clinical development and the randomized clinical trial (RCT) remain largely unaffected by the unprecedented wave of innovation in pharmaceutical R&D fueled by developments in artificial intelligence (AI), including generative AI (gen AI) and foundational models. However, they are under pressure from the rise of precision medicine and a more competitive development landscape. So far, the adoption of AI for clinical development has emphasized operational excellence and acceleration, but advances in scientific AI have made this the time to leverage modern analytical tools and novel sources of data to design more precise, efficient trials with greater success rates.

About the authors

This article is a collaborative effort by Chris Anagnostopoulos, David Champagne, Thomas Devenyns, Alex Devereson, and Heikki Tarkkila, representing views from McKinsey’s Life Sciences Practice.

The introduction of RCTs in the mid-20th century ushered in the modern era of evidence-based drug development. Their rigidity and simplicity were welcome defenses against an overreliance on anecdotes and case studies and have unarguably served patients well, supporting the introduction of countless safe, effective therapies. Yet RCTs are starting to be seen as bottlenecks that lengthen the time for therapies to gain approval and can increase costs. 1 Donald A Berry, “The brave new world of clinical cancer research: Adaptive biomarker-driven trials integrating clinical practice with clinical research,” Molecular Oncology , 2015, Volume 9, Issue 5; Michael Baumann et al., “How much does it cost to research and develop a new drug? A systematic review and assessment,” PharmacoEconomics , 2021, Volume 39, Issue 11. Meanwhile, as gen AI breaks milestone after milestone, patients eagerly await the translation of such unprecedented technological progress into pharmaceutical R&D that could deliver faster access to better treatments.

In addition to rising expectations about timelines, clinical development is also facing increasing demands to generate targeted and meaningful data. As precision medicine becomes mainstream, RCTs often must prove not only the general efficacy of a treatment but also whether it will benefit a specific segment of the patient population. The smaller that segment, the harder it can be to enroll enough patients in trials. In addition, the bar for what constitutes a clinically meaningful treatment effect is creeping higher to meet standards set by regulators and payers as well as competitive pressure. In most therapeutic areas today, there are, on average, 40 percent more assets per indication than in 2006, 2 McKinsey research on data from EvaluatePharma, March 2019. raising the need for greater differentiation. And clinicians and patients alike need more evidence to make decisions about the best available treatment at a given time.

All this comes as researchers accelerate drug discovery by making greater use of AI and gen AI, such as DeepMind’s AI-enabled platform AlphaFold, which can predict the 3-D structure of molecules 3 “ AI in biopharma research: A time to focus and scale ,” McKinsey, October 10, 2022. leading to a pipeline of more and better-designed preclinical assets with a better target validation. Companies are also striving to improve their operational processes with a view to speeding up the time it takes to apply for a first-in-human study. But if clinical development fails to keep pace, the benefits to patients of faster drug discovery will inevitably be delayed, and delivering on the promise of AI-enabled acceleration may flounder.

While the RCT will remain a central pillar of clinical development, the tide is changing. Regulators have recently issued guidance on appropriate use of real-world data (RWD). 4 See, for example, the US Food and Drug Administration’s real-world evidence activities responding to the 21st Century Cures Act. The healthcare data ecosystem is booming, leveraging privacy-respecting technology to make patient-level data safely available for research. And evidence generation from multiple data sources is now possible using causal machine learning, which aims to distinguish correlation from causation via a combination of biostatistics with machine learning (ML) and, increasingly, gen AI and foundational models.

Despite this level of promise, only a handful of established companies are deploying AI and data-driven approaches systematically in their clinical development. The focus so far has been on improving operational excellence and increasing acceleration rather than helping to inform trial design strategy. The remainder of this white paper dives deeper into the context, gives tangible example use cases and associated impact, and identifies challenges companies face when adopting AI for clinical development. Rather than staying at a high level, it aims to bring the topic to life through relevant details and case examples.


Expanding adoption of AI and RWD for clinical development

Today AI can draw upon growing volumes of RWD. Electronic health records and claims data from certain geographies are widely available, and novel data sources—including biobanks, data omics panels, population-wide genomic studies, patient registries, and imaging and digital pathology—are increasing in number and diversity. All these sources reflect a notable change in patients’ ability to share their data for the purpose of advancing research and treatments in privacy-respecting ways, including technology that enables the training of ML models in a manner that respects patients’ privacy.

In addition, new tools can systematically capture knowledge from unstructured data. Large language models (for example, BioGPT) are able to convert unstructured physician notes into high-quality structured data. Similarly, these models can search the vast corpus of published literature and identify connections between biological entities on a large scale, generating high-quality input for knowledge graphs that better represent the totality of available evidence on a given indication across domains, including genes, targets, proteins, pathways, and phenotypes.
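The target output of such note-structuring models is a set of (subject, relation, object) triples that can feed a knowledge graph. The sketch below uses a single regex as a stand-in for the language model, purely to show the shape of the pipeline; the sentence, drug names, and relations are invented examples:

```python
import re

# A single regex stands in for the LLM extractor (e.g., a BioGPT-style
# model); real note structuring handles far messier language than this.
PATTERN = re.compile(
    r"(?P<drug>[\w-]+)\s+(?P<rel>inhibits|activates|binds)\s+(?P<target>[\w-]+)",
    re.IGNORECASE,
)

def extract_triples(note):
    """Return (subject, relation, object) triples found in a free-text note."""
    return [(m["drug"], m["rel"].lower(), m["target"])
            for m in PATTERN.finditer(note)]

def build_graph(triples):
    """Adjacency-list knowledge graph: subject -> [(relation, object), ...]."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph

# Invented example sentence; entity names are illustrative only.
triples = extract_triples("Imatinib inhibits BCR-ABL; dasatinib binds SRC.")
```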

Most pharma companies using AI and RWD in clinical development tend to do so only in isolated use cases, and they rarely deploy both in combination. For example, they might use machine learning to select trial sites or to predict patient enrollment rates. They might use RWD to measure disease prevalence—the size of an eligible population—to understand the natural course of a disease among untreated patients, or to construct an external control arm for regulatory purposes. While these use cases deliver value, a great deal more is now possible. Similarly, while the level of funding and investment going into AI-enabled drug discovery companies has tripled in the last five years, that same trend is not true for equivalent companies in the clinical development space.

Four use cases that demonstrate the potential of AI

Companies that are deploying AI combined with RWD are seeing impactful results. While RWD can be valuable supporting evidence in health authority submission packages, the most successful companies focus their use of AI and RWD to make better and more informed decisions to support the success of clinical development programs in every step, from asset and portfolio strategy to protocol and trial design:

  • At the stage of defining an asset strategy, AI and RWD can reveal which indications are most promising to pursue for novel assets. Several leading biopharma companies identified multiple new indications for existing assets in this way, and an early-stage biotech used AI and RWD to assess whether to shift its indication selection strategy for a novel asset.
  • AI and RWD can support decisions about the target patient population of a clinical trial: through subgroup discovery, they can refine trial eligibility criteria, help exclude patients who are highly unlikely to benefit from the treatment, and shorten the length of trials. One early-stage biotech was able to better characterize “super-progressors” (that is, patients whose disease was likely to progress faster within the time frame of a clinical trial) and design a trial with similar expected benefits in a faster time frame.
  • For decisions about portfolio strategy, AI and RWD can help companies identify the right combination of drugs for an indication or for the right patients. One biopharma company leveraged a prospective observational data set to generate evidence supportive of earlier positioning of its third-line treatment. Another was able to identify “super-responders” for several drugs in its portfolio—insights that helped the company position its assets optimally in a crowded indication.
  • At the step of selecting and optimizing the end points of a clinical trial, AI and RWD can help a company identify patient attributes that closely track the primary end point over time. One biopharma company replaced a rare disease’s existing end point, which was an infrequent event, with end points that occurred with greater frequency or could be measured with blood tests. This cut the length of trials by 15 to 30 percent.

The following discussion looks in greater depth at four detailed use cases, illustrating the wealth of information that AI can unlock.

Indication selection for asset strategy

Selecting which indications to target with a specific molecule is one of the most important decisions a biopharma company makes. This decision is often informed by a combination of input from key opinion leaders, literature reviews, omics analysis (for example, genome-wide association studies), RCT data, and competitor decisions. Such decisions are rarely fully data driven: the inputs typically prove hard to integrate and cover only part of the available evidence base, resulting in a subjective and suboptimal synthesis. In contrast, strategies informed by RWD and AI can be objective and comprehensive, drawing on multiple data sets and on foundational models built on clinical data that are continually redefining the art of the possible.

For already-approved therapies, companies can use RWD to understand the therapies’ likely efficacy on alternative indications. One approach is examining the outcomes of patients who were prescribed the drug on a spontaneous basis by their physician; another is determining a drug’s average effect on patients exposed to the treatment because of incidental comorbidities. AI techniques can then extrapolate any findings to a set of patients whose characteristics may be distributed differently, by leveraging observed correlations between patient characteristics and outcomes.

Where companies are looking to expand the indications of an approved asset or to identify indications for a novel asset, RWD and AI can estimate the biological proximity of one indication to another from a patient and clinical perspective—that is, whether patients experience similar symptoms, comorbidities, lab characteristics, and treatment journeys. Foundational models that treat medical events as words and patient medical histories as documents can uncover the semantic similarity of different events, including diagnoses. Each indication, from what is likely to be a sizable list of those with biological proximity, is scored according to its similarity to one or more anchor references—perhaps the indication for which the asset was originally approved or, in the case of a novel asset, one for which there is strong preclinical evidence of efficacy of the asset’s mechanism of action (MoA).
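The "events as words, histories as documents" idea can be illustrated with plain co-occurrence counts in place of learned embeddings: two diagnoses that appear in similar contexts end up with similar vectors and hence high cosine similarity. The event names are invented, and a real system would train dense embeddings on millions of histories:

```python
from collections import Counter
from math import sqrt

def event_context_vectors(patient_histories, window=2):
    """Build a co-occurrence vector per medical event, treating each
    patient history as a 'document' of event 'words'."""
    vectors = {}
    for history in patient_histories:
        for i, event in enumerate(history):
            ctx = vectors.setdefault(event, Counter())
            for j in range(max(0, i - window), min(len(history), i + window + 1)):
                if j != i:
                    ctx[history[j]] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Two invented histories in which both diagnoses share the same context.
histories = [
    ["psoriasis", "methotrexate", "arthritis"],
    ["psoriatic_arthritis", "methotrexate", "arthritis"],
]
vecs = event_context_vectors(histories)
sim = cosine(vecs["psoriasis"], vecs["psoriatic_arthritis"])
```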

The scoring can also incorporate information from molecular knowledge graphs that show new connections—for example, between entities such as proteins or human biological pathways that have already been identified in literature or public data. Exhibit 1 shows a potential outcome of this approach: it can reveal the indications most strongly connected to each of the reference indications, serving as an anchor into the existing evidence base. The indications with the most connections to these references are further prioritized by unmet medical need, strategic fit, and technical feasibility, resulting in an evidence-based indication selection.

These analytical approaches can identify novel indications that can be rapidly validated via in vitro or animal models, can increase the confidence in selecting indications with a high probability of success, and can derisk resource allocation accordingly. The analysis provides a clearer and more holistic evidence base for investors, shareholders, and R&D leaders, and it can reduce the opportunity cost of blind alleys by helping new treatments reach patients faster.

Subgroup discovery for trial design

Once an asset has been matched with an indication, pharmaceutical companies pursue an all-comer clinical development and trial design strategy—that is, one that includes all patients except those deemed high-risk based on input from key opinion leaders and conventional wisdom. 5 The use of the all-comer strategy may reflect a low level of confidence in existing hypotheses that currently seek to explain response heterogeneity in patient subpopulations. As a result, patients unlikely to respond to a treatment may be included in the trial. A notable exception to this method is in precision oncology, where researchers use specific biomarkers discovered preclinically (such as genetic mutations) to stratify patients according to their probability of progression or to predict a patient’s response to different treatments. This approach is revolutionizing oncology treatments, but even it can identify only a fraction of possible patient subpopulations.

In contrast, AI and omics-rich RWD can examine thousands of genetic and/or phenotypic attributes to pinpoint the combinations most likely to influence prognostic or predictive scores, explain response heterogeneity, and improve a trial’s technical probability of success. Even in situations where the patient cohorts are modestly sized, foundational models trained on broad RWD can be used as a starting point for more bespoke modeling at the indication level.

One large biopharmaceutical company used RWD to remove likely nonresponders from a trial and to reduce its expected duration by between 5 and 10 percent without compromising its probability of success, which would allow the treatment to reach patients faster. Another company leveraged hospital episode RWD to identify super-responders to its comparator arm for an asset in Phase I, ensuring the trial was designed in a manner that focused on patients without access to effective treatments.

AI for subgroup discovery can also directly support the strategic trade-off between the size of the eligible population and the level of the treatment effect in a smaller population, in the form of an efficient frontier (Exhibit 2). The subpopulations marked in blue on the exhibit are those for which it is impossible to increase their breadth without reducing the drug’s expected effect. This approach widens the overlap between patients with access to the new treatment and patients who are most likely to benefit significantly from it.
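The efficient frontier described above is a Pareto frontier over (eligible population size, expected treatment effect) pairs. A minimal sketch with made-up subgroup numbers:

```python
def efficient_frontier(subgroups):
    """Keep subgroups where the eligible population cannot be enlarged
    without lowering the expected treatment effect. Each subgroup is a
    (population_size, expected_effect) pair."""
    frontier = []
    best_effect = float("-inf")
    # Scan from largest population downward; keep strict improvements in effect.
    for size, effect in sorted(subgroups, reverse=True):
        if effect > best_effect:
            frontier.append((size, effect))
            best_effect = effect
    return frontier[::-1]  # smallest population (largest effect) first

# Illustrative numbers only: (400, 0.20) is dominated by (600, 0.25).
points = [(1000, 0.10), (600, 0.25), (400, 0.20), (200, 0.40)]
frontier = efficient_frontier(points)
```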

Subgroup discovery and comparative efficacy for portfolio optimization

Not only can AI be used to discover subgroups to inform the design of a trial, it can also contribute to the wider portfolio strategy. The product pipelines and portfolios of many biopharmaceutical companies contain multiple assets aimed at treating a similar set of patients, often because companies lack sufficient evidence to inform the portfolio strategy differently, especially for new MoAs. However, the systematic analysis of all available data sources can predict the response of different patient subgroups (or disease endotypes), which could help companies hone their portfolio strategy—in some cases, resulting in a double-digit percentage improvement in net present value. This strategic stance can support targeted trials that produce the type of precision evidence clinicians need to make the best possible treatment decisions for their patients amid a proliferation of novel assets.

In addition, several analytical approaches and data sources can be combined to determine the efficacy of novel MoAs in the portfolio relative to each other or to approved treatments of different patient subgroups. Enriching RWD with data from molecular knowledge graphs can enable foundational-model-powered representations of treatments that represent their associated biological pathways. From this, the company can estimate the efficacy of a novel treatment by observing the outcomes of approved treatments that share similar biological action mechanisms with the novel treatment. This exercise can be repeated across multiple disease endotypes or patient subgroups, identified through a combination of patient-level attributes (Exhibit 3).

End point optimization in clinical development

In many trials, there is no single, established end point. In others, it can be hard to know which of several would best measure the intended action of the drug. Some end points may take a long time to manifest, leading to extended development timelines. Others may detect progression more reliably for some types of patients than for others, or may be particularly invasive, increasing the trial’s burden on patients.

AI applied to rich, patient-level longitudinal data sets can identify patient attributes that closely track the primary end point over time. Such an application can control for multiple confounding factors to ensure the association persists in future trials where the patient population may differ, thereby establishing potential novel end points.
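One simple version of this screening step is to check that a candidate attribute’s association with the primary end point survives adjustment for a confounder. The sketch below uses partial correlation via residualization; all patient values are illustrative:

```python
# Sketch: screen a candidate attribute as a surrogate for the primary end point,
# checking that the association survives adjustment for a confounder.

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def residualize(y, z):
    """Residuals of y after removing a linear effect of z (simple OLS)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    beta = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    return [b - (my + beta * (a - mz)) for a, b in zip(z, y)]

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of confounder z removed."""
    return pearson(residualize(x, z), residualize(y, z))

# Hypothetical patient-level values
biomarker = [2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.5, 4.4]  # candidate end point
outcome   = [1.9, 3.1, 2.0, 4.2, 2.7, 3.5, 1.4, 4.6]  # primary end point
age       = [55,  61,  48,  70,  59,  66,  45,  72]   # confounder

print(round(pearson(biomarker, outcome), 2))           # raw association
print(round(partial_corr(biomarker, outcome, age), 2)) # adjusted association
```

In practice this screen would run over many candidate attributes and confounders, with the surviving candidates validated biologically before being proposed as novel end points.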

Here, X-rays, CT and MRI scans, and other imaging techniques are often valuable sources of data for monitoring biomarkers in a noninvasive manner (as is the case with using MRIs to help detect Alzheimer’s disease; see Ruth Stephen et al., “Change in CAIDE dementia risk score and neuroimaging biomarkers during a 2-year multidomain lifestyle randomized controlled trial: Results of a post-hoc subgroup analysis,” Journals of Gerontology: Series A, 2021, Volume 76, Issue 8). Foundational models using medical imaging are increasingly able to extract imaging biomarkers curated by human expertise from raw images at scale. They can even discover altogether novel, deeply hidden visual signatures of disease activity and severity with better predictive characteristics.

In all cases, the association between novel end points and desired patient outcomes such as survival or quality of life must persist even under treatment. The clinical data and the information present in RWD must be combined with a biological understanding of the mechanisms affected by the considered treatment.

Tackling the organizational challenges

Inevitably, supplementing a century-long, RCT-dependent approach to clinical development with a new, analytics-driven one using novel data poses organizational challenges. Here are some ways to tackle them.

Embed AI into clinical development

The governance processes and incentive structures in place in most biopharmaceutical companies encourage default solutions in clinical development at the asset team level and beyond. One consequence is the prevalence of all-comer trials, which target the broadest possible population without leveraging evidence about nonresponders and super-responders—a method that can increase trials’ duration and decrease their probability of success, resulting in longer waits for patients who would benefit from the treatment.

R&D leaders can help change that culture by encouraging the use of analytical methods to integrate all sources of available evidence when making key decisions. Individual incentives can help. A stronger step is to strengthen the development governance model to require that all key decisions be supported by AI.

Develop internal capabilities and talent

External providers can supply off-the-shelf AI solutions to support some use cases. But with so many new solutions emerging, no provider can yet support them all. Even if they could, companies would be unwise to become dependent upon them for critical strategic decisions. Providers with such rich intellectual property might eventually become competitors.

Biopharmaceutical companies could therefore consider building their own AI capabilities and products. However, significant investment is needed to acquire the relevant data sets; build up the necessary data science, data engineering, and business translator expertise; and establish agile product development processes that ensure use cases deliver business value in a timely and cost-efficient manner.

Companies will also have to work hard to attract and retain top data scientists and data engineers, as the biopharmaceutical industry is not typically their first choice. Measures that can help include offering competitive remuneration packages, building an innovative employer brand, and locating in key talent hubs. So, too, can a focus on the industry’s value proposition—offering employees the chance to help people lead longer and healthier lives.

Develop a targeted data strategy

With so much data becoming available, companies should consider formulating a detailed and iterative data strategy for each disease area. The purchase of certain commercial data sets, such as electronic health records, claims, and sometimes omics, can be relatively straightforward, even in therapeutic areas such as oncology, where biomarker-rich data are required. But other types of data sets—registries, biobanks, and clinical trials from other pharma companies—can involve complex negotiations and contracts, and companies will have to partner with a range of different data providers.

In addition, each data set is likely to be structured differently, with varying levels of quality and completeness. This means companies will likely need data curation and analytical capabilities for tasks such as combining different data sets to plug gaps in evidence and meeting regulatory expectations for the use of real-world data.

Build scalable products

AI’s power lies in its application across the clinical development portfolio, not in isolated use cases. Achieving this kind of scale can be hard because of limitations in the code base, data platforms, the company’s data engineering capabilities, and the analytical stack needed to collect, combine, and analyze data. As a result, pilots often remain just that—never reused or deployed against other assets in development.

To counter this, analytical use cases should be built as products: solutions designed to be easy to reuse and scale across use cases, even at the proof-of-concept phase. The code base should be well annotated, for example, and use common, reusable analytical components, whether developed internally, open source, or bought off the shelf. Shared ML development standards and coding frameworks can speed up development across use cases because data scientists and engineers are familiar with them. Companies also need a mature and flexible ML operations stack—that is, the operational components needed to streamline the process of taking ML models to production and maintaining and monitoring them.
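A minimal sketch of the reuse pattern described above: analytical steps are registered once as named components and composed per use case, so a pipeline built for one asset can be redeployed against another. The component names and steps here are illustrative:

```python
# Reusable-component pattern: register each analytical step once, then compose
# pipelines per use case from the shared registry.

COMPONENTS = {}

def component(name):
    """Decorator that registers a function in the shared component registry."""
    def register(fn):
        COMPONENTS[name] = fn
        return fn
    return register

@component("impute_mean")
def impute_mean(values):
    """Replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

@component("normalize")
def normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def run_pipeline(steps, data):
    """Apply registered components in order; any asset's pipeline can reuse them."""
    for step in steps:
        data = COMPONENTS[step](data)
    return data

# The same registered components serve different assets' pipelines:
print(run_pipeline(["impute_mean", "normalize"], [1.0, None, 3.0, 5.0]))
# → [0.0, 0.5, 0.5, 1.0]
```

In a production setting the registry would live in a shared library with versioning and tests, which is what lets a proof of concept scale into a portfolio-wide product.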

By harnessing novel data and the power of AI, biopharmaceutical companies can move beyond RCTs to vastly improve clinical development. The benefits accrue at each stage of the development process, from formulating asset strategy to designing the protocol and trial planning. For patients, the results can speed up access to new treatments, thanks to an accelerated development timeline, and make treatments more likely to elicit the strongest response.

High-impact solutions are already being implemented, and more will follow. Companies can benefit from beginning to build the capabilities they need and scaling them across the portfolio and their assets’ life cycles. The clinical development and trial design system is ripe for change, and the value to patients will be vast.

Chris Anagnostopoulos is an associate partner in McKinsey’s Athens office, David Champagne and Alex Devereson are partners in the London office, Thomas Devenyns is an associate partner in the Geneva office, and Heikki Tarkkila is an associate partner in the Helsinki office.


AI in Clinical Medicine

Over the last two decades, the digitization of medical records has created opportunities for automation and data-driven clinical support for a range of routine clinical applications.

Associated Schools

Harvard Medical School

What you'll learn

Define the unique challenges and opportunities for integrating AI in specialized health care fields. 

Discuss the ethical considerations and potential biases in AI algorithms, especially in decision-making processes related to patient care, diagnosis, and treatment planning.   

Review the current state of AI regulation and how it can impact health care.

Assess the long-term quality and accuracy of AI technologies and their impact on patient care.   

Develop methods for integrating AI into medical education, including content generation, evaluation, and ensuring alignment with educational objectives.

Course description

Today, artificial intelligence (AI) is accelerating innovation in clinical medicine. We are on the cusp of revolutionizing how we care for patients in profound ways.  

New technologies are available now to help you impact your practice. AI medical scribes, new research tools and diagnostic tests, and personalized treatment options are just a few applications of AI that are beginning to have a direct impact on clinicians and the patients they serve.  

Up until now, most medical practitioners have not received formal training in artificial intelligence. Recognizing that now is the time for physicians and allied health care professionals to prepare for how AI is changing medical care, Harvard Medical School is offering this new continuing education course, AI in Clinical Medicine.  

This live virtual course focuses on cutting edge and exciting new applications of AI, including foundational principles and lessons learned that you will be able to take directly back to your practice. Over two days, you will hear from medical society leaders, academic leaders, and innovators from academia and industry.  

Sessions will delve into the applications of AI in diagnosing diseases, predicting patient outcomes, and personalizing treatment plans. Through lectures, field-specific break-out sessions, and real-world case studies, you will acquire the knowledge to harness AI’s potential to improve patient care and medical research. Faculty experts will explore the ethical implications, challenges, and opportunities inherent in integrating AI into medical practice. 

During this course we will cut through the hype around AI to provide realistic, firsthand viewpoints into the potential of AI in clinical practice. Physicians and medical practitioners of all kinds are highly encouraged to join us for this transformational learning experience. 


Artificial intelligence  is being used in healthcare for everything from answering patient questions to assisting with surgeries and developing new pharmaceuticals.

According to Statista, the artificial intelligence (AI) healthcare market, which was valued at $11 billion in 2021, is projected to be worth $187 billion by 2030. That massive increase means we will likely continue to see considerable changes in how medical providers, hospitals, pharmaceutical and biotechnology companies, and others in the healthcare industry operate.

Better  machine learning (ML)  algorithms, more access to data, cheaper hardware, and the availability of 5G have contributed to the increasing application of AI in the healthcare industry, accelerating the pace of change. AI and ML technologies can sift through enormous volumes of health data—from health records and clinical studies to genetic information—and analyze it much faster than humans.

Healthcare organizations are using AI to improve the efficiency of all kinds of processes, from back-office tasks to patient care. The following are some examples of how AI might be used to benefit staff and patients:

  • Administrative workflow:  Healthcare workers spend a lot of time doing paperwork and other administrative tasks. AI and automation can help perform many of those mundane tasks, freeing up employee time for other activities and giving them more face-to-face time with patients. For example, generative AI can help clinicians with note-taking and content summarization that can help keep medical records as thoroughly as possible. AI might also help with accurate coding and sharing of information between departments and billing.
  • Virtual nursing assistants:  One study found that 64% of patients are comfortable with using AI for around-the-clock access to the kinds of answers that support nurses provide. AI virtual nurse assistants—which are AI-powered chatbots, apps, or other interfaces—can be used to help answer questions about medications, forward reports to doctors or surgeons, and help patients schedule a visit with a physician. These routine tasks can help take work off the hands of clinical staff, who can then spend more time directly on patient care, where human judgment and interaction matter most.
  • Dosage error reduction:  AI can be used to help identify errors in how a patient self-administers medication. One example comes from a study in Nature Medicine, which found that up to 70% of patients don’t take insulin as prescribed. An AI-powered tool that sits in the patient’s background (much like a Wi-Fi router) might be used to flag errors in how the patient administers an insulin pen or inhaler.
  • Less invasive surgeries:  AI-enabled robots might be used to work around sensitive organs and tissues to help reduce blood loss, infection risk and post-surgery pain.
  • Fraud prevention:  Fraud in the healthcare industry is enormous, at $380 billion/year, and raises the cost of consumers’ medical premiums and out-of-pocket expenses. Implementing AI can help recognize unusual or suspicious patterns in insurance claims, such as billing for costly services or procedures that are not performed, unbundling (which is billing for the individual steps of a procedure as though they were separate procedures), and performing unnecessary tests to take advantage of insurance payments.

A recent study found that  83% of patients  report poor communication as the worst part of their experience, demonstrating a strong need for clearer communication between patients and providers. AI technologies like  natural language processing  (NLP), predictive analytics, and  speech recognition  might help healthcare providers have more effective communication with patients. AI might, for instance, deliver more specific information about a patient’s treatment options, allowing the healthcare provider to have more meaningful conversations with the patient for shared decision-making.

According to  Harvard’s School of Public Health , although it’s early days for this use, using AI to make diagnoses may reduce treatment costs by up to 50% and improve health outcomes by 40%.

One use case example is out of the  University of Hawaii , where a research team found that deploying  deep learning  AI technology can improve breast cancer risk prediction. More research is needed, but the lead researcher pointed out that an AI algorithm can be trained on a much larger set of images than a radiologist—as many as a million or more radiology images. Also, that algorithm can be replicated at no cost except for hardware.

An  MIT group  developed an ML algorithm to determine when a human expert is needed. In some instances, such as identifying cardiomegaly in chest X-rays, they found that a hybrid human-AI model produced the best results.

Another  published study  found that AI recognized skin cancer better than experienced doctors.  US, German and French researchers used deep learning on more than 100,000 images to identify skin cancer. Comparing the results of AI to those of 58 international dermatologists, they found AI did better.

As health and fitness monitors become more popular and more people use apps that track and analyze details about their health, they can share these real-time data sets with their doctors to monitor health issues and provide alerts in case of problems.

AI solutions—such as big data applications, machine learning algorithms and deep learning algorithms—might also be used to help humans analyze large data sets to aid clinical and other decision-making. AI might also be used to help detect and track infectious diseases, such as COVID-19, tuberculosis, and malaria.

One benefit the use of AI brings to health systems is making gathering and sharing information easier. AI can help providers keep track of patient data more efficiently.

One example is diabetes. According to the Centers for Disease Control and Prevention, 10% of the US population has diabetes. Patients can now use wearable and other monitoring devices that provide feedback about their glucose levels to themselves and their medical team. AI can help providers gather, store, and analyze that information and provide data-driven insights from vast numbers of people. Using this information can help healthcare professionals determine how to better treat and manage diseases.

Organizations are also starting to use AI to help improve drug safety. The company SELTA SQUARE, for example, is  innovating the pharmacovigilance (PV) process , a legally mandated discipline for detecting and reporting adverse effects from drugs, then assessing, understanding, and preventing those effects. PV demands significant effort and diligence from pharma producers because it’s performed from the clinical trials phase all the way through the drug’s lifetime availability. Selta Square uses a combination of AI and automation to make the PV process faster and more accurate, which helps make medicines safer for people worldwide.

Sometimes, AI might reduce the need to test potential drug compounds physically, which is an enormous cost-savings.  High-fidelity molecular simulations  can run on computers without incurring the high costs of traditional discovery methods.

AI also has the potential to help humans predict toxicity, bioactivity, and other characteristics of molecules or create previously unknown drug molecules from scratch.

As AI becomes more important in healthcare delivery and more AI medical applications are developed, ethical and regulatory governance must be established. Issues that raise concern include the possibility of bias, lack of transparency, privacy concerns regarding data used for training AI models, and safety and liability issues.

“AI governance is necessary, especially for clinical applications of the technology,” said Laura Craft, VP Analyst at  Gartner . “However, because new AI techniques are largely new territory for most [health delivery organizations], there is a lack of common rules, processes, and guidelines for eager entrepreneurs to follow as they design their pilots.”

The World Health Organization (WHO) spent 18 months deliberating with leading experts in ethics, digital technology, law, and human rights, as well as members of various Ministries of Health, to produce a report called Ethics & Governance of Artificial Intelligence for Health. This report identifies ethical challenges to using AI in healthcare, identifies risks, and outlines six consensus principles to ensure AI works for the public’s benefit:

  • Protecting autonomy
  • Promoting human safety and well-being
  • Ensuring transparency
  • Fostering accountability
  • Ensuring equity
  • Promoting tools that are responsive and sustainable

The WHO report also provides recommendations to ensure that the governance of AI in healthcare both maximizes the technology’s promise and holds healthcare workers accountable and responsive to the communities and people they work with.

AI provides opportunities to help reduce human error, assist medical professionals and staff, and provide patient services 24/7. As AI tools continue to develop, there is potential to use AI even more in reading medical images, X-rays and scans, diagnosing medical problems and creating treatment plans.

AI applications continue to help streamline various tasks, from answering phones to analyzing population health trends (and likely, applications yet to be considered). For instance, future AI tools may automate or augment more of the work of clinicians and staff members. That will free up humans to spend more time on more effective and compassionate face-to-face professional care.

When patients need help, they don’t want to (or can’t) wait on hold. Healthcare facilities’ resources are finite, so help isn’t always available instantaneously or 24/7—and even slight delays can create frustration and feelings of isolation or cause certain conditions to worsen.

IBM® watsonx Assistant™ AI healthcare chatbots  can help providers do two things: keep their time focused where it needs to be and empower patients who call in to get quick answers to simple questions.

IBM watsonx Assistant  is built on deep learning, machine learning and natural language processing (NLP) models to understand questions, search for the best answers and complete transactions by using conversational AI.


Neurology and Neurosurgery

  • Artificial intelligence: Enhanced expertise drives innovation

May 23, 2024

Mayo Clinic clinician-researchers are at the forefront of developing artificial intelligence (AI) applications for neurosurgery.

"Artificial intelligence will provide a good portion of future innovations in healthcare. That's the case not just in knowledge-based fields but certainly in technical and interventional fields like neurosurgery," says Richard W. Byrne, M.D. , a neurosurgeon who recently joined Mayo Clinic in Jacksonville, Florida.

One particular focus is brain-computer interfaces. Dr. Byrne is co-principal investigator of a study involving intracortical visual prostheses to treat blindness. The study, funded partly by the National Institutes of Health, is a collaboration with the Illinois Institute of Technology.

Two years ago, Dr. Byrne performed the first implantation of wireless electrodes to provide artificial vision. "We implanted 400 completely wireless electrodes in the right occipital cortex of a man with total blindness. The artificial vision helps him to navigate a room and pick things up," Dr. Byrne says. "There are countless potential applications."

Dr. Byrne's clinical practice focuses on glioma and skull base tumors. He also has deep experience with cortical mapping in awake brain surgery. He notes that awake craniotomies were first performed a century ago.

"This is not new. It's just better and safer now," he says. "Patients tolerate it remarkably well. In fact, many patients seek it because they then participate in the procedure, monitoring their own function. Patients love that sense of control."

Surgeons are also empowered. "Some tumors may be obvious on MRI but completely unobvious during surgery. The borders are quite fuzzy," Dr. Byrne says. "With awake craniotomy, you have instant feedback. You know that what you're doing is safe."

Mayo Clinic's clinical care and research efforts are bolstered by physicians' close ties to professional organizations. Dr. Byrne recently became president-elect of the Society of Neurological Surgeons. He is a director of the American Board of Neurological Surgery and a member of the board of directors of the American Association of Neurological Surgeons, as well as a past president of the Neurosurgical Society of America.

That deep experience dovetails with AI's strengths. "AI is really good at putting into words what an experienced surgeon might have an intuition about," Dr. Byrne says. "After 25 years of clinical practice, you have an intuition of what you should and should not do. AI is pretty good at figuring out why you feel that way."

For more information

Illinois Institute of Technology.


Published on 22.5.2024 in Vol 26 (2024)

AI Quality Standards in Health Care: Rapid Umbrella Review

Authors of this article:

  • Craig E Kuziemsky 1, BSc, BCom, PhD;
  • Dillon Chrimes 2, BSc, MSc, PhD;
  • Simon Minshall 2, BSc, MSc;
  • Michael Mannerow 1, BSc;
  • Francis Lau 2, BSc, MSc, MBA, PhD

1 MacEwan University, Edmonton, AB, Canada

2 School of Health Information Science, University of Victoria, Victoria, BC, Canada

Corresponding Author:

Craig E Kuziemsky, BSc, BCom, PhD

MacEwan University

10700 104 Avenue

Edmonton, AB, T5J4S2

Phone: 1 7806333290

Email: [email protected]

Background: In recent years, there has been an upwelling of artificial intelligence (AI) studies in the health care literature. During this period, there has been an increasing number of proposed standards to evaluate the quality of health care AI studies.

Objective: This rapid umbrella review examines the use of AI quality standards in a sample of health care AI systematic review articles published over a 36-month period.

Methods: We used a modified version of the Joanna Briggs Institute umbrella review method. Our rapid approach was informed by the practical guide by Tricco and colleagues for conducting rapid reviews. Our search was focused on the MEDLINE database supplemented with Google Scholar. The inclusion criteria were English-language systematic reviews regardless of review type, with mention of AI and health in the abstract, published during a 36-month period. For the synthesis, we summarized the AI quality standards used and issues noted in these reviews drawing on a set of published health care AI standards, harmonized the terms used, and offered guidance to improve the quality of future health care AI studies.

Results: We selected 33 review articles published between 2020 and 2022 in our synthesis. The reviews covered a wide range of objectives, topics, settings, designs, and results. Over 60 AI approaches across different domains were identified with varying levels of detail spanning different AI life cycle stages, making comparisons difficult. Health care AI quality standards were applied in only 39% (13/33) of the reviews and in 14% (25/178) of the original studies from the reviews examined, mostly to appraise their methodological or reporting quality. Only a handful mentioned the transparency, explainability, trustworthiness, ethics, and privacy aspects. A total of 23 AI quality standard–related issues were identified in the reviews. There was a recognized need to standardize the planning, conduct, and reporting of health care AI studies and address their broader societal, ethical, and regulatory implications.

Conclusions: Despite the growing number of AI standards to assess the quality of health care AI studies, they are seldom applied in practice. With increasing desire to adopt AI in different health topics, domains, and settings, practitioners and researchers must stay abreast of and adapt to the evolving landscape of health care AI quality standards and apply these standards to improve the quality of their AI studies.

Introduction

Growth of Health Care Artificial Intelligence

In recent years, there has been an upwelling of artificial intelligence (AI)–based studies in the health care literature. While there have been reported benefits, such as improved prediction accuracy and monitoring of diseases [ 1 ], health care organizations face potential patient safety, ethical, legal, social, and other risks from the adoption of AI approaches [ 2 , 3 ]. A search of the MEDLINE database for the terms “artificial intelligence” and “health” in the abstracts of articles published in 2022 alone returned >1000 results. Even by narrowing it down to systematic review articles, the same search returned dozens of results. These articles cover a wide range of AI approaches applied in different health care contexts, including such topics as the application of machine learning (ML) in skin cancer [ 4 ], use of natural language processing (NLP) to identify atrial fibrillation in electronic health records [ 5 ], image-based AI in inflammatory bowel disease [ 6 ], and predictive modeling of pressure injury in hospitalized patients [ 7 ]. The AI studies reported are also at different AI life cycle stages, from model development, validation, and deployment to evaluation [ 8 ]. Each of these AI life cycle stages can involve different contexts, questions, designs, measures, and outcomes [ 9 ]. With the number of health care AI studies rapidly on the rise, there is a need to evaluate the quality of these studies in different contexts. However, the means to examine the quality of health care AI studies have grown more complex, especially when considering their broader societal and ethical implications [ 10 - 13 ].

Coiera et al [ 14 ] described a “replication crisis” in health and biomedical informatics where issues regarding experimental design and reporting of results impede our ability to replicate existing research. Poor replication raises concerns about the quality of published studies as well as the ability to understand how context could impact replication across settings. The replication issue is prevalent in health care AI studies as many are single-setting approaches and we do not know the extent to which they can be translated to other settings or contexts. One solution to address the replication issue in AI studies has been the development of a growing number of AI quality standards. Most prominent are the reporting guidelines from the Enhancing the Quality and Transparency of Health Research (EQUATOR) network [ 15 ]. Examples include the CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) extension for reporting AI clinical trials [ 16 ] and the SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence) extension for reporting AI clinical trial protocols [ 17 ]. Beyond the EQUATOR guidelines, there are also the Minimum Information for Medical AI Reporting standard [ 18 ] and the Minimum Information About Clinical Artificial Intelligence Modeling checklist [ 19 ] on the minimum information needed in published AI studies. These standards mainly focus on the methodological and reporting quality aspects of AI studies to ensure that the published information is rigorous, complete, and transparent.

Need for Health Care AI Standards

However, there is a shortage of standard-driven guidance that spans the entire AI life cycle of design, validation, implementation, and governance. The World Health Organization has published six ethical principles to guide the use of AI [ 20 ] that cover (1) protecting human autonomy; (2) promoting human well-being and safety and the public interest; (3) ensuring transparency, explainability, and intelligibility; (4) fostering responsibility and accountability; (5) ensuring inclusiveness and equity; and (6) promoting AI that is responsive and sustainable. In a scoping review, Solanki et al [ 21 ] operationalized health care AI ethics through a framework of 6 guidelines that spans the entire AI life cycle of data management, model development, deployment, and monitoring. The National Health Service England has published a best practice guide on getting health care AI right that encompasses a governance framework, addressing data access and protection issues, spreading good innovation, and monitoring uses over time [ 22 ]. To further promote the quality of health care AI, van de Sande et al [ 23 ] have proposed a step-by-step approach with specific AI quality criteria that span the entire AI life cycle from development and implementation to governance.

Despite the aforementioned principles, frameworks, and guidance, there is still widespread variation in the quality of published AI studies in the health care literature. For example, 2 systematic reviews of 152 prediction and 28 diagnosis studies found poor methodological and reporting quality that made it difficult to replicate, assess, and interpret the study findings [ 24 , 25 ]. The recent shifts beyond study quality to broader ethical, equity, and regulatory issues have also raised additional challenges for AI practitioners and researchers regarding the impact, transparency, trustworthiness, and accountability of the AI studies involved [ 13 , 26 - 28 ]. Increasingly, we are also seeing reports of various types of AI implementation issues [ 2 ]. There is a growing gap between the expected and actual quality and performance of health care AI that needs to be addressed. We suggest that the overall issue is a lack of awareness and use of these principles, frameworks, and guidance in health care AI studies.

This rapid umbrella review addressed the aforementioned issues by focusing on the principles and frameworks for health care AI design, implementation, and governance. We analyzed and synthesized the use of AI quality standards as reported in a sample of published health care AI systematic review articles. In this paper, AI quality standards are defined as guidelines, criteria, checklists, statements, guiding principles, or framework components used to evaluate the quality of health care AI studies in different domains and life cycle stages. In this context, quality covers the trustworthiness, methodological, reporting, and technical aspects of health care AI studies. Domains refer to the disciplines, branches, or areas in which AI can be found or applied, such as computer science, medicine, and robotics. The findings from this review can help address the growing need for AI practitioners and researchers to navigate the increasingly complex landscape of AI quality standards to plan, conduct, evaluate, and report health care AI studies.

With the increasing volume of systematic review articles that appear in the health care literature each year, an umbrella review has become a popular and timely approach to synthesize knowledge from published systematic reviews on a given topic. For this paper, we drew on the umbrella review method in the typology of systematic reviews for synthesizing evidence in health care by MacEntee [ 29 ]. In this typology, umbrella reviews are used to synthesize multiple systematic reviews from different sources into a summarized form to address a specific topic. We used a modified version of the Joanna Briggs Institute (JBI) umbrella review method to tailor the process, including developing an umbrella review protocol, applying a rapid approach, and eliminating duplicate original studies [ 30 ]. Our rapid approach was informed by the practical guide to conducting rapid reviews by Tricco et al [ 31 ] in the areas of database selection, topic refinement, searching, study selection, data extraction, and synthesis. A PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of our review process is shown in Figure 1 [ 32 ]. A PRISMA checklist is provided in Multimedia Appendix 1 [ 32 ].

Figure 1. PRISMA flow diagram of the review process.

Objective and Questions

The objective of this rapid umbrella review was to examine the use of AI quality standards based on a sample of published health care AI systematic reviews. Specifically, our questions were as follows:

  • What AI quality standards have been applied to evaluate the quality of health care AI studies?
  • What key quality standard–related issues are noted in these reviews?
  • What guidance can be offered to improve the quality of health care AI studies through the incorporation of AI quality standards?

Search Strategy

Our search strategy focused on the MEDLINE database supplemented with Google Scholar. Our search terms consisted of “artificial intelligence” or “AI,” “health,” and “systematic review” mentioned in the abstract (refer to Multimedia Appendix 2 for the search strings used). We used the .TW search field tag as it searches the title and abstract as well as fields such as Medical Subject Heading terms and subheadings. Our rationale for limiting the search to MEDLINE with simple terms was to keep the process manageable, recognizing the huge volume of health care AI–related literature reviews that have appeared in the last few years, especially on COVID-19. One author conducted the MEDLINE and Google Scholar searches with assistance from an academic librarian. For Google Scholar, we restricted the search to the first 100 citations returned.

Inclusion Criteria

We considered all English-language systematic review articles published over a 36-month period from January 1, 2020, to December 31, 2022. The review could be of any type defined in the review typology by MacEntee [ 29 ]: systematic review, meta-analysis, narrative review, qualitative review, scoping review, meta-synthesis, realist review, or umbrella review. The overarching inclusion criteria were AI and health as the focus. To be considered for inclusion, the review articles had to meet the following criteria:

  • Each original study in the review is described, where an AI approach in the form of a model, method, algorithm, technique, or intervention is proposed, designed, implemented, or evaluated within a health care context to address a particular health care problem or topic area.
  • We define AI as the simulation of human intelligence in machines that comprises learning, reasoning, and logic [ 33 ]. In that simulation, AI has different levels of adaptivity and autonomy. Weak AI requires supervised or reinforcement learning with human intervention to adapt to the environment, with low autonomous interaction. Strong AI is highly adaptive and highly autonomous via unsupervised learning, with no human intervention.
  • We looked through all the articles, and our health care context categorization was informed by the stated settings (eg, hospital) and purpose (eg, diagnosis) mentioned in the included reviews.
  • The review can include all types of AI approaches, such as ML, NLP, speech recognition, prediction models, neural networks, intelligent robotics, and AI-assisted and automated medical devices.
  • The review must contain sufficient detail on the original AI studies, covering their objectives, contexts, study designs, AI approaches, measures, outcomes, and reference sources.

Exclusion Criteria

We excluded articles if any one of the following applied:

  • Review articles published before January 1, 2020; not accessible in web-based format; or containing only an abstract
  • Review articles in languages other than English
  • Earlier versions of the review article with the same title or topic by the same authors
  • Context not health care–related, such as electronic commerce or smart manufacturing
  • The AI studies not containing sufficient detail on their purpose, features, or reference sources
  • Studies including multiple forms of digital health technologies besides AI, such as telehealth, personal health records, or communication tools

Review Article Selection

One author conducted the literature searches and retrieved the citations after eliminating duplicates. The author then screened the citation titles and abstracts against the inclusion and exclusion criteria. Those that met the inclusion criteria were retrieved for full-text review independently by 2 other authors. Any disagreements in final article selection were resolved through consensus between the 2 authors or with a third author. The excluded articles and the reasons for their exclusion were logged.

Quality Appraisal

In total, 2 authors applied the JBI critical appraisal checklist independently to appraise the quality of the selected reviews [ 30 ]. The checklist has 11 questions that allow for yes , no , unclear , or not applicable as the response. The questions cover the areas of review question, inclusion criteria, search strategy and sources, appraisal criteria used, use of multiple reviewers, methods of minimizing data extraction errors and combining studies, publication bias, and recommendations supported by data. The reviews were ranked as high, medium, and low quality based on their JBI critical appraisal score (≥0.75 was high quality, ≥0.5 and <0.75 was medium quality, and <0.5 was low quality). All low-quality reviews were excluded from the final synthesis.
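As a concrete illustration, the score-and-rank rule described above can be sketched in Python. The function name and the response encoding are our own assumptions for illustration; the score is computed here as the share of "yes" answers among applicable checklist items, with the thresholds stated in the text.

```python
def jbi_quality_rank(responses):
    """Rank a review from its 11 JBI checklist responses.

    Each response is "yes", "no", "unclear", or "not applicable".
    The score is the share of "yes" answers among applicable items;
    >=0.75 is high, >=0.5 is medium, and <0.5 is low quality.
    """
    applicable = [r for r in responses if r != "not applicable"]
    if not applicable:
        return None, "unrated"
    score = sum(r == "yes" for r in applicable) / len(applicable)
    if score >= 0.75:
        rank = "high"
    elif score >= 0.5:
        rank = "medium"
    else:
        rank = "low"  # low-quality reviews are excluded from the synthesis
    return score, rank

# Example: 8 "yes" answers out of 10 applicable items scores 0.8 (high).
score, rank = jbi_quality_rank(["yes"] * 8 + ["no", "unclear", "not applicable"])
```

Independent scoring by 2 authors would then be compared, with disagreements resolved as described for article selection.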

Data Extraction

One author extracted data from selected review articles using a predefined template. A second author validated all the articles for correctness and completeness. As this review was focused on AI quality standards, we extracted data that were relevant to this topic. We created a spreadsheet template with the following data fields to guide data extraction:

  • Author, year, and reference: first author last name, publication year, and reference number
  • URL: the URL where the review article can be found
  • Objective or topic: objective or topic being addressed by the review article
  • Type: type of review reported (eg, systematic review, meta-analysis, or scoping review)
  • Sources: bibliographic databases used to find the primary studies reported in the review article
  • Years: period of the primary studies covered by the review article
  • Studies: total number of primary studies included in the review article
  • Countries: countries where the studies were conducted
  • Settings: study settings reported in the primary studies of the review article
  • Participants: number and types of individuals being studied as reported in the review article
  • AI approaches: the type of AI model, method, algorithm, technique, tool, or intervention described in the review article
  • Life cycle and design: the stage or design of the AI study in the AI life cycle in the primary studies being reported, such as requirements, design, implementation, monitoring, experimental, observational, training-test-validation, or controlled trial
  • Appraisal: quality assessment of the primary studies using predefined criteria (eg, risk of bias)
  • Rating: quality assessment results of the primary studies reported in the review article
  • Measures: performance criteria reported in the review article (eg, mortality, accuracy, and resource use)
  • Analysis: methods used to summarize the primary study results (eg, narrative or quantitative)
  • Results: aggregate findings from the primary studies in the review article
  • Standards: name of the quality standards mentioned in the review article
  • Comments: issues mentioned in the review article relevant to our synthesis
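The template above can be sketched as a simple record constructor that rejects unknown fields. The Python field names are illustrative assumptions paraphrasing the list, not the authors' actual spreadsheet headers.

```python
# Field names mirror the extraction template described above (assumed names).
EXTRACTION_FIELDS = [
    "author_year_reference", "url", "objective_or_topic", "type",
    "sources", "years", "studies", "countries", "settings",
    "participants", "ai_approaches", "lifecycle_and_design",
    "appraisal", "rating", "measures", "analysis", "results",
    "standards", "comments",
]

def new_extraction_record(**values):
    """Return a blank extraction record, filling in any provided fields."""
    unknown = set(values) - set(EXTRACTION_FIELDS)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    record = {field: None for field in EXTRACTION_FIELDS}
    record.update(values)
    return record

record = new_extraction_record(type="scoping review", studies=42)
```

A fixed field list of this kind makes the second author's validation for completeness a mechanical check for unfilled fields.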

Removing Duplicate AI Studies

We identified all unique AI studies across the selected reviews after eliminating duplicates that appeared in them. We retrieved full-text articles for every tenth of these unique studies and searched for mention of AI quality standard–related terms in them. This was to ensure that all relevant AI quality standards were accounted for even if the reviews did not mention them.
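A minimal sketch of this deduplicate-then-sample step, assuming each review is represented as a list of study identifiers (the identifiers, ordering, and function names are illustrative assumptions):

```python
def unique_studies(reviews):
    """Collect unique study IDs across reviews, keeping first-seen order."""
    seen, unique = set(), []
    for review in reviews:
        for study_id in review:
            if study_id not in seen:
                seen.add(study_id)
                unique.append(study_id)
    return unique

def every_tenth(studies):
    """Select every tenth unique study for full-text checking of standards."""
    return studies[9::10]

# "s2" appears in both reviews but is counted only once.
pool = unique_studies([["s1", "s2", "s3"], ["s2", "s4"]])
```

The full text of each sampled study would then be searched for AI quality standard–related terms, as described above.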

Analysis and Synthesis

Our analysis was based on a set of recent publications on health care AI standards. These include (1) the AI life cycle step-by-step approach by van de Sande et al [ 23 ] with a list of AI quality standards as benchmarks, (2) the reporting guidelines by Shelmerdine et al [ 15 ] with specific standards for different AI-based clinical studies, (3) the international standards for evaluating health care AI by Wenzel and Wiegand [ 26 ], and (4) the broader requirements for trustworthy health care AI across the entire life cycle stages by the National Academy of Medicine (NAM) [ 8 ] and the European Union Commission (EUC) [ 34 ]. As part of the synthesis, we created a conceptual organizing scheme drawing on published literature on AI domains and approaches to visualize their relationships (via a Euler diagram) [ 35 ]. All analyses and syntheses were conducted by one author and then validated by another to resolve differences.

For the analysis, we (1) extracted key characteristics of the selected reviews based on our predefined template; (2) summarized the AI approaches, life cycle stages, and quality standards mentioned in the reviews; (3) extracted any additional AI quality standards mentioned in the 10% sample of unique AI studies from the selected reviews; and (4) identified AI quality standard–related issues reported.

For the synthesis, we (1) mapped the AI approaches to our conceptual organizing scheme, visualized their relationships with the AI domains and health topics found, and described the challenges in harmonizing these terms; (2) established key themes from the AI quality standard issues identified and mapped them to the NAM and EUC frameworks [ 8 , 34 ]; and (3) created a summary list of the AI quality standards found and mapped them to the life cycle phases by van de Sande et al [ 23 ].

Drawing on these findings, we proposed a set of guidelines that can enhance the quality of future health care AI studies and described its practice, policy, and research implications. Finally, we identified the limitations of this rapid umbrella review as caveats for the readers to consider. As health care, AI, and standards are replete with industry terminologies, we used the acronyms where they are mentioned in the paper and compiled an alphabetical acronym list with their spelled-out form at the end of the paper.

Summary of Included Reviews

We found 69 health care AI systematic review articles published between 2020 and 2022, of which 35 (51%) met the inclusion criteria. The included articles covered different review types, topics, settings, numbers of studies, designs, participants, AI approaches, and performance measures (refer to Multimedia Appendix 3 [ 36 - 68 ] for the review characteristics). We excluded the remaining 49% (34/69) of the articles because they (1) covered multiple technologies (eg, telehealth), (2) had insufficient detail, (3) were not specific to health care, or (4) were not in English (refer to Multimedia Appendix 4 for the excluded reviews and reasons). The quality of these reviews ranged from JBI critical appraisal scores of 0.36 to 1.0, with 49% (17/35) rated as high quality, 40% (14/35) rated as moderate quality, and 6% (2/35) rated as low quality ( Multimedia Appendix 5 [ 36 - 68 ]). The 2 low-quality reviews were excluded for their low JBI scores [ 69 , 70 ], leaving a sample of 33 reviews for the final synthesis.

Regarding review types, most (23/33, 70%) were systematic reviews [ 37 - 40 , 45 - 51 , 53 - 57 , 59 - 64 , 66 , 67 ], with the remaining being scoping reviews [ 36 , 41 - 44 , 52 , 58 , 65 , 68 ]. Only 1 (3%) of the 33 reviews was a meta-analysis [ 38 ], and another was a rapid review [ 61 ]. Regarding health topics, the reviews spanned a wide range of specific health conditions, disciplines, areas, and practices. Examples of conditions were COVID-19 [ 36 , 37 , 49 , 51 , 56 , 62 , 66 ], mental health [ 48 , 65 , 68 ], infection [ 50 , 59 , 66 ], melanoma [ 57 ], and hypoglycemia [ 67 ]. Examples of disciplines were public health [ 36 , 37 , 56 , 66 ], nursing [ 42 , 43 , 61 ], rehabilitation [ 52 , 64 ], and dentistry [ 55 , 63 ]. Areas included mobile health and wearables [ 41 , 52 , 54 , 65 ], surveillance and remote monitoring [ 51 , 61 , 66 ], robotic surgeries [ 47 ], and biobanks [ 39 ]. Practices included diagnosis [ 37 , 47 , 49 , 58 , 59 , 62 ], prevention [ 47 ], prediction [ 36 , 38 , 49 , 50 , 57 ], disease management [ 41 , 46 , 47 , 58 ], and administration [ 42 ]. Regarding settings, less than half (12/33, 36%) were explicit in their health care settings, which included multiple sources [ 36 , 42 , 43 , 50 , 54 , 61 ], hospitals [ 45 , 49 ], communities [ 44 , 51 , 58 ], and social media groups [ 48 ]. The number of included studies ranged from 8 on hypoglycemia [ 67 ] to 794 on COVID-19 [ 49 ]. Regarding designs, most were performance assessment studies using secondary data sources such as intensive care unit [ 38 ], imaging [ 37 , 62 , 63 ], and biobank [ 39 ] databases. Participants included patients, health care providers, educators, students, simulated cases, and those who use social media. Less than one-quarter of the reviews (8/33, 24%) mentioned sample sizes, which ranged from 11 adults [ 44 ] to 1,547,677 electronic medical records [ 40 ] (refer to Multimedia Appendix 3 for details).

Regarding AI approaches, there were >60 types of AI models, methods, algorithms, tools, and techniques mentioned in varying levels of detail across the broad AI domains of computer science, data science with and without NLP, and robotics. The main AI approaches were ML and deep learning (DL), with support vector machine, convolutional neural network, neural network, logistic regression, and random forest being mentioned the most (refer to the next section for details). The performance measures covered a wide range of metrics, such as diagnostic and prognostic accuracies (eg, sensitivity, specificity, accuracy, and area under the curve) [ 37 - 40 , 46 - 48 , 53 , 57 , 59 , 63 , 67 ], resource use (eg, whether an intensive care unit stay was necessary, length of stay, and cost) [ 37 , 58 , 62 ], and clinical outcomes (eg, COVID-19 severity, mortality, and behavior change) [ 36 , 37 , 49 , 56 , 62 , 65 ]. A few reviews (6/33, 18%) focused on the extent of the socioethical guidelines addressed [ 44 , 51 , 55 , 58 , 66 , 68 ]. Regarding life cycle stages, different schemes were applied, including preprocessing and classification [ 48 , 57 ], data preparation-preprocessing [ 37 , 38 ], different stages of adoption (eg, knowledge, persuasion, decision making, implementation) [ 44 ], conceptual research [ 42 ], model development [ 36 , 37 , 40 , 42 , 45 , 46 , 50 - 56 , 58 - 64 , 66 , 67 ], design [ 43 ], training and testing [ 38 , 42 , 45 , 50 - 53 , 58 , 61 - 64 ], validation [ 36 - 38 , 40 , 45 , 46 , 50 , 51 , 53 , 55 , 56 , 58 - 64 , 67 ], pilot trials [ 65 ], public engagement [ 68 ], implementation [ 42 , 44 , 60 - 62 , 66 , 68 ], confirmation [ 44 ], and evaluation [ 42 , 43 , 53 , 60 - 62 , 65 ] (refer to Multimedia Appendix 3 for details). It is worth noting that the period covered for our review did not include any studies on large language models (LLMs). LLM studies became more prevalent in the literature in the period just after our review.

Use of Quality Standards in Health Care AI Studies

To make sense of the different AI approaches mentioned, we used a Euler diagram [ 71 ] as a conceptual organizing scheme to visualize their relationships with AI domains and health topics ( Figure 2 [ 36 , 41 - 43 , 47 , 48 , 51 - 54 , 56 - 58 , 60 , 62 , 65 , 67 ]). The Euler diagram shows that AI broadly comprised approaches in the domains of computer science, data science with and without NLP, and robotics that could be overlapping. The main AI approaches were ML and deep learning (DL), with DL being a more advanced form of ML through the use of artificial neural networks [ 33 ]. The diagram also shows that AI can exist without ML and DL (eg, decision trees and expert systems). There are also outliers in these domains with borderline AI-like approaches mostly intended to enhance human-computer interactions, such as social robotics [ 42 , 43 ], robotic-assisted surgery [ 47 ], and exoskeletons [ 54 ]. The health topics in our reviews spanned the AI domains, with most falling within data science with or without NLP. This was followed by computer science, mostly for communication or database and other functional support, and robotics for enhanced social interactions that may or may not be AI driven. Borderline examples included programmed social robots [ 42 , 43 ] and AI-enhanced social robots [ 54 ], which focused on social robotic programming without using ML or DL, as well as virtual reality [ 60 ] and wearable sensors [ 65 , 66 , 68 ].

Regarding AI life cycle stages, we harmonized the different terms used in the original studies by mapping them to the 5 life cycle phases by van de Sande et al [ 23 ]: 0 (preparation), I (model development), II (performance assessment), III (clinical testing), and IV (implementation). Most AI studies in the reviews mapped to the first 3 life cycle phases by van de Sande et al [ 23 ]. These studies would typically describe the development and performance of the AI approach on a given health topic in a specific domain and setting, including their validation, sometimes done using external data sets [ 36 , 38 ]. A small number of reviews reported AI studies that were at the clinical testing phase [ 60 , 61 , 66 , 68 ]. A total of 7 studies were described as being in the implementation phase [ 66 , 68 ]. On the basis of the descriptions provided, few of the AI approaches in the studies in the AI reviews had been adopted for routine use in clinical settings [ 66 , 68 ] with quantifiable improvements in health outcomes (refer to Multimedia Appendix 6 [ 36 - 68 ] for details).
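The harmonization step can be sketched as a simple lookup table onto the 5 phases of van de Sande et al. The specific term-to-phase assignments below are illustrative assumptions, not the exact mapping used in this review.

```python
# The 5 life cycle phases of van de Sande et al (as named in the text).
PHASES = {
    0: "preparation",
    1: "model development",
    2: "performance assessment",
    3: "clinical testing",
    4: "implementation",
}

# Assumed assignments of reported life cycle terms to phases (illustrative).
TERM_TO_PHASE = {
    "data preparation": 0,
    "preprocessing": 0,
    "model development": 1,
    "training and testing": 1,
    "validation": 2,
    "pilot trial": 3,
    "clinical testing": 3,
    "implementation": 4,
}

def harmonize(term):
    """Map a reported life cycle term to a (phase number, phase name) pair."""
    phase = TERM_TO_PHASE.get(term.lower())
    return None if phase is None else (phase, PHASES[phase])
```

Unmapped terms (returned as None) would be resolved by discussion between authors, consistent with the validation process described earlier.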

Regarding AI quality standards, only 39% (13/33) of the reviews applied specific AI quality standards in their results [ 37 - 40 , 45 , 46 , 50 , 54 , 58 , 59 , 61 , 63 , 66 ], and 12% (4/33) mentioned the need for standards [ 55 , 63 , 68 ]. These included the Prediction Model Risk of Bias Assessment Tool [ 37 , 38 , 58 , 59 ], Newcastle-Ottawa Scale [ 39 , 50 ], Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies [ 38 , 59 ], Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis–Machine Learning Extension [ 50 ], levels of evidence [ 61 ], Critical Appraisal Skills Program Clinical Prediction Rule Checklist [ 40 ], Mixed Methods Appraisal Tool [ 66 ], and CONSORT-AI [ 54 ]. Another review applied 7 design justice principles as the criteria to appraise the quality of their AI studies [ 68 ]. There were also broader-level standards mentioned. These included the European Union ethical guidelines for trustworthy AI [ 44 ]; international AI standards from the International Organization for Standardization (ISO); and AI policy guidelines from the United States, Russia, and China [ 46 ] (refer to Multimedia Appendix 6 for details). We updated the Euler diagram ( Figure 2 [ 36 , 41 - 43 , 47 , 48 , 51 - 54 , 56 - 58 , 60 , 62 , 65 , 67 ]) to show in red the health topics in reviews with no mention of specific AI standards.

Figure 2. Euler diagram of AI domains, approaches, and health topics.

Of the 178 unique original AI studies from the selected reviews that were examined, only 25 (14%) mentioned the use of or need for specific AI quality standards (refer to Multimedia Appendix 7 [ 36 - 68 ] for details). They were of six types: (1) reporting—COREQ (Consolidated Criteria for Reporting Qualitative Research), Strengthening the Reporting of Observational Studies in Epidemiology, Standards for Reporting Diagnostic Accuracy Studies, PRISMA, and EQUATOR; (2) data—Unified Medical Language System, Food and Drug Administration (FDA) Adverse Event Reporting System, MedEx, RxNorm, Medical Dictionary for Regulatory Activities, and PCORnet; (3) technical—ISO-12207, FDA Software as a Medical Device, EU-Scholarly Publishing and Academic Resources Coalition, Sensor Web Enablement, Open Geospatial Consortium, Sensor Observation Service, and the American Medical Association AI recommendations; (4) robotics—ISO-13482 and ISO and TC-299; (5) ethics—Helsinki Declaration and European Union AI Watch; and (6) regulations—Health Insurance Portability and Accountability Act (HIPAA) and World Health Organization World Economic Forum. These standards were added to the list of AI quality standards mentioned by review in Multimedia Appendix 6 .

A summary of the harmonized AI topics, approaches, domains, the life cycle phases by van de Sande et al [ 23 ], and quality standards derived from our 33 reviews and 10% of unique studies within them is shown in Table 1 .

a Borderline AI approaches in the AI domains are identified with (x) .

b Italicized entries are AI quality standards mentioned only in the original studies in the reviews.

c CNN: convolutional neural network.

d SVM: support vector machine.

e RF: random forest.

f DT: decision tree.

g LoR: logistic regression.

h NLP: natural language processing.

i Phase 0: preparation before model development; phase I: AI model development; phase II: assessment of AI performance and reliability; phase III: clinical testing of AI; and phase IV: implementing and governing AI.

j AB: adaptive boosting or adaboost.

k ARMED: attribute reduction with multi-objective decomposition ensemble optimizer.

l BE: boost ensembling.

m BNB: Bernoulli naïve Bayes.

n PROBAST: Prediction Model Risk of Bias Assessment Tool.

o TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis.

p FDA-SaMD: Food and Drug Administration–Software as a Medical Device.

q STROBE: Strengthening the Reporting of Observational Studies in Epidemiology.

r ICU: intensive care unit.

s ANN-ELM: artificial neural network extreme learning machine.

t ELM: extreme learning machine.

u LSTM: long short-term memory.

v ESICULA: super intensive care unit learner algorithm.

w CHARMS: Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies.

x SFCN: sparse fully convolutional network.

y NOS: Newcastle-Ottawa scale.

z ANN: artificial neural network.

aa EN: elastic net.

ab GAM: generalized additive model.

ac CASP: Critical Appraisal Skills Programme.

ad mHealth: mobile health.

ae DL: deep learning.

af FL: federated learning.

ag ML: machine learning.

ah SAR: socially assistive robot.

ai CDSS: clinical decision support system.

aj COREQ: Consolidated Criteria for Reporting Qualitative Research.

ak ISO: International Organization for Standardization.

al EU-SPARC: Scholarly Publishing and Academic Resources Coalition Europe.

am AMS: Associated Medical Services.

an BICMM: Bayesian independent component mixture model.

ao BNC: Bayesian network classifier.

ap C4.5: a named algorithm for creating decision trees.

aq CPH: Cox proportional hazard regression.

ar IEC: International Electrotechnical Commission.

as NIST: National Institute of Standards and Technology.

at OECD-AI: Organisation for Economic Co-operation and Development–artificial intelligence.

au AUC: area under the curve.

av BCP-NN: Bayesian classifier based on propagation neural network.

aw BCPNN: Bayesian confidence propagation neural network.

ax BNM: Bayesian network model.

ay TRIPOD-ML: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis–Machine Learning.

az FAERS: Food and Drug Administration Adverse Event Reporting System.

ba MedDRA: Medical Dictionary for Regulatory Activities.

bb MADE1.0: Medical Artificial Intelligence Data Set for Electronic Health Records 1.0.

bc ANFIS: adaptive neuro fuzzy inference system.

bd EML: ensemble machine learning.

be cTAKES: clinical Text Analysis and Knowledge Extraction System.

bf CUI: concept unique identifier.

bg KM: k-means clustering.

bh UMLS: Unified Medical Language System.

bi 3DQI: 3D quantitative imaging.

bj ACNN: attention-based convolutional neural network.

bk LASSO: least absolute shrinkage and selection operator.

bl MCRM: multivariable Cox regression model.

bm MLR: multivariate linear regression.

bn CNN-TF: convolutional neural network using Tensorflow.

bo IRRCN: inception residual recurrent convolutional neural network.

bp IoT: internet of things.

bq NVHDOL: notal vision home optical-based deep learning.

br HIPAA: Health Insurance Portability and Accountability Act.

bs BC: Bayesian classifier.

bt EM: ensemble method.

bu PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

bv RCT: randomized controlled trial.

bw ROBINS-I: Risk of Bias in Non-Randomised Studies of Interventions.

bx DSP: deep supervised learning.

by NN: neural network.

bz SPIRIT: Standard Protocol Items: Recommendations for Interventional Trials.

ca ABS: agent based simulation.

cb LiR: linear regression.

cc TOPSIS: technique for order of preference by similarity to ideal solution.

cd ABC: artificial bee colony.

ce DCNN: deep convolutional neural network.

cf AL: abductive learning.

cg AR: automated reasoning.

ch BN: Bayesian network.

ci COBWEB: a conceptual clustering algorithm.

cj CH: computer heuristic.

ck AR-HMM: auto-regressive hidden Markov model.

cl MLoR: multivariate logistic regression.

cm ITS: intelligent tutoring system.

cn AMA: American Medical Association.

co APS: automated planning and scheduling.

cp ES: expert system.

cq SWE: software engineering.

cr OGC: Open Geospatial Consortium standard.

cs SOS: Sensor Observation Service.

ct BiGAN: bidirectional generative adversarial network.

cu ADA-NN: adaptive dragonfly algorithms with neural network.

cv F-CNN: fully convolutional neural network.

cw FFBP-ANN: feed-forward backpropagation artificial neural network.

cx AFM: adaptive finite state machine.

cy ATC: anatomical therapeutic chemical.

cz AFC: active force control.

da FDA: Food and Drug Administration.

db MMAT: Mixed Methods Appraisal Tool.

dc STARD: Standards for Reporting of Diagnostic Accuracy Study.

dd VR: virtual reality.

de EU: European Union.

df EQUATOR: Enhancing the Quality and Transparency of Health Research.

dg WHO-WEF: World Health Organization World Economic Forum.

dh CCC: concordance correlation coefficient.

di IEEE: Institute of Electrical and Electronics Engineers.

There were also other AI quality standards not mentioned in the reviews or their unique studies. They included guidelines such as the do no harm road map, Factor Analysis of Information Risk, HIPAA, and the FDA regulatory framework mentioned by van de Sande et al [ 23 ]; AI clinical study reporting guidelines such as Clinical Artificial Intelligence Modeling and Minimum Information About Clinical Artificial Intelligence Modeling mentioned by Shelmerdine et al [ 15 ]; and the international technical AI standards such as ISO and International Electrotechnical Commission 22989, 23053, 23894, 24027, 24028, 24029, and 24030 mentioned by Wenzel and Wiegand [ 26 ].

With these additional findings, we updated the original table of AI standards in the study by van de Sande et al [ 23 ] showing crucial steps and key documents by life cycle phase ( Table 2 ).

a Italicized references are original studies cited in the reviews, and references denoted with the footnote t are those cited in our paper but not present in any of the reviews.

b AI: artificial intelligence.

c FDA: Food and Drug Administration.

d ECLAIR: Evaluate Commercial AI Solutions in Radiology.

e FHIR: Fast Healthcare Interoperability Resources.

f FAIR: Findability, Accessibility, Interoperability, and Reusability.

g PROBAST: Prediction Model Risk of Bias Assessment Tool.

h HIPAA: Health Insurance Portability and Accountability Act.

i OOTA: Office of The Assistant Secretary.

j GDPR: General Data Protection Regulation.

k EU: European Union.

l WMA: World Medical Association.

m WEF: World Economic Forum.

n SORMAS: Surveillance, Outbreak Response Management and Analysis System.

o WHO: World Health Organization.

p ML: machine learning.

q TRIPOD: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.

r TRIPOD-ML: Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis—Machine Learning.

s CLAIM: Checklist for Artificial Intelligence in Medical Imaging.

t References denoted with the footnote t are those cited in our paper but not present in any of the reviews.

u CHARMS: Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modeling Studies.

v PRISMA-DTA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy.

w MI-CLAIM: Minimum Information About Clinical Artificial Intelligence Modeling.

x MINIMAR: Minimum Information for Medical AI Reporting.

y NOS: Newcastle-Ottawa Scale.

z LOE: level of evidence.

aa MMAT: Mixed Methods Appraisal Tool.

ab CASP: Critical Appraisal Skills Programme.

ac STARD: Standards for Reporting of Diagnostic Accuracy Studies.

ad COREQ: Consolidated Criteria for Reporting Qualitative Research.

ae MADE1.0: Model Agnostic Diagnostic Engine 1.0.

af DECIDE-AI: Developmental and Exploratory Clinical Investigations of Decision-Support Systems Driven by Artificial Intelligence.

ag SPIRIT-AI: Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence.

ah CONSORT-AI: Consolidated Standards of Reporting Trials–Artificial Intelligence.

ai RoB 2: Risk of Bias 2.

aj ROBINS-I: Risk of Bias in Non-Randomised Studies of Interventions.

ak RCT: randomized controlled trial.

al STROBE: Strengthening the Reporting of Observational Studies in Epidemiology.

am AI-ML: artificial intelligence–machine learning.

an TAM: Technology Acceptance Model.

ao SaMD: Software as a Medical Device.

ap IMDRF: International Medical Device Regulators Forum.

aq EQUATOR: Enhancing the Quality and Transparency of Health Research.

ar NIST: National Institute of Standards and Technology.

as OECD: Organisation for Economic Co-operation and Development.

at AMA: American Medical Association.

au CCC: Computing Community Consortium.

av ISO: International Organization for Standardization.

aw IEEE: Institute of Electrical and Electronics Engineers.

ax OGC: Open Geospatial Consortium.

ay SWE: Sensor Web Enablement.

az SOS: Sensor Observation Service.

ba IEC: International Electrotechnical Commission.

bb FAERS: Food and Drug Administration Adverse Event Reporting System.

bc MedDRA: Medical Dictionary for Regulatory Activities.

bd UMLS: Unified Medical Language System.

be R&D: research and development.

bf SPARC: Scholarly Publishing and Academic Resources Coalition.

bg TC: technical committee.

Quality Standard–Related Issues

We extracted a set of AI quality standard–related issues from the 33 reviews and assigned themes based on keywords used in the reviews ( Multimedia Appendix 8 [ 36 - 68 ]). In total, we identified 23 issues, with the most frequently mentioned being clinical utility and economic benefits (n=10); ethics (n=10); benchmarks for data, model, and performance (n=9); privacy, security, data protection, and access (n=8); and federated learning and integration (n=8). Table 3 shows the quality standard issues by theme from the 33 reviews. To frame and conceptualize the quality standard–related issues, we performed a high-level mapping of the issues to the AI requirements proposed by the NAM [ 8 ] and EUC [ 20 ]. Two of the authors performed the mapping, the remaining authors validated the results, and the final mapping reflects consensus across all authors ( Table 4 ).

a AI: artificial intelligence.

b SDOH: social determinants of health.

a B5-1: key considerations in model development; T6-2: key considerations for institutional infrastructure and governance; and T6-3: key artificial intelligence tool implementation concepts, considerations, and tasks.

b 1—human agency and oversight; 2—technical robustness and safety; 3—privacy and data governance; 4—transparency; 5—diversity, nondiscrimination, and fairness; 6—societal and environmental well-being; and 7—accountability.

c N/A: not applicable.

d Themes not addressed.

e SDOH: social determinants of health.

We found that all 23 quality standard issues were covered by the NAM and EUC AI frameworks. Both frameworks provide a detailed set of guidelines and questions to be considered at different life cycle stages of health care AI studies. Although the mapping of the AI issues was consistent across the NAM and EUC frameworks, their emphases differed. The NAM framework focuses on key aspects of AI model development, infrastructure and governance, and implementation tasks. The EUC framework emphasizes achieving trustworthiness by addressing all 7 interconnected requirements of accountability; human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, nondiscrimination, and fairness; and societal and environmental well-being. The quality standard issues were based on our analysis of the review articles, and our mapping was at times more granular than the issues in the NAM and EUC frameworks. Nevertheless, our results showed that the 2 frameworks provide sufficient terminology for quality standard–related issues. Embracing these guidelines can enhance the buy-in and adoption of AI interventions in the health care system.

Principal Findings

Overall, we found that, despite the growing number of health care AI quality standards in the literature, they are seldom applied in practice, as shown in a sample of recently published systematic reviews of health care AI studies. Of the reviews that mentioned AI quality standards, most applied them to ensure the methodological and reporting quality of the included AI studies. At the same time, the reviews identified many AI quality standard–related issues, including those broader in nature, such as ethics, regulations, transparency, interoperability, safety, and governance. Examples of broader standards mentioned in a handful of reviews or original studies are ISO-12207, the Unified Medical Language System, HIPAA, the FDA Software as a Medical Device framework, the World Health Organization AI governance guidance, and the American Medical Association augmented intelligence recommendations. These findings reflect the evolving nature of health care AI, which has not yet reached maturity or been widely adopted. There is a need to apply appropriate AI quality standards to demonstrate the transparency, robustness, and benefits of these AI approaches across AI domains and health topics while protecting the privacy, safety, and rights of individuals and society from the potential unintended consequences of such innovations.

Another contribution of our study was a conceptual reframing toward a systems-based perspective for harmonizing health care AI. We did not look at AI studies solely as individual entities but rather as part of a bigger system that includes clinical, organizational, and societal aspects. Our findings complement those of recent publications, such as an FDA paper that advocates for a need to help people understand the broader system of AI in health care, including across different clinical settings [ 72 ]. Moving forward, we advocate for AI research that looks at how AI approaches will mature over time. AI approaches evolve through different phases of maturity as they move from development to validation to implementation. Each phase of maturity has different requirements [ 23 ] that must be assessed as part of evaluating AI approaches across domains as the number of health care applications rapidly increases [ 73 ]. However, comparing AI life cycle maturity across studies was challenging because the reviews used a variety of life cycle terms. To address this issue, we provided a mapping of life cycle terms from the original studies and adopted the system life cycle phases of van de Sande et al [ 23 ] as a common terminology for AI life cycle stages. A significant finding from this mapping was that most AI studies in our selected reviews were still at early stages of maturity (ie, model preparation, development, or validation), with very few progressing to later phases such as clinical testing and implementation. If AI research in health systems is to evolve, we need to move past single-case studies with external data validation to studies that achieve higher levels of life cycle maturity, such as clinical testing and implementation across a variety of routine health care settings (eg, hospitals, clinics, patient homes, and other community settings).

Our findings also highlighted that many AI approaches and quality standards are used across domains in health care AI studies. To better understand their relationships and the overall construct of each approach, we applied a conceptual organizing scheme for harmonized health care that characterizes AI studies according to AI domains, approaches, health topics, life cycle phases, and quality standards. The health care AI landscape is complex. The Euler diagram shows multiple AI approaches in one or more AI domains for a given health topic. These domains can overlap, and the AI approaches can be driven by ML, DL, or other types (eg, decision trees and robotics). This complexity is expected to increase as the number of AI approaches and the range of applications across all health topics and settings grow over time. For meaningful comparison, we need a harmonized scheme such as the one described in this paper to make sense of the multitude of AI terminology for the types of approaches reported in the health care AI literature. The systems-based perspective in this review provides the means for harmonizing AI life cycles and incorporating quality standards through different maturity stages, which could help advance health care AI research by scaling up to clinical validation and implementation in routine practice. Furthermore, if we are to reach the later stages of AI maturity in health care (eg, clinical validation and implementation), we need to move toward explainable AI approaches whose applications are based on clinical models [ 74 ].

Proposed Guidance

To improve the quality of future health care AI studies, we urge AI practitioners and researchers to draw on the published health care AI quality standard literature, such as the standards identified in this review. The quality standards considered should cover trustworthiness, methodological, reporting, and technical aspects. Examples include the NAM and EUC AI frameworks that address trustworthiness and the EQUATOR network with its catalog of methodological and reporting guidelines identified in this review, as well as the Minimum Information for Medical AI Reporting guidelines and technical ISO standards (eg, robotics) that are not in the EQUATOR catalog. The AI ethics, approaches, life cycle stages, and performance measures used in AI studies should be standardized to facilitate their meaningful comparison and aggregation. The technical standards should address such key design features as data, interoperability, and robotics. Given the complexities of the different AI approaches involved, rather than focusing on the underlying model or algorithm design, one should compare their actual performance by life cycle stage (eg, degree of accuracy in model development or assessment vs outcome improvement in implementation). A summary list of the AI quality standards described in this paper is provided in Multimedia Appendix 9 for those wishing to apply them in future studies.

Implications

Our review has practice, policy, and research implications. For practice, better application of health care AI quality standards could help AI practitioners and researchers become more confident in the rigor and transparency of their health care AI studies. Developers adhering to standards may make AI approaches less of a black box and reduce unintended consequences such as systemic bias or threats to patient safety. AI standards may also help health care providers better understand, trust, and apply study findings in relevant clinical settings. For policy, these standards can provide the necessary guidance to address the broader impacts of health care AI, such as issues of data governance, privacy, patient safety, and ethics. For research, AI quality standards can help advance the field by improving rigor, reproducibility, and transparency in the planning, design, conduct, reporting, and appraisal of health care AI studies. Standardization would also allow for the meaningful comparison and aggregation of different health care AI studies to expand the evidence base regarding their performance impacts, such as cost-effectiveness and clinical outcomes.

Limitations

Despite our best effort, this umbrella review has limitations. First, we searched only for peer-reviewed English articles with "health" and "AI" as the keywords in MEDLINE and Google Scholar covering a 36-month period; we may have missed relevant or important reviews that did not meet our inclusion criteria. Second, some of the AI quality standards were published only in the last few years, at approximately the same time as the AI reviews were conducted; review and study authors may therefore have been unaware of these standards or of the need to apply them. Third, the AI standard landscape is still evolving; thus, there are likely standards that we missed in this review (eg, Digital Imaging and Communications in Medicine in pattern recognition with convolutional neural networks [ 75 ]). Fourth, the broader socioethical guidelines are still in the early stages of being refined, operationalized, and adopted; they may not yet be in a form that can be applied as easily as the more established methodological and reporting standards with explicit checklists and criteria. Fifth, our literature review did not include any reviews of LLMs [ 76 ], although we know that such reviews have been published in 2023 and beyond. Nevertheless, LLMs would fall within the overlap of NLP and DL in our Euler diagram, and LLMs could enter health care through approved chatbot applications at an early life cycle phase, for example, by first prototyping the chatbot as clinical decision support using decision trees [ 77 ] before advancing, in the mature phase, to a more robust LLM-based AI solution. Finally, only one author screened citation titles and abstracts (although 2 authors were later involved in the full-text review of all articles that were screened in), so we may have erroneously excluded an article on the basis of its title and abstract.
Despite these limitations, this umbrella review provided a snapshot of the current state of knowledge and gaps that exist with respect to the use of and need for AI quality standards in health care AI studies.
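The decision-tree prototyping path mentioned above can be illustrated with a minimal sketch. The following Python example shows a hypothetical rule-based decision-tree chatbot of the kind that might be prototyped as clinical decision support at an early life cycle phase before moving to an LLM-based solution; all questions, thresholds, and advice strings are illustrative assumptions, not clinical guidance.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    """One node of a yes/no decision tree for a prototype chatbot."""
    question: Optional[str] = None   # None marks a leaf node
    advice: Optional[str] = None     # advice is set only on leaves
    yes: Optional["Node"] = None     # subtree followed on a "yes" answer
    no: Optional["Node"] = None      # subtree followed on a "no" answer

def run(node: Node, answers: List[bool]) -> str:
    """Walk the tree, consuming one yes/no answer per internal node."""
    i = 0
    while node.question is not None:
        node = node.yes if answers[i] else node.no
        i += 1
    return node.advice

# Hypothetical two-question triage tree (illustrative content only).
tree = Node(
    question="Do you have a fever above 38 C?",
    yes=Node(
        question="Has the fever lasted more than 3 days?",
        yes=Node(advice="Contact your clinic for an assessment."),
        no=Node(advice="Rest, hydrate, and monitor your temperature."),
    ),
    no=Node(advice="No fever reported; monitor your symptoms."),
)

print(run(tree, [True, False]))  # prints: Rest, hydrate, and monitor your temperature.
```

Each internal node corresponds to an explicit, auditable rule, which is what makes such early-phase prototypes easier to validate clinically than an LLM before a study progresses to later life cycle phases.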

Conclusions

Despite the growing number of AI standards for assessing the quality of health care AI studies, they are seldom applied in practice. Broader ethical guidelines such as those of the NAM and EUC have recently been unveiled, but more transparency and guidance in health care AI use are still needed. The key contribution of this review was the harmonization of different AI quality standards, which could help practitioners, developers, and users understand the relationships among AI domains, approaches, life cycles, and standards. Specifically, we advocate for common terminology on AI life cycles to enable comparison of AI maturity across stages and settings and to ensure that AI research scales up to clinical validation and implementation.

Acknowledgments

CK acknowledges funding support from a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (RGPIN/04884-2019). The authors affirm that no generative artificial intelligence tools were used in the writing of this manuscript.

Authors' Contributions

CK contributed to conceptualization (equal), methodology (equal), data curation (equal), formal analysis (equal), investigation (equal), and writing—original draft (lead). DC contributed to conceptualization (equal), methodology (equal), data curation (equal), formal analysis (equal), investigation (equal), and visualization (equal). SM contributed to conceptualization (equal), methodology (equal), data curation (equal), formal analysis (equal), investigation (equal), and visualization (equal). MM contributed to conceptualization (equal), methodology (equal), data curation (equal), formal analysis (equal), and investigation (equal). FL contributed to conceptualization (equal), methodology (lead), data curation (lead), formal analysis (lead), investigation (equal), writing—original draft (equal), visualization (equal), project administration (lead), and supervision (lead).

Conflicts of Interest

None declared.

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist.

PubMed search strings.

Characteristics of the included reviews.

List of excluded reviews and reasons.

Quality of the included reviews using Joanna Briggs Institute scores.

Health care artificial intelligence reviews by life cycle stage.

Quality standards found in 10% of unique studies in the selected reviews.

Quality standard–related issues mentioned in the artificial intelligence reviews.

Summary list of artificial intelligence quality standards.

  • Saleh L, Mcheick H, Ajami H, Mili H, Dargham J. Comparison of machine learning algorithms to increase prediction accuracy of COPD domain. In: Proceedings of the 15th International Conference on Enhanced Quality of Life and Smart Living. 2017. Presented at: ICOST '17; August 29-31, 2017:247-254; Paris, France. URL: https://doi.org/10.1007/978-3-319-66188-9_22 [ CrossRef ]
  • Gerke S, Minssen T, Cohen IG. Ethical and legal challenges of artificial intelligence-driven healthcare. Artif Intell Healthc. 2020:295-336. [ FREE Full text ] [ CrossRef ]
  • Čartolovni A, Tomičić A, Lazić Mosler E. Ethical, legal, and social considerations of AI-based medical decision-support tools: a scoping review. Int J Med Inform. May 2022;161:104738. [ CrossRef ] [ Medline ]
  • Das K, Cockerell CJ, Patil A, Pietkiewicz P, Giulini M, Grabbe S, et al. Machine learning and its application in skin cancer. Int J Environ Res Public Health. Dec 20, 2021;18(24):13409. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Elkin P, Mullin S, Mardekian J, Crowner C, Sakilay S, Sinha S, et al. Using artificial intelligence with natural language processing to combine electronic health record's structured and free text data to identify nonvalvular atrial fibrillation to decrease strokes and death: evaluation and case-control study. J Med Internet Res. Nov 09, 2021;23(11):e28946. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kawamoto A, Takenaka K, Okamoto R, Watanabe M, Ohtsuka K. Systematic review of artificial intelligence-based image diagnosis for inflammatory bowel disease. Dig Endosc. Nov 2022;34(7):1311-1319. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Anderson C, Bekele Z, Qiu Y, Tschannen D, Dinov ID. Modeling and prediction of pressure injury in hospitalized patients using artificial intelligence. BMC Med Inform Decis Mak. Aug 30, 2021;21(1):253. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Matheny M, Israni ST, Ahmed M, Whicher D. Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril. Washington, DC. National Academy of Medicine; 2019.
  • Park Y, Jackson GP, Foreman MA, Gruen D, Hu J, Das AK. Evaluating artificial intelligence in medicine: phases of clinical research. JAMIA Open. Oct 2020;3(3):326-331. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Yang C, Kors JA, Ioannou S, John LH, Markus AF, Rekkas A, et al. Trends in the conduct and reporting of clinical prediction model development and validation: a systematic review. J Am Med Inform Assoc. Apr 13, 2022;29(5):983-989. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Van Calster B, Wynants L, Timmerman D, Steyerberg EW, Collins GS. Predictive analytics in health care: how can we know it works? J Am Med Inform Assoc. Dec 01, 2019;26(12):1651-1654. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. Oct 29, 2019;17(1):195. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Yin J, Ngiam KY, Teo HH. Role of artificial intelligence applications in real-life clinical practice: systematic review. J Med Internet Res. Apr 22, 2021;23(4):e25759. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Coiera E, Ammenwerth E, Georgiou A, Magrabi F. Does health informatics have a replication crisis? J Am Med Inform Assoc. Aug 01, 2018;25(8):963-968. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Shelmerdine SC, Arthurs OJ, Denniston A, Sebire NJ. Review of study reporting guidelines for clinical studies using artificial intelligence in healthcare. BMJ Health Care Inform. Aug 23, 2021;28(1):e100385. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK, SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. BMJ. Sep 09, 2020;370:m3164. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Rivera SC, Liu X, Chan AW, Denniston AK, Calvert MJ, SPIRIT-AI and CONSORT-AI Working Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. BMJ. Sep 09, 2020;370:m3210. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hernandez-Boussard T, Bozkurt S, Ioannidis JP, Shah NH. MINIMAR (MINimum information for medical AI reporting): developing reporting standards for artificial intelligence in health care. J Am Med Inform Assoc. Dec 09, 2020;27(12):2011-2015. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Norgeot B, Quer G, Beaulieu-Jones BK, Torkamani A, Dias R, Gianfrancesco M, et al. Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nat Med. Sep 2020;26(9):1320-1324. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Ethics and governance of artificial intelligence for health: WHO guidance. Licence: CC BY-NC-SA 3.0 IGO. World Health Organization. URL: https://apps.who.int/iris/bitstream/handle/10665/341996/9789240029200-eng.pdf [accessed 2024-04-05]
  • Solanki P, Grundy J, Hussain W. Operationalising ethics in artificial intelligence for healthcare: a framework for AI developers. AI Ethics. Jul 19, 2022;3(1):223-240. [ FREE Full text ] [ CrossRef ]
  • Joshi I, Morley J. Artificial intelligence: how to get it right: putting policy into practice for safe data-driven innovation in health and care. National Health Service. 2019. URL: https://transform.england.nhs.uk/media/documents/NHSX_AI_report.pdf [accessed 2024-04-05]
  • van de Sande D, Van Genderen ME, Smit JM, Huiskens J, Visser JJ, Veen RE, et al. Developing, implementing and governing artificial intelligence in medicine: a step-by-step approach to prevent an artificial intelligence winter. BMJ Health Care Inform. Feb 19, 2022;29(1):e100495. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Andaur Navarro CL, Damen JA, Takada T, Nijman SW, Dhiman P, Ma J, et al. Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review. BMC Med Res Methodol. Jan 13, 2022;22(1):12. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Yusuf M, Atal I, Li J, Smith P, Ravaud P, Fergie M, et al. Reporting quality of studies using machine learning models for medical diagnosis: a systematic review. BMJ Open. Mar 23, 2020;10(3):e034568. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Wenzel MA, Wiegand T. Towards international standards for the evaluation of artificial intelligence for health. In: Proceedings of the 2019 ITU Kaleidoscope: ICT for Health: Networks, Standards and Innovation. 2019. Presented at: ITU K '19; December 4-6, 2019:1-10; Atlanta, GA. URL: https://ieeexplore.ieee.org/abstract/document/8996131 [ CrossRef ]
  • Varghese J. Artificial intelligence in medicine: chances and challenges for wide clinical adoption. Visc Med. Dec 2020;36(6):443-449. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Nyarairo M, Emami E, Abbasgholizadeh S. Integrating equity, diversity and inclusion throughout the lifecycle of artificial intelligence in health. In: Proceedings of the 13th Augmented Human International Conference. 2022. Presented at: AH '22; May 26-27, 2022:1-4; Winnipeg, MB. URL: https://dl.acm.org/doi/abs/10.1145/3532530.3539565 [ CrossRef ]
  • MacEntee MI. A typology of systematic reviews for synthesising evidence on health care. Gerodontology. Dec 06, 2019;36(4):303-312. [ CrossRef ] [ Medline ]
  • Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Umbrella reviews. In: Aromataris E, Lockwood C, Porritt K, Pilla B, Jordan Z, editors. JBI Manual for Evidence Synthesis. Adelaide, South Australia. Joanna Briggs Institute; 2020.
  • Tricco AC, Langlois EV, Straus SE. Rapid reviews to strengthen health policy and systems: a practical guide. Licence CC BY-NC-SA 3.0 IGO. World Health Organization. 2017. URL: https://apps.who.int/iris/handle/10665/258698 [accessed 2024-04-05]
  • Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. Mar 29, 2021;372:n71. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Russell S, Norvig P. Artificial Intelligence: A Modern Approach. 4th edition. London, UK. Pearson Education; 2021.
  • Ethics guidelines for trustworthy AI. European Commission, Directorate-General for Communications Networks, Content and Technology. URL: https://data.europa.eu/doi/10.2759/346720 [accessed 2024-04-05]
  • Lloyd N, Khuman AS. AI in healthcare: malignant or benign? In: Chen T, Carter J, Mahmud M, Khuman AS, editors. Artificial Intelligence in Healthcare: Recent Applications and Developments. Singapore, Singapore. Springer; 2022:1-46.
  • Abd-Alrazaq A, Alajlani M, Alhuwail D, Schneider J, Al-Kuwari S, Shah Z, et al. Artificial intelligence in the fight against COVID-19: scoping review. J Med Internet Res. Dec 15, 2020;22(12):e20756. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Adamidi ES, Mitsis K, Nikita KS. Artificial intelligence in clinical care amidst COVID-19 pandemic: a systematic review. Comput Struct Biotechnol J. 2021;19:2833-2850. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Barboi C, Tzavelis A, Muhammad LN. Comparison of severity of illness scores and artificial intelligence models that are predictive of intensive care unit mortality: meta-analysis and review of the literature. JMIR Med Inform. May 31, 2022;10(5):e35293. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Battineni G, Hossain MA, Chintalapudi N, Amenta F. A survey on the role of artificial intelligence in biobanking studies: a systematic review. Diagnostics (Basel). May 09, 2022;12(5):1179. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Bertini A, Salas R, Chabert S, Sobrevia L, Pardo F. Using machine learning to predict complications in pregnancy: a systematic review. Front Bioeng Biotechnol. Jan 19, 2021;9:780389. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Bhatt P, Liu J, Gong Y, Wang J, Guo Y. Emerging artificial intelligence-empowered mHealth: scoping review. JMIR Mhealth Uhealth. Jun 09, 2022;10(6):e35053. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Buchanan C, Howitt ML, Wilson R, Booth RG, Risling T, Bamford M. Predicted influences of artificial intelligence on the domains of nursing: scoping review. JMIR Nurs. Dec 17, 2020;3(1):e23939. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Buchanan C, Howitt ML, Wilson R, Booth RG, Risling T, Bamford M. Predicted influences of artificial intelligence on nursing education: scoping review. JMIR Nurs. Jan 28, 2021;4(1):e23933. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Chew HS, Achananuparp P. Perceptions and needs of artificial intelligence in health care to increase adoption: scoping review. J Med Internet Res. Jan 14, 2022;24(1):e32939. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Choudhury A, Renjilian E, Asan O. Use of machine learning in geriatric clinical care for chronic diseases: a systematic literature review. JAMIA Open. Oct 2020;3(3):459-471. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Choudhury A, Asan O. Role of artificial intelligence in patient safety outcomes: systematic literature review. JMIR Med Inform. Jul 24, 2020;8(7):e18599. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Eldaly AS, Avila FR, Torres-Guzman RA, Maita K, Garcia JP, Serrano LP, et al. Artificial intelligence and lymphedema: state of the art. J Clin Transl Res. Jun 29, 2022;8(3):234-242. [ FREE Full text ] [ Medline ]
  • Le Glaz A, Haralambous Y, Kim-Dufor DH, Lenca P, Billot R, Ryan TC, et al. Machine learning and natural language processing in mental health: systematic review. J Med Internet Res. May 04, 2021;23(5):e15708. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Guo Y, Zhang Y, Lyu T, Prosperi M, Wang F, Xu H, et al. The application of artificial intelligence and data integration in COVID-19 studies: a scoping review. J Am Med Inform Assoc. Aug 13, 2021;28(9):2050-2067. [ FREE Full text ] [ CrossRef ] [ Medline ]
  • Hassan N, Slight R, Weiand D, Vellinga A, Morgan G, Aboushareb F, et al. Preventing sepsis; how can artificial intelligence inform the clinical decision-making process? A systematic review. Int J Med Inform. Jun 2021;150:104457. [ CrossRef ] [ Medline ]
  • Huang JA, Hartanti IR, Colin MN, Pitaloka DA. Telemedicine and artificial intelligence to support self-isolation of COVID-19 patients: recent updates and challenges. Digit Health. May 15, 2022;8:20552076221100634. [ FREE Full text ] [ CrossRef ] [ Medline ]

Edited by S Ma, T Leung; submitted 19.11.23; peer-reviewed by K Seibert, K Washington, X Yan; comments to author 21.03.24; revised version received 03.04.24; accepted 04.04.24; published 22.05.24.

©Craig E Kuziemsky, Dillon Chrimes, Simon Minshall, Michael Mannerow, Francis Lau. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.05.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

Development of Artificial Intelligence in Healthcare in Russia

  • First Online: 27 November 2021


  • A. Gusev 8,9,
  • S. Morozov 8,
  • G. Lebedev 10,11,
  • A. Vladzymyrskyy 8,10,
  • V. Zinchenko 8,
  • D. Sharova 8,
  • E. Akhmad 8,
  • D. Shutov 8,
  • R. Reshetnikov 8,12,
  • K. Sergunova 13,
  • S. Izraylit 14,
  • E. Meshkova 14,
  • M. Natenzon 15,16 &
  • A. Ignatev 17

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 212))


Research and development in the field of artificial intelligence has been conducted in Russia for several decades. Amid growing global attention to this area, the Russian Federation has developed and is systematically implementing its own national strategy, which designates healthcare as a priority sector for the introduction of AI products. Government agencies, in collaboration with the expert community and the market, are developing several key areas at once, including legal and technical regulation. Based on the IMDRF recommendations, software products built with AI technologies for use in diagnosis and treatment are regulated in Russia as Software as a Medical Device (SaMD). Over the last year, the Government of the Russian Federation, the Ministry of Health, and Roszdravnadzor have made many targeted changes to the current legislation to enable state registration of AI-based software products and their introduction to the market. More than 20 regions of the Russian Federation have launched projects to implement AI technologies in real clinical practice. Work is underway to create the first series of national technical standards to accelerate product development and build trust among healthcare practitioners.




Author information

Authors and Affiliations

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow Health Care Department, Moscow, Russia

A. Gusev, S. Morozov, A. Vladzymyrskyy, V. Zinchenko, D. Sharova, E. Akhmad, D. Shutov & R. Reshetnikov

K-Skai, LLC, Petrozavodsk, Republic of Karelia, Russia

I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia

G. Lebedev & A. Vladzymyrskyy

Federal Research Institute for Health Organization and Informatics, Moscow, Russia

Institute of Molecular Medicine, I.M. Sechenov First Moscow State Medical University, Moscow, Russia

R. Reshetnikov

National Research Center «Kurchatov Institute», Moscow, Russia

K. Sergunova

Non-Commercial Organization Foundation for Development of the Center for Elaboration and Commercialization of New Technologies (Skolkovo Foundation), Moscow, Russia

S. Izraylit & E. Meshkova

National Telemedicine Agency, Research-and-Production Corporation, Moscow, Russia

M. Natenzon

Center for Big Data Storage and Analysis Technology, Center for Competence in Digital Economics, Lomonosov Moscow State University, Moscow, Russia

MGIMO University (Moscow State Institute of International Relations), Moscow, Russia


Corresponding author

Correspondence to G. Lebedev.

Editor information

Editors and Affiliations

Institute for Intelligent Systems Research and Innovation, Deakin University, Waurn Ponds, VIC, Australia

Chee-Peng Lim

College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan

Yen-Wei Chen

Royal Adelaide Hospital, Adelaide, SA, Australia

Ashlesha Vaidya

Avanti Institute of Cardiology, Nagpur, India

Charu Mahorkar

KES International, Shoreham-by-Sea, UK

Lakhmi C. Jain


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Gusev, A. et al. (2022). Development of Artificial Intelligence in Healthcare in Russia. In: Lim, CP., Chen, YW., Vaidya, A., Mahorkar, C., Jain, L.C. (eds) Handbook of Artificial Intelligence in Healthcare. Intelligent Systems Reference Library, vol 212. Springer, Cham. https://doi.org/10.1007/978-3-030-83620-7_11


DOI: https://doi.org/10.1007/978-3-030-83620-7_11

Published: 27 November 2021

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-83619-1

Online ISBN: 978-3-030-83620-7

eBook Packages: Intelligent Technologies and Robotics (R0)



May 24, 2024


AI might help spot breast cancer's spread without biopsy

by Dennis Thompson


A new AI tool can help detect breast cancer that is spreading to other parts of the body without the need for biopsies, a new study finds.

The AI analyzes MRI scans to detect the presence of cancer cells in the lymph nodes under the arms, researchers said.

In clinical practice, the AI could help avoid 51% of unnecessary surgical biopsies to test lymph nodes for cancer, while correctly identifying 95% of patients whose breast cancer had spread, results showed.
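In screening-test terms, "correctly identifying 95% of patients whose cancer had spread" is the model's sensitivity, and "avoiding 51% of unnecessary biopsies" is the fraction of node-negative patients it clears. The study's evaluation code is not published; the sketch below, with hypothetical cohort numbers, only illustrates how such figures come out of a confusion matrix:

```python
def biopsy_triage_metrics(tp: int, fn: int, tn: int, fp: int):
    """Compute the two figures reported for a rule-out model.

    tp: node-positive patients the model flags (sent to biopsy)
    fn: node-positive patients the model misses
    tn: node-negative patients the model clears (biopsy avoided)
    fp: node-negative patients still sent to biopsy
    """
    sensitivity = tp / (tp + fn)       # fraction of true spread caught
    biopsies_avoided = tn / (tn + fp)  # fraction of unnecessary biopsies skipped
    return sensitivity, biopsies_avoided

# Hypothetical cohort: 100 node-positive and 250 node-negative patients
sens, avoided = biopsy_triage_metrics(tp=95, fn=5, tn=128, fp=122)
```

With these made-up counts, sensitivity is 0.95 and just over half of the node-negative patients avoid surgery, matching the shape of the reported results.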

Most breast cancer deaths are due to cancer that's spread elsewhere, and the cancer typically first spreads to an armpit lymph node, explained lead researcher Dr. Basak Dogan, director of breast imaging research at UT Southwestern Medical Center.

Finding cancer that's spread to a lymph node "is critical in guiding treatment decisions, but traditional imaging techniques alone do not have enough sensitivity" to effectively detect it, Dogan said in a medical center news release.

Patients with benign findings from MRI exams or needle biopsies often must undergo surgical lymph node biopsy anyway, because those tests can miss a good number of cancer cells that have spread past the breast, Dogan said.

Researchers trained the AI by feeding the program MRI scans from 350 newly diagnosed breast cancer patients known to have cancer in their lymph nodes.

Testing showed that the newly developed AI was significantly better at identifying these patients than human doctors using MRI or ultrasound, researchers reported in the journal Radiology: Imaging Cancer.

"That's an important advancement because surgical biopsies have side effects and risks, despite having a low probability of a positive result confirming the presence of cancer cells," Dogan explained. "Improving our ability to rule out [cancer cells in lymph nodes] during a routine MRI—using this model—can reduce that risk while enhancing clinical outcomes."

Copyright © 2024 HealthDay . All rights reserved.


ScienceDaily

AI can help improve ER admission decisions

Generative artificial intelligence (AI), such as GPT-4, can help predict whether an emergency room patient needs to be admitted to the hospital even with only minimal training on a limited number of records, according to investigators at the Icahn School of Medicine at Mount Sinai. Details of the research were published in the May 21 online issue of the Journal of the American Medical Informatics Association.

In the retrospective study, the researchers analyzed records from seven Mount Sinai Health System hospitals, using both structured data, such as vital signs, and unstructured data, such as nurse triage notes, from more than 864,000 emergency room visits while excluding identifiable patient data. Of these visits, 159,857 (18.5 percent) led to the patient being admitted to the hospital.

The researchers compared GPT-4 against traditional machine-learning models such as Bio-Clinical-BERT for text and XGBoost for structured data in various scenarios, assessing its performance to predict hospital admissions independently and in combination with the traditional methods.

"We were motivated by the need to test whether generative AI, specifically large language models (LLMs) like GPT-4, could improve our ability to predict admissions in high-volume settings such as the Emergency Department," says co-senior author Eyal Klang, MD, Director of the Generative AI Research Program in the Division of Data-Driven and Digital Medicine (D3M) at Icahn Mount Sinai. "Our goal is to enhance clinical decision-making through this technology. We were surprised by how well GPT-4 adapted to the ER setting and provided reasoning for its decisions. This capability of explaining its rationale sets it apart from traditional models and opens up new avenues for AI in medical decision-making."

While traditional machine-learning models use millions of records for training, LLMs can effectively learn from just a few examples. Moreover, according to the researchers, LLMs can incorporate traditional machine-learning predictions, improving performance.
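The paper's own ensembling code is not reproduced here, but one common way to combine an LLM's admission probability with a traditional model's (such as XGBoost on structured vitals) is a weighted average in log-odds space. The function name and equal weighting below are illustrative assumptions, not the study's method:

```python
import math

def blend_admission_probability(p_structured: float, p_llm: float, w: float = 0.5) -> float:
    """Combine two admission probabilities in log-odds (logit) space.

    p_structured: probability from a traditional model (e.g., one trained on vital signs)
    p_llm: probability elicited from an LLM reading the nurse triage note
    w: weight on the structured model, between 0 and 1
    """
    def logit(p: float) -> float:
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to keep the log finite
        return math.log(p / (1.0 - p))

    z = w * logit(p_structured) + (1.0 - w) * logit(p_llm)
    return 1.0 / (1.0 + math.exp(-z))

# Agreeing models reinforce each other; disagreeing ones pull the blend toward 0.5
p = blend_admission_probability(0.80, 0.60)
```

Blending in logit space rather than averaging raw probabilities keeps the combination well behaved near 0 and 1, which matters for the rare-but-critical admissions the article describes.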

"Our research suggests that AI could soon support doctors in emergency rooms by making quick, informed decisions about patient admissions. This work opens the door for further innovation in health care AI, encouraging the development of models that can reason and learn from limited data, like human experts do," says co-senior author Girish N. Nadkarni, MD, MPH, Irene and Dr. Arthur M. Fishberg Professor of Medicine at Icahn Mount Sinai, Director of The Charles Bronfman Institute of Personalized Medicine, and System Chief of D3M. "However, while the results are encouraging, the technology is still in a supportive role, enhancing the decision-making process by providing additional insights, not taking over the human component of health care, which remains critical."

The research team is investigating how to apply large language models to health care systems, with the goal of harmoniously integrating them with traditional machine-learning methods to address complex challenges and decision-making in real-time clinical settings.

"Our study informs how LLMs can be integrated into health care operations. The ability to rapidly train LLMs highlights their potential to provide valuable insights even in complex environments like health care," says Brendan Carr, MD, MA, MS, a study co-author and emergency room physician who is Chief Executive Officer of Mount Sinai Health System. "Our study sets the stage for further research on AI integration in health care across the many domains of diagnostic, treatment, operational, and administrative tasks that require continuous optimization."

The paper is titled "Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room."

The remaining authors of the paper, all with Icahn Mount Sinai, are Benjamin S. Glicksberg, PhD; Dhaval Patel, BS; Ashwin Sawant, MD; Akhil Vaid, MD; Ganesh Raut, BS; Alexander W. Charney, MD, PhD; Donald Apakama, MD; and Robert Freeman, RN.

The work was supported by the National Heart Lung and Blood Institute NIH grant 5R01HL141841-05.


Story Source:

Materials provided by The Mount Sinai Hospital / Mount Sinai School of Medicine. Note: Content may be edited for style and length.

Journal Reference:

  • Benjamin S Glicksberg, Prem Timsina, Dhaval Patel, Ashwin Sawant, Akhil Vaid, Ganesh Raut, Alexander W Charney, Donald Apakama, Brendan G Carr, Robert Freeman, Girish N Nadkarni, Eyal Klang. Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room. Journal of the American Medical Informatics Association, 2024; DOI: 10.1093/jamia/ocae103

