
An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot   ORCID: orcid.org/0000-0001-7736-2091 1 ,
  • Jonathan de Bruin   ORCID: orcid.org/0000-0002-4297-0502 2 ,
  • Raoul Schram 2 ,
  • Parisa Zahedi   ORCID: orcid.org/0000-0002-1610-3149 2 ,
  • Jan de Boer   ORCID: orcid.org/0000-0002-0531-3888 3 ,
  • Felix Weijdema   ORCID: orcid.org/0000-0001-5150-1102 3 ,
  • Bianca Kramer   ORCID: orcid.org/0000-0002-5965-6560 3 ,
  • Martijn Huijts   ORCID: orcid.org/0000-0002-8353-0853 4 ,
  • Maarten Hoogerwerf   ORCID: orcid.org/0000-0003-1498-2052 2 ,
  • Gerbrich Ferdinands   ORCID: orcid.org/0000-0002-4998-3293 1 ,
  • Albert Harkema   ORCID: orcid.org/0000-0002-7091-1147 1 ,
  • Joukje Willemsen   ORCID: orcid.org/0000-0002-7260-0828 1 ,
  • Yongchao Ma   ORCID: orcid.org/0000-0003-4100-5468 1 ,
  • Qixiang Fang   ORCID: orcid.org/0000-0003-2689-6653 1 ,
  • Sybren Hindriks 1 ,
  • Lars Tummers   ORCID: orcid.org/0000-0001-9940-9874 5 &
  • Daniel L. Oberski   ORCID: orcid.org/0000-0001-7467-2297 1 , 6  

Nature Machine Intelligence volume 3, pages 125–133 (2021)

Subjects

  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while maintaining high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.


With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often conduct systematic reviews and meta-analyses to develop comprehensive overviews of a topic 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also needs to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; and (3) the use case requires a reproducible workflow and complete transparency 22 .

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners to get an overview of the most relevant records for their work as efficiently as possible while being transparent in the process. The open, free and ready-to-use software ASReview addresses all concerns mentioned above: it is open source, uses active learning and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although we focus this paper on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers performing a comprehensive search in multiple databases 24 , using free-text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. The researcher downloads a file of records containing the text to be screened into a reference manager; in the case of systematic reviewing, each record contains the title and abstract (and potentially other metadata such as the authors' names, journal name and DOI) of a potentially relevant reference. Ideally, two or more researchers then screen the records' titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will ultimately be included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belongs to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of about 97% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used to train the first model and to present the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and at least one to exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Fig. 1: Overview of the ASReview pipeline. The symbols indicate whether an action is taken by a human, by a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record-containing text (feature space) on the basis of prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user's binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a user-specified stopping criterion has been reached. The user then has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probable to be relevant, as predicted by the current model. This set-up helps the user to move through a large database much more quickly than in the manual process, while the decision process simultaneously remains transparent.
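To make the active learning cycle concrete, the following is a minimal sketch of the loop in Python using scikit-learn building blocks (TF–IDF features, a naive Bayes classifier and certainty-based sampling). It illustrates the idea only and is not ASReview's implementation; the oracle callable is a stand-in for the human screener and would simply replay known labels in a simulation.

```python
# Minimal sketch of a researcher-in-the-loop active learning cycle.
# Illustrative only; not the actual ASReview implementation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen(texts, prior_idx, prior_labels, oracle, n_queries=100):
    """Repeatedly present the record most likely to be relevant.

    prior_idx/prior_labels must contain at least one relevant (1)
    and one irrelevant (0) record; oracle(i) returns the human's label.
    """
    X = TfidfVectorizer().fit_transform(texts)   # fixed feature space
    labels = dict(zip(prior_idx, prior_labels))  # prior knowledge
    for _ in range(n_queries):
        seen = list(labels)
        clf = MultinomialNB().fit(X[seen], [labels[i] for i in seen])
        proba = clf.predict_proba(X)[:, 1]  # P(relevant) per record
        proba[seen] = -1.0                  # mask already-labelled records
        query = int(np.argmax(proba))       # certainty-based ('max') sampling
        labels[query] = oracle(query)       # the human screens this record
    return labels  # labelled records; rank the rest by the final model
```

After the loop stops, the remaining unlabelled records can be sorted by the final model's predicted probability, mirroring the ordered output file described above.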

Software implementation for ASReview

The source code 27 of ASReview is available open source under an Apache 2.0 license, including documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user, the simulation mode is used for simulation of the ASReview performance on existing datasets, and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who classifies them. Multiple file formats are supported: (1) RIS files, as used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect; the citation managers Mendeley, RefWorks, Zotero and EndNote also support the RIS format; and (2) tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined labels in line with those used in RIS files. Each record in the dataset should hold the metadata on, for example, a scientific publication. The mandatory metadata is text, for example the titles or abstracts of scientific papers. If both are available, both are used to train the model, but at least one is needed. An advanced option is available that splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional and not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge from a previous selection of relevant papers before entering the active learning cycle. If unavailable, the user has to select at least one relevant record, which can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records or presents random records, which are most likely to be irrelevant given the extremely imbalanced data.
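As an illustration, a minimal tabular dataset in the spirit of the description above could be built as follows. The titles and abstracts are placeholders, and the name of the label column ('included') is an assumption for this sketch; the documentation 28 lists the exact label conventions the software detects.

```python
# Sketch of a minimal CSV input dataset (hypothetical records).
import pandas as pd

records = pd.DataFrame({
    "title": ["A trial of drug X", "Soil microbiomes", "Drug X follow-up"],
    "abstract": ["We test drug X in...", "We sampled soils...", "Long-term..."],
    "included": [1, 0, 1],  # optional historical labelling decisions
})
# Comma separated and UTF-8 encoded, as required for CSV input.
records.to_csv("my_review.csv", index=False, encoding="utf-8")
```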

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n -grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.
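The four query strategies can be summarized by how they pick the next record from the model's predicted probabilities of relevance. The functions below are simplified stand-ins for the implementations in the ASReview code base and assume that already-labelled records have been masked out of proba.

```python
# Simplified sketches of the four query strategies described above.
import numpy as np

rng = np.random.default_rng(42)

def random_query(proba):
    return int(rng.integers(len(proba)))  # ignore the model entirely

def uncertainty_query(proba):
    return int(np.argmin(np.abs(proba - 0.5)))  # closest to 0.5

def max_query(proba):
    return int(np.argmax(proba))  # most likely to be relevant

def mixed_query(proba, epsilon=0.05):
    # mostly certainty-based, with an occasional random record
    return random_query(proba) if rng.random() < epsilon else max_query(proba)
```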

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data are typically extremely imbalanced; we therefore implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations but is dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details on all of the described algorithms can be found in the code and documentation referred to above.
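The following is a minimal sketch of the dynamic resampling idea under the assumptions stated in its comments: the training set keeps its original size, relevant records are oversampled by duplication and irrelevant records are undersampled. The target ratio is passed in as a parameter here, whereas the actual schedule depends on the numbers of labelled and total records, as described in ref. 31.

```python
# Sketch of dynamic resampling; the real ratio schedule is in ref. 31.
import numpy as np

def dynamic_resample(rel_idx, irr_idx, target_ratio, rng):
    """Return training indices of the same total size, rebalanced.

    Assumes enough irrelevant records exist to undersample without
    replacement (the typical case for extremely imbalanced data).
    """
    n_total = len(rel_idx) + len(irr_idx)
    n_rel = int(round(target_ratio * n_total))  # desired relevant share
    rel = rng.choice(np.asarray(rel_idx), size=n_rel, replace=True)
    irr = rng.choice(np.asarray(irr_idx), size=n_total - n_rel, replace=False)
    return np.concatenate([rel, irr])
```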

By default, ASReview converts the records' texts into a document-term matrix; terms are converted to lowercase and no stop words are removed (although both defaults can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file, and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer's memory). ASReview can run on the user's local computer, or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user's computer. Data ownership and confidentiality are crucial and no data are processed or used in any way by third parties. This is unique in comparison with some of the existing systems, as shown in the last column of Table 1 .

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated in classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including the selection of databases 24 and the construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or by omitting limitations in the search query), resulting in a higher number of initial papers and thereby limiting the risk of missing relevant papers during the search (that is, more focus on recall than on precision).

Furthermore, many reviewers nowadays move towards meta-reviews when analysing very large literature streams, that is, systematic reviews of systematic reviews 37 . This can be problematic as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Given the efficiency of ASReview, scholars using the tool could instead conduct such a study by analysing the primary papers directly rather than relying on the systematic reviews. Furthermore, ASReview supports the rapid updating of a systematic review. The included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, let us look at the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19, and it is very time consuming to manually find relevant papers (for example, to develop treatment guidelines). This is especially problematic as urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently: selecting key papers that match their (COVID-19) research question in the first step starts the active learning cycle and leads to the most relevant COVID-19 papers for that research question being presented next. A plug-in was therefore developed for ASReview 39 , containing three databases that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 database, developed by the Allen Institute for AI, with publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI and also daily in the plug-in. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after 1 December 2019 to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19-related preprints, containing metadata of preprints from over 15 preprint servers across disciplines, published since 1 January 2020 41 . The preprint dataset is updated weekly by its maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provides added value to researchers interested in COVID-19 research, especially those who want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , and the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .

First, we analysed the performance for a study systematically describing studies that performed viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase ( n  = 1,806), Medline ( n  = 1,384), Cochrane Central ( n  = 1), Web of Science ( n  = 977) and Google Scholar ( n  = 200, the top relevant references). After deduplication this led to 2,481 studies obtained in the initial search, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results for a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from the ACM Digital Library, IEEE Xplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, together yielding 8,911 publications, of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsycINFO and Scopus and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset from all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. First, we assess the work saved over sampling (WSS), which is the percentage reduction in the number of records needed to screen that is achieved by using active learning instead of screening records at random. WSS is measured at a given level of recall of relevant records, for example 95% (WSS@95%), indicating the reduction in screening effort at the cost of failing to detect 5% of the relevant records. For some researchers it is essential that all relevant literature on the topic is retrieved; this requires a recall of 100% (that is, WSS@100%). We also propose the proportion of relevant references found after screening the first 10% of the records (RRF@10%). This is a useful metric for getting a quick overview of the relevant literature.
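Both metrics can be computed from the order in which a simulation screened the records. The sketch below follows the usual definitions and assumes labels_in_order holds the ground-truth label (1 = relevant, 0 = irrelevant) of every record in screening order; it is an illustration, not the exact evaluation code used for this paper (which is archived on Zenodo 42 ).

```python
# Sketches of WSS@recall and RRF@fraction computed from screening order.
import numpy as np

def wss(labels_in_order, recall=0.95):
    y = np.asarray(labels_in_order)
    n, n_rel = len(y), int(y.sum())
    found = np.cumsum(y)
    # number of records screened to reach the target recall
    k = int(np.argmax(found >= np.ceil(recall * n_rel))) + 1
    return (n - k) / n - (1.0 - recall)  # work saved vs random screening

def rrf(labels_in_order, fraction=0.10):
    y = np.asarray(labels_in_order)
    k = int(np.ceil(fraction * len(y)))  # e.g. the first 10% of the records
    return y[:k].sum() / y.sum()  # share of relevant records already found
```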

For every dataset, 15 runs were performed with one random inclusion and one random exclusion (see Fig. 2 ). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83%, ranging from 67% to 92%. Hence, 95% of the eligible studies will be found after screening only between 8% and 33% of the studies. Furthermore, the number of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Fig. 2: Results of the simulation studies. a – d , Results for a study systematically reviewing studies that performed viral metagenomic next-generation sequencing in common livestock ( a ), a systematic review of studies on fault prediction in software engineering ( b ), longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure ( c ) and a systematic review on the efficacy of angiotensin-converting enzyme inhibitors ( d ). Fifteen runs (shown as separate lines) were performed for every dataset, with only one random inclusion and one random exclusion. The classical review performances with randomly found inclusions are shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test, carried out in December 2019, was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. It was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to gather feedback on installing the software and on its performance on the team's own data. After these sessions we prioritized the feedback in a meeting with the ASReview team, which resulted in the release of v0.4 and v0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held in February to March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback was about how to upload datasets and select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: an inexperienced group and a group of experienced users who had already used ASReview. Due to the COVID-19 lockdown, the usability tests were conducted via video calling, with one person giving instructions to the participant and one person observing, a set-up called human-moderated remote testing 49 . During the tests, one person (S.H.) asked the questions and helped the participant with the tasks; the other person (M.H.), a user experience professional at the IT department of Utrecht University, observed and made notes.

To analyse the notes, thematic analysis was used, a method that analyses data by dividing the information into subjects that each have a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong, the text was coded as 'showstopper'; when something did not go smoothly, it was coded as 'doubtful'; and when something went well, it was coded as 'superb'. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants ( N  = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 3). The inexperienced users rated the tool on average with an 8.0 (s.d. = 1.1, N  = 6); the experienced users with a 7.8 (s.d. = 0.9, N  = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the releases v0.10 and v0.10.1 and in v0.11, a major revision of the graphical user interface. The documentation was upgraded to make installing and launching ASReview more straightforward. We made setting up a project, selecting a dataset and finding past knowledge more intuitive and flexible, and we added a project dashboard with information on progress and advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are flexible. We provide an open source software implementation, ASReview, and compared it with state-of-the-art systems across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview provides defaults for its parameters that exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on performance when identifying full-text articles, with their different document lengths and domain-specific terminologies, or even other types of text, such as newspaper articles and court cases. When the selection of prior knowledge is not possible on the basis of expert knowledge, alternative methods could be explored; for example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these can easily be added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such benchmarks, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvement over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66 , 2215–2222 (2015).


Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).

Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).

Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62 , e1–e34 (2009).

Boaz, A. et al. Systematic Reviews: What have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice London, 2002).

Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).

Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322 , 98–101 (2001).

Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds. Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6 .

Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2 , 119–125 (2011).

Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15 , e0227742 (2020).

Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8 , 163 (2019).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20 , 7 (2020).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 5 (2015).

Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11 , 55 (2010).

Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13 , 206–219 (2006).

Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4 , 313–326 (2014).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660

Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3 , 119–131 (2016).

Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207

Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477 , 15–29 (2019).

Nosek, B. A. et al. Promoting an open research culture. Science 348 , 1422–1425 (2015).

Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16 , 25–31 (2009).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7 , e012545 (2017).

de Vries, H., Bekkers, V. & Tummers, L. Innovation in the Public Sector: a systematic review and future research agenda. Public Adm. 94 , 146–166 (2016).

Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592

De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120

ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/

Docker container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview

Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg

Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).

Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).

Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).


Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. Preprint at https://arxiv.org/abs/1908.10084 (2019).

Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11 , 15 (2011).

Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J . 369 , 1328 (2020).

Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020). https://doi.org/10.5281/zenodo.3891420 .

Lu Wang, L. et al. CORD-19: The COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).

Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18

Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122

Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6

Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12 , 107 (2020).

Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 , 1276–1304 (2012).

van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24 , 451–467 (2017).


van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53 , 267–291 (2018).

Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).

Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).

Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).

NVivo v. 12 (QSR International Pty, 2019).

Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview - June 2020. OSF https://doi.org/10.17605/OSF.IO/7PQNM (2020).

Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23 , 193–201 (2016).

Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).

Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).

Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29 , 709–730 (2020).

Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets

Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).

Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32 , 762–764 (2018).

Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng . 23 , 3161–3186 (2018).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

ASReview: Active learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview


Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and affiliations

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski


Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer and software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts led the UX tests, supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S., D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7


Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7





How to carry out a literature search for a systematic review: a practical guide

Published online by Cambridge University Press:  01 March 2018

Performing an effective literature search to obtain the best available evidence is the basis of any evidence-based discipline, in particular evidence-based medicine. However, with a vast and growing volume of published research available, searching the literature can be challenging. Even when journals are indexed in electronic databases, it can be difficult to identify all relevant studies without an effective search strategy. It is also important to search unpublished literature to reduce publication bias, which occurs from a tendency for authors and journals to preferentially publish statistically significant studies. This article is intended for clinicians and researchers who are approaching the field of evidence synthesis and would like to perform a literature search. It aims to provide advice on how to develop the search protocol and the strategy to identify the most relevant evidence for a given research or clinical question. It will also focus on how to search not only the published but also the unpublished literature using a number of online resources.

• Understand the purpose of conducting a literature search and its integral part of the literature review process

• Become aware of the range of sources that are available, including electronic databases of published data and trial registries to identify unpublished data

• Understand how to develop a search strategy and apply appropriate search terms to interrogate electronic databases or trial registries

A literature search is distinguished from, but integral to, a literature review. Literature reviews are conducted for the purpose of (a) locating information on a topic or identifying gaps in the literature for areas of future study, (b) synthesising conclusions in an area of ambiguity and (c) helping clinicians and researchers inform decision-making and practice guidelines. Literature reviews can be narrative or systematic, with narrative reviews aiming to provide a descriptive overview of selected literature, without undertaking a systematic literature search. By contrast, systematic reviews use explicit and replicable methods in order to retrieve all available literature pertaining to a specific topic to answer a defined question (Higgins 2011). Systematic reviews therefore require a priori strategies to search the literature, with predefined criteria for included and excluded studies that should be reported in full detail in a review protocol.

Performing an effective literature search to obtain the best available evidence is the basis of any evidence-based discipline, in particular evidence-based medicine (Sackett 1997; McKeever 2015). However, with a vast and growing volume of published research available, searching the literature can be challenging. Even when journals are indexed in electronic databases, it can be difficult to identify all relevant studies without an effective search strategy (Hopewell 2007). In addition, unpublished data and ‘grey’ literature (informally published material such as conference abstracts) are now becoming more accessible to the public. It is important to search unpublished literature to reduce publication bias, which occurs because of a tendency for authors and journals to preferentially publish statistically significant studies (Dickersin & Min 1993). Efforts to locate unpublished and grey literature during the search process can help to reduce bias in the results of systematic reviews (Song 2010). A paradigmatic example demonstrating the importance of capturing unpublished data is that of Turner et al (2008), who showed that using only published data in their meta-analysis led to effect sizes for antidepressants that were one-third (32%) larger than effect sizes derived from combining both published and unpublished data. Such differences in findings from published and unpublished data can have real-life implications in clinical decision-making and treatment recommendation. In another relevant publication, Whittington et al (2004) compared the risks and benefits of selective serotonin reuptake inhibitors (SSRIs) in the treatment of depression in children. They found that published data suggested favourable risk–benefit profiles for SSRIs in this population, but the addition of unpublished data indicated that risk outweighed treatment benefits. The relative weight of drug efficacy to side-effects can be skewed if there has been a failure to search for, or include, unpublished data.

In this guide for clinicians and researchers on how to perform a literature search we use a working example about efficacy of an intervention for bipolar disorder to demonstrate the search techniques outlined. However, the overarching methods described are purposefully broad to make them accessible to all clinicians and researchers, regardless of their research or clinical question.

Defining the clinical question

The review question will guide not only the search strategy, but also the conclusions that can be drawn from the review, as these will depend on which studies or other forms of evidence are included and excluded from the literature review. A narrow question will produce a narrow and precise search, perhaps resulting in too few studies on which to base a review, or in results so focused that they are not useful in wider clinical settings. Using an overly narrow search also increases the chances of missing important studies. A broad question may produce an imprecise search, with many false-positive search results. These search results may be too heterogeneous to evaluate in one review. Therefore, from the outset, choices should be made about the remit of the review, which will in turn affect the search.

A number of frameworks can be used to break the review question into concepts. One such framework is PICO (population, intervention, comparator and outcome), developed to answer clinical questions such as the effectiveness of a clinical intervention (Richardson 1995). It is noteworthy that ‘outcome’ concepts of the PICO framework are less often used in a search strategy as they are less well defined in the titles and abstracts of available literature (Higgins & Green 2011). Although PICO is widely used, it is not a suitable framework for identifying key elements of all questions in the medical field, and minor adaptations are necessary to enable the structuring of different questions. Other frameworks exist that may be more appropriate for questions about health policy and management, such as ECLIPSE (expectation, client group, location, impact, professionals, service) (Wildridge & Bell 2002) or SPICE (setting, perspective, intervention, comparison, evaluation) for service evaluation (Booth 2006). A detailed overview of frameworks is provided in Davies (2011).
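To make the framework concrete, the short sketch below (our illustration, not part of the original guidance) structures a PICO question as data in Python; every term shown is a placeholder rather than a validated search strategy.

```python
# Illustrative sketch: a review question broken into PICO concepts.
# All terms are placeholders, not a validated search strategy.
pico = {
    "population":   ["bipolar disorder", "bipolar depression", "mania"],
    "intervention": ["calcium channel blockers", "verapamil"],
    "comparator":   ["placebo", "lithium"],
    "outcome":      [],  # outcome terms are often omitted from searches
}

for concept, terms in pico.items():
    print(f"{concept}: {' OR '.join(terms) if terms else '(not searched)'}")
```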

Scoping search

Before conducting a comprehensive literature search, a scoping search of the literature using just one or two databases (such as PubMed or MEDLINE) can provide valuable information as to how much literature for a given review question already exists. A scoping search may reveal whether systematic reviews have already been undertaken for a review question. Caution should be taken, however, as systematic reviews that may appear to ask the same question may have differing inclusion and exclusion criteria for studies included in the review. In addition, not all systematic reviews are of the same quality. If the original search strategy is of poor quality methodologically, original data are likely to have been missed and the search should not simply be updated (compare, for example, Naughton et al (2014) and Caddy et al (2015) on ketamine for treatment-resistant depression).

Search strategy

The first step in conducting a literature search should be to develop a search strategy. The search strategy should define how relevant literature will be identified. It should identify sources to be searched (list of databases and trial registries) and keywords used in the literature (list of keywords). The search strategy should be documented as an integral part of the systematic review protocol. Like the rest of a well-conducted systematic review, the search strategy needs to be explicit and detailed enough that it could be reproduced using the same methodology, with exactly the same results, or updated at a later time. This not only improves the reliability and accuracy of the review, but also means that if the review is replicated, a change of reviewers should have little effect, as they will use an identical search strategy. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement was developed to standardise the reporting of systematic reviews (Moher 2009). The PRISMA statement consists of a 27-item checklist to assess the quality of each element of a systematic review (items 6, 7 and 8 relate to the quality of literature searching) and also to guide authors when reporting their findings.

Sources to search

There are a number of databases that can be searched for literature, but the identification of relevant sources depends on the clinical or research question (different databases have different focuses, ranging from biology to the social sciences) and the type of evidence that is sought (e.g. some databases index only randomised controlled trials).

• MEDLINE and Embase are the two main biomedical literature databases. MEDLINE contains more than 22 million references from more than 5600 journals worldwide. In addition, the MEDLINE In-Process & Other Non-Indexed Citations database holds references before they are published on MEDLINE. Embase has a strong coverage of drug and pharmaceutical research and provides over 30 million references from more than 8500 currently published journals, 2900 of which are not in MEDLINE. These two databases, however, are only available to either individual subscribers or through institutional access such as universities and hospitals. PubMed, developed by the National Center for Biotechnology Information of the US National Library of Medicine, provides access to a free version of MEDLINE and is accessible to researchers, clinicians and the public. PubMed comprises medical and biomedical literature indexed in MEDLINE, but provides additional access to life science journals and e-books.

In addition, there are a number of subject- and discipline-specific databases.

• PsycINFO covers a range of psychological, behavioural, social and health sciences research.

• The Cochrane Central Register of Controlled Trials (CENTRAL) hosts the most comprehensive source of randomised and quasi-randomised controlled trials. Although some of the evidence on this register is also included in Embase and MEDLINE, there are over 150 000 reports indexed from other sources, such as conference proceedings and trial registers, that would otherwise be less accessible (Dickersin 2002).

• The Cumulative Index to Nursing and Allied Health Literature (CINAHL), British Nursing Index (BNI) and the British Nursing Database (formerly BNI with Full Text) are databases relevant to nursing, but they span literature across medical, allied health, community and health management journals.

• The Allied and Complementary Medicine Database (AMED) is a database specifically for alternative treatments in medicine.

The examples of specific databases given here are by no means exhaustive, but they are popular and likely to be used for literature searching in medicine, psychiatry and psychology. Website links for these databases are given in Box 1, along with links to resources not mentioned above. Box 1 also provides a website link to a couple of video tutorials for searching electronic databases. Box 2 shows an example of the search sources chosen for a review of a pharmacological intervention of calcium channel antagonists in bipolar disorder, taken from a recent systematic review (Cipriani 2016a).

BOX 1 Website links of search sources to obtain published and unpublished literature

Electronic databases

• MEDLINE/PubMed: www.ncbi.nlm.nih.gov/pubmed

• Embase: www.embase.com

• PsycINFO: www.apa.org/psycinfo

• Cochrane Central Register of Controlled Trials (CENTRAL): www.cochranelibrary.com

• Cumulative Index of Nursing and Allied Health Literature (CINAHL): www.cinahl.com

• British Nursing Index: www.bniplus.co.uk

• Allied and Complementary Medicine Database: https://www.ebsco.com/products/research-databases/amed-the-allied-and-complementary-medicine-database

Grey literature databases

• BIOSIS Previews (part of Thomson Reuters Web of Science): https://apps.webofknowledge.com

Trial registries

• ClinicalTrials.gov: www.clinicaltrials.gov

• Drugs@FDA: www.accessdata.fda.gov/scripts/cder/daf

• European Medicines Agency (EMA): www.ema.europa.eu

• World Health Organization International Clinical Trials Registry Platform (WHO ICTRP): www.who.int/ictrp

• GlaxoSmithKline Study Register: www.gsk-clinicalstudyregister.com

• Eli-Lilly clinical trial results: https://www.lilly.com/clinical-study-report-csr-synopses

Guides to further resources

• King's College London Library Services: http://libguides.kcl.ac.uk/ld.php?content_id=17678464

• Georgetown University Medical Center Dahlgren Memorial Library: https://dml.georgetown.edu/core

• University of Minnesota Biomedical Library: https://hsl.lib.umn.edu/biomed/help/nursing

Tutorial videos

• Searches in electronic databases: http://library.buffalo.edu/hsl/services/instruction/tutorials.html

• Using the Yale MeSH Analyzer tool: http://library.medicine.yale.edu/tutorials/1559

BOX 2 Example of search sources chosen for a review of calcium channel antagonists in bipolar disorder (Cipriani 2016a)

Electronic databases searched:

• MEDLINE In-Process and Other Non-Indexed Citations

Developing a search strategy

For a comprehensive search of the literature it has been suggested that two or more electronic databases should be used (Suarez-Almazor 2000). Suarez-Almazor and colleagues demonstrated that, in a search for controlled clinical trials (CCTs) for rheumatoid arthritis, osteoporosis and lower back pain, only 67% of available citations were found by both Embase and MEDLINE. Searching MEDLINE alone would have resulted in 25% of available CCTs being missed and searching Embase alone would have resulted in 15% of CCTs being missed. However, a balance between the sensitivity of a search (an attempt to retrieve all relevant literature in an extensive search) and the specificity of a search (an attempt to retrieve a more manageable number of relevant citations) is optimal. In addition, supplementing electronic database searches with unpublished literature searches (see ‘Obtaining unpublished literature’ below) is likely to reduce publication bias. The capacity of the individual or review team will largely determine the number of sources searched. In all cases, a clear rationale should be outlined in the review protocol for the sources chosen (the expertise of an information scientist is valuable in this process).

Important methodological considerations (such as study design) may also be included in the search strategy. Depending on the databases and supplementary sources chosen, filters can be used to search the literature by study design (see ‘Searching electronic databases’). For instance, if the search strategy is confined to one study design term only (e.g. randomised controlled trial, RCT), only the articles labelled in this way will be selected. However, it is possible that some RCTs in a database are not labelled as such, so they will not be picked up by the filtered search. Filters can help reduce the number of references retrieved by the search, but using just one term is not 100% sensitive, especially if only one database is used (e.g. MEDLINE). It is important for systematic reviewers to know how reliable such a strategy is and to treat the results with caution.

Searching electronic databases

Identifying search terms

Standardised search terms are thesaurus and indexing terms that are used by electronic databases as a convenient way to categorise articles, allowing for efficient searching. Individual database records may be assigned several different standardised search terms that describe the same or similar concepts (e.g. bipolar disorder, bipolar depression, manic–depressive psychosis, mania). This has the advantage that even if the original article did not use the standardised term, when the article is catalogued in a database it is allocated that term (Guaiana 2010). For example, an older paper might refer to ‘manic depression’, but would be categorised under the term ‘bipolar disorder’ when catalogued in MEDLINE. These standardised search terms are called MeSH (medical subject headings) in MEDLINE and PubMed, and Emtree in Embase, and are organised in a hierarchical structure (Fig. 1). In both MEDLINE and Embase an ‘explode’ command enables the database to search for a requested term, as well as specific related terms. Both narrower and broader search terms can be viewed and selected to be included in the search if appropriate to a topic. The Yale MeSH Analyzer tool (mesh.med.yale.edu) can be used to help identify potential terms and phrases to include in a search. It is also useful for understanding why relevant articles may be missing from an initial search, as it produces a comparison grid of MeSH terms used to index each article (see Box 1 for a tutorial video link).
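To illustrate the difference MeSH indexing can make, the hedged sketch below counts PubMed records for a MeSH search versus a free-text phrase via the public NCBI E-utilities esearch endpoint; the requests package and the current endpoint behaviour are assumptions on our part, and the counts returned will change over time.

```python
# Hedged sketch: record counts for a MeSH heading versus a free-text
# phrase, using the public NCBI E-utilities esearch endpoint.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term: str) -> int:
    """Return the number of PubMed records matching a query string."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    reply = requests.get(ESEARCH, params=params, timeout=30)
    reply.raise_for_status()
    return int(reply.json()["esearchresult"]["count"])

# Older papers on 'manic depression' are catalogued under the MeSH
# heading 'Bipolar Disorder', so the MeSH search usually retrieves
# records that the free-text phrase alone would miss.
print(pubmed_count('"bipolar disorder"[MeSH Terms]'))
print(pubmed_count('"manic depression"[Title/Abstract]'))
```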

FIG 1 Search terms and hierarchical structure of MeSH (medical subject headings) in MEDLINE and PubMed.

In addition, MEDLINE also distinguishes between MeSH headings (MH) and publication type (PT) terms. Publication terms are less about the content of an article than about its type, specifying for example a review article, meta-analysis or RCT.

Both MeSH and Emtree have their own peculiarities, with variations in thesaurus and indexing terms. In addition, not all concepts are assigned standardised search terms, and not all databases use this method of indexing the literature. It is advisable to check the guidelines of selected databases before undertaking a search. In the absence of a MeSH heading for a particular term, free-text terms could be used.

Free-text terms are used in natural language and are not part of a database’s controlled vocabulary. Free-text terms can be used in addition to standardised search terms in order to identify as many relevant records as possible (Higgins & Green 2011). Using free-text terms allows the reviewer to search using variations in language or spelling (e.g. hypomani* or mania* or manic* – see truncation and wildcard functions below and Fig. 2). A disadvantage of free-text terms is that they are only searched for in the titles and abstracts of database records, and not in the full texts, meaning that when a free-text word is used only in the body of an article, it will not be retrieved in the search. Additionally, a number of specific considerations should be taken into account when selecting and using free-text terms:

• synonyms, related terms and alternative phrases (e.g. mood instability, affective instability, mood lability or emotion dysregulation)

• abbreviations or acronyms in medical and scientific research (e.g. magnetic resonance imaging or MRI)

• lay and medical terminology (e.g. high blood pressure or hypertension)

• brand and generic drug names (e.g. Prozac or fluoxetine)

• variants in spelling (e.g. UK English and American English: behaviour or behavior; paediatric or pediatric).

FIG 2 Example of a search strategy about bipolar disorder using MEDLINE (Cipriani 2016a). The strategy follows the PICO framework and includes MeSH terms, free-text keywords and a number of other techniques, such as truncation, that have been outlined in this article. Numbers in bold give the number of citations retrieved by each search.

Truncation and wildcard functions can be used in most databases to capture variations in language (a short illustrative sketch follows the list below):

• truncation allows the stem of a word that may have variant endings to be searched: for example, a search for depress* uses truncation to retrieve articles that mention both depression and depressive; truncation symbols may vary by database, but common symbols include: *, ! and #

• wild cards substitute one letter within a word to retrieve alternative spellings: for example, ‘wom?n’ would retrieve the terms ‘woman’ and ‘women’.
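The toy sketch below (ours) mimics truncation and wildcards with regular expressions, purely to show what the operators match; real databases implement this server-side and, as noted, the symbols vary by vendor.

```python
# Illustrative only: translate database truncation (*) and wildcard
# (?) symbols into regular expressions to show what they match.
import re

def to_regex(pattern: str) -> re.Pattern:
    escaped = re.escape(pattern).replace(r"\*", r"\w*").replace(r"\?", r"\w")
    return re.compile(rf"^{escaped}$", re.IGNORECASE)

assert to_regex("depress*").match("depression")
assert to_regex("depress*").match("depressive")
assert to_regex("wom?n").match("woman") and to_regex("wom?n").match("women")
assert not to_regex("wom?n").match("wooden")
```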

Combining search terms

Search terms should be combined in the search strategy using Boolean operators, which allow standardised search terms and free-text terms to be combined. There are three main Boolean operators – AND, OR and NOT (Fig. 3); a minimal sketch of building such a query programmatically follows the list below.

• OR – this operator is used to broaden a search, finding articles that contain at least one of the search terms within a concept. Sets of terms can be created for each concept, for example the population of interest: (bipolar disorder OR bipolar depression). Parentheses are used to build up search terms, with words within parentheses treated as a unit.

• AND – this can be used to join sets of concepts together, narrowing the retrieved literature to articles that contain all concepts, for example the population or condition of interest and the intervention to be evaluated: (bipolar disorder OR bipolar depression) AND calcium channel blockers. However, if at least one term from each set of concepts is not identified in the title or abstract of an article, this article will not be identified by the search strategy. It is worth mentioning that some databases can also run the search across full texts. For example, ScienceDirect and most publishing houses allow this kind of search, which is much more comprehensive than title or abstract searches alone.

• NOT – this operator, used less often, can focus a search strategy so that it does not retrieve specific literature, for example human studies NOT animal studies. However, in certain cases the NOT operator can be too restrictive, for example if excluding male gender from a population, using ‘NOT male’ would also mean that any articles about both males and females are not obtained by the search.
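Continuing the working example, the small sketch below (our illustration, with placeholder terms) builds such a Boolean query string programmatically, using parentheses to keep each OR'd concept set as a unit before joining the sets with AND.

```python
# Minimal sketch: OR within a concept, AND between concepts;
# parentheses keep each concept set together as a unit.
population   = ["bipolar disorder", "bipolar depression"]
intervention = ["calcium channel blockers", "verapamil"]

def any_of(terms):
    """OR the terms of one concept together, wrapped in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = f"{any_of(population)} AND {any_of(intervention)}"
print(query)
# Prints (on one line): ("bipolar disorder" OR "bipolar depression")
#                       AND ("calcium channel blockers" OR "verapamil")
```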

FIG 3 Example of Boolean operator concepts (the resulting search is the light red shaded area).

The conventions of each database should be checked before undertaking a literature search, as functions and operators may differ slightly between them (Cipriani 2016b). This is particularly relevant when using limits and filters. Figure 2 shows an example search strategy incorporating many of the concepts described above. The search strategy is taken from Cipriani et al (2016a), but simplified to include only one intervention.

Search filters

A number of filters exist to focus a search, including language, date and study design or study focus filters. Language filters can restrict retrieval of articles to the English language, although if language is not an inclusion criterion it should not be restricted, to avoid language bias. Date filters can be used to restrict the search to literature from a specified period, for example if an intervention was only made available after a certain date. In addition, if good systematic reviews exist that are likely to capture all relevant literature (as advised by an information specialist), date restrictions can be used to search for additional literature published after the period covered by the existing review. In the same way, date filters can be used to update a literature search since the last time it was conducted. Reviewing the literature should be a timely process (new and potentially relevant evidence is produced constantly) and updating the search is an important step, especially if collecting evidence to inform clinical decision-making, as publications in the field of medicine are increasing at an impressive rate (Barber 2016). The filters chosen will depend on the research question, the nature of evidence that is sought through the literature search and the guidelines of the individual database that is used.

Supplementary search techniques

Google Scholar

Google Scholar allows basic Boolean operators to be used in strings of search terms. However, the search engine does not use standardised search terms that have been tagged as in traditional databases, and therefore variations of keywords should always be searched. There are advantages and disadvantages to using a web search engine such as Google Scholar. Google Scholar searches the full text of an article for keywords and also searches a wider range of sources, such as conference proceedings and books, that are not found in traditional databases, making it a good resource for grey literature (Haddaway 2015). In addition, Google Scholar finds articles cited by other relevant articles produced in the search. However, variable retrieval of content (due to regular updating of Google algorithms and the individual’s search history and location) means that search results are not necessarily reproducible and are therefore not in keeping with the replicable search methods required by systematic reviews. Google Scholar alone has not been shown to retrieve more literature than the traditional databases discussed in this article and therefore should be used in addition to other sources (Bramer 2016).

Citation searching

Once the search strategy has identified relevant literature, the reference lists in these sources can be searched. This is called citation searching or backward searching, and it can be used to see where particular research topics led others. This method is particularly useful if the search identifies systematic reviews or meta-analyses of a similar topic.

Conference abstracts

Conference abstracts are considered ‘grey literature’, i.e. literature that is not formally published in journals or books (Alberani 1990). Scherer and colleagues found that only 52.6% of all conference abstracts go on to full publication of results, and the factors associated with publication were RCT designs and the reporting of positive or significant results (Scherer 2007). Therefore, failure to search relevant grey literature might miss certain data and bias the results of a review. Although conference abstracts are not indexed in most major electronic databases, they are available in databases such as BIOSIS Previews (Box 1). However, as with many unpublished studies, these data have not undergone the peer-review process that is often a tool for assessing, and possibly improving, the quality of a publication.

Obtaining unpublished literature

Searching trial registers and pharmaceutical websites

For reviews of trial interventions, a number of trial registers exist. ClinicalTrials.gov (clinicaltrials.gov) provides access to information on publicly and privately conducted clinical trials in humans. Results for both published and unpublished studies can be found for many trials on the register, in addition to information about studies that are ongoing. Searching each trial register requires a slightly different search strategy, but many of the basic principles described above still apply. Basic searches on ClinicalTrials.gov include searching by condition or by specific drugs or interventions, and these can be linked using Boolean operators: for example, (bipolar disorder OR manic depressive disorder) AND lithium. As mentioned above, parentheses can be used to build up search terms. More advanced searches allow one to specify further search fields such as the status of studies, study type and age of participants. The US Food and Drug Administration (FDA) hosts a database providing information about FDA-approved drugs, therapeutic products and devices (www.fda.gov). The database (with open access to anyone, not only in the USA) can be searched by the drug name, its active ingredient or its approval application number and, for most drugs approved in the past 20 years or so, a review of clinical trial results (some of which remain unpublished) used as evidence in the approval process is available. The European Medicines Agency (EMA) hosts a similar register for medicines developed for use in the European Union (www.ema.europa.eu). An internet search will show that many other national and international trial registers exist that, depending on the review question, may be relevant search sources. The World Health Organization International Clinical Trials Registry Platform (WHO ICTRP; www.who.int/ictrp) provides access to a central database bringing a number of these national and international trial registers together. It can be searched in much the same way as ClinicalTrials.gov.
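Registries with a public API can also be queried programmatically. The sketch below uses the ClinicalTrials.gov v2 REST API; the endpoint and parameter names ("query.cond", "query.intr", "countTotal") reflect our reading of that API at the time of writing and should be verified against the current documentation before use.

```python
# Hedged sketch: registered trials of lithium in bipolar disorder via
# the ClinicalTrials.gov v2 REST API (verify endpoint/params yourself).
import requests

resp = requests.get(
    "https://clinicaltrials.gov/api/v2/studies",
    params={
        "query.cond": "bipolar disorder OR manic depressive disorder",
        "query.intr": "lithium",
        "countTotal": "true",
        "pageSize": 5,
    },
    timeout=30,
)
resp.raise_for_status()
data = resp.json()
print("total registered studies:", data.get("totalCount"))
for study in data.get("studies", []):
    ident = study["protocolSection"]["identificationModule"]
    print(ident["nctId"], "-", ident.get("briefTitle", ""))
```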

A number of pharmaceutical companies now share data from company-sponsored clinical trials. GlaxoSmithKline (GSK) is transparent in the sharing of its data from clinical studies and hosts its own clinical study register ( www.gsk-clinicalstudyregister.com ). Eli-Lilly provides clinical trial results both on its website ( www.lillytrialguide.com ) and in external registries. However, other pharmaceutical companies, such as Wyeth and Roche, divert users to clinical trial results in external registries. These registries include both published and previously unpublished studies. Searching techniques differ for each company and hand-searching through documents is often required to identify studies.

Communication with authors

Direct communication with authors of published papers could produce both additional data omitted from published studies and other unpublished studies. Contact details are usually available for the corresponding author of each paper. Although high-quality reviews do make efforts to obtain and include unpublished data, this does have potential disadvantages: the data may be incomplete and are likely not to have been peer-reviewed. It is also important to note that, although reviewers should make every effort to find unpublished data in an effort to minimise publication bias, there is still likely to remain a degree of this bias in the studies selected for a systematic review.

Conclusions

Developing a literature search strategy is a key part of the systematic review process, and the conclusions reached in a systematic review will depend on the quality of the evidence retrieved by the literature search. Sources should therefore be selected to minimise the possibility of bias, and supplementary search techniques should be used in addition to electronic database searching to ensure that an extensive review of the literature has been carried out. It is worth remembering that developing a search strategy should be an iterative and flexible process (Higgins & Green 2011), and only by conducting a search oneself will one learn about the vast literature available and how best to capture it.

Acknowledgements

We thank Sarah Stockton for her help in drafting this article. Andrea Cipriani is supported by the NIHR Oxford cognitive health Clinical Research Facility.

Select the single best option for each question stem

a an explicit and replicable method used to retrieve all available literature pertaining to a specific topic to answer a defined question

b a descriptive overview of selected literature

c an initial impression of a topic which is understood more fully as a research study is conducted

d a method of gathering opinions of all clinicians or researchers in a given field

e a step-by-step process of identifying the earliest published literature through to the latest published literature.

a does not need to be specified in advance of a literature search

b does not need to be reported in a systematic literature review

c defines which sources of literature are to be searched, but not how a search is to be carried out

d defines how relevant literature will be identified and provides a basis for the search strategy

e provides a timeline for searching each electronic database or unpublished literature source.

a the Cochrane Central Register of Controlled Trials (CENTRAL)

d the Cumulative Index to Nursing and Allied Health Literature (CINAHL)

e the British Nursing Index.

a bipolar disorder OR treatment

b bipolar* OR treatment

c bipolar disorder AND treatment

d bipolar disorder NOT treatment

e (bipolar disorder) OR (treatment).

a publication bias

b funding bias

c language bias

d outcome reporting bias

e selection bias.

MCQ answers

1 a 2 d 3 b 4 c 5 a



A systematic approach to searching: an efficient and complete method to develop literature searches

Affiliations: Medical Library, Erasmus MC–Erasmus University Medical Centre, Rotterdam, The Netherlands; Spencer S. Eccles Health Sciences Library, University of Utah, Salt Lake City, UT; Department of Family Medicine, School for Public Health and Primary Care (CAPHRI), Maastricht University, Maastricht, The Netherlands; Kleijnen Systematic Reviews, York, United Kingdom.

PMID: 30271302; PMCID: PMC6148622; DOI: 10.5195/jmla.2018.283

Creating search strategies for systematic reviews, finding the best balance between sensitivity and specificity, and translating search strategies between databases is challenging. Several methods describe standards for systematic search strategies, but a consistent approach for creating an exhaustive search strategy has not yet been fully described in enough detail to be fully replicable. The authors have established a method that describes step by step the process of developing a systematic search strategy as needed in the systematic review. This method describes how single-line search strategies can be prepared in a text document by typing search syntax (such as field codes, parentheses, and Boolean operators) before copying and pasting search terms (keywords and free-text synonyms) that are found in the thesaurus. To help ensure term completeness, we developed a novel optimization technique that is mainly based on comparing the results retrieved by thesaurus terms with those retrieved by the free-text search words to identify potentially relevant candidate search terms. Macros in Microsoft Word have been developed to convert syntaxes between databases and interfaces almost automatically. This method helps information specialists in developing librarian-mediated searches for systematic reviews as well as medical and health care practitioners who are searching for evidence to answer clinical questions. The described method can be used to create complex and comprehensive search strategies for different databases and interfaces, such as those that are needed when searching for relevant references for systematic reviews, and will assist both information specialists and practitioners when they are searching the biomedical literature.


FIG Schema for determining the optimal order of elements

FIG Schematic representation of translation between databases used at Erasmus University Medical Center


  • Research article
  • Open access
  • Published: 15 February 2021

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

  • Alan Brnabic 1 &
  • Lisa M. Hess   ORCID: orcid.org/0000-0003-3631-3941 2  

BMC Medical Informatics and Decision Making volume  21 , Article number:  54 ( 2021 ) Cite this article

29k Accesses

54 Citations

3 Altmetric

Metrics details

Background

Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications that inform patient-provider decision making.

Methods

This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist.

Results

A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. Diverse methods, statistical packages and approaches were used across the identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation, but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist were not met by more than 50% of the published studies.

Conclusions

A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, the model selection strategy is clearly defined, and both internal and external validation are necessary to be sure that decisions for patient care are being made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.


Traditional methods of analyzing large real-world databases (big data) and other observational studies focus on outcomes that can inform at the population level. The findings from real-world studies are relevant to populations as a whole, but the ability to predict or provide meaningful evidence at the patient level is much less well established, owing to the complexity of clinical decision making and the variety of factors taken into account by the health care provider [ 1 , 2 ]. Using traditional methods that produce population estimates and measures of variability, it is very challenging to accurately predict how any one patient will fare, even when applying findings from subgroup analyses. The care of patients is nuanced, and multiple non-linear, interconnected factors must be taken into account in decision making. When the available data are relevant only at the population level, health care decision making is less informed as to the optimal course of care for a given patient.

Clinical prediction models are an approach to utilizing patient-level evidence to help inform healthcare decision makers about patient care. These models are also known as prediction rules or prognostic models and have been used for decades by health care professionals [ 3 ]. Traditionally, these models combine patient demographic, clinical and treatment characteristics in the form of a statistical or mathematical model, usually regression, classification or neural networks, but deal with a limited number of predictor variables (usually below 25). The Framingham Heart Study is a classic example of the use of longitudinal data to build a traditional decision-making model. Multiple risk calculators and estimators have been built to predict a patient’s risk of a variety of cardiovascular outcomes, such as atrial fibrillation and coronary heart disease [ 4 , 5 , 6 ]. In general, these studies use multivariable regression evaluating risk factors identified in the literature. Based on these findings, a scoring system is derived for each factor to predict the likelihood of an adverse outcome based on a patient’s score across all risk factors evaluated.

With the advent of more complex data collection and readily available data sets for patients in routine clinical care, both sample sizes and potential predictor variables (such as genomic data) can exceed the tens of thousands, thus establishing the need for alternative approaches that can rapidly process a large amount of information. Artificial intelligence (AI), particularly machine learning methods (a subset of AI), is increasingly being utilized in clinical research for prediction models, pattern recognition and deep-learning techniques used to combine complex information, for example genomic and clinical data [ 7 , 8 , 9 ]. In the health care sciences, these methods are applied in place of a human expert to perform tasks that would otherwise take considerable time and expertise and would likely be prone to error. The underlying concept is that a machine will learn by trial and error from the data itself, making predictions without a pre-defined set of rules for decision making. Put simply, machine learning can be understood as “learning from data” [ 8 ].

There are two types of learning from data: unsupervised and supervised. Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labelled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data. Supervised learning involves making a prediction based on a set of pre-specified input and output variables. There are a number of statistical tools used for supervised learning. Some examples include traditional statistical prediction methods like regression models (e.g. regression splines, projection pursuit regression, penalized regression) that involve fitting a model to data, evaluating the fit and estimating parameters that are later used in a predictive equation. Other tools include tree-based methods (e.g. classification and regression trees [CART] and random forests), which successively partition a data set based on the relationships between predictor variables and a target (outcome) variable. Other examples include neural networks, discriminant functions and linear classifiers, and support vector classifiers and machines. Often, predictive tools are built using various forms of model aggregation (or ensemble learning) that may combine models based on resampled or re-weighted data sets. These different types of models can be fitted to the same data using model averaging.
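As a minimal, concrete instance of the supervised tree-based approach described above, the sketch below fits a random forest to synthetic labelled data with scikit-learn (our illustration; it does not reproduce any study discussed in this review).

```python
# Minimal supervised-learning sketch: a random forest classifier
# fitted to synthetic labelled data, then used for one prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Predicted class probabilities for one new observation.
print(forest.predict_proba(X[:1]))
```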

Classical statistical regression methods used for prediction modeling are well understood in the statistical sciences and the scientific community that employs them. These methods tend to be transparent and are usually hypothesis driven, but they can overlook complex associations and offer limited flexibility when a high number of variables is investigated. In addition, when using classic regression modeling, choosing the ‘right’ model is not straightforward. Non-traditional machine learning algorithms and approaches may overcome some of these limitations of classical regression models in this new era of big data, but they are not a complete solution, as they must be considered in the context of the limitations of the data used in the analysis [ 2 ].

While machine learning methods can be used both for population-based models and to inform patient-provider decision making, it is important to note that the data, model and outputs used to inform the care of an individual patient must meet the highest standards of research quality, as the choice made will likely have an impact on both long- and short-term patient outcomes. While a range of uncertainty can be expected for population-based estimates, the risk of error in patient-level models must be minimized to ensure quality patient care. The risks and concerns of utilizing machine learning for individual patient decision making have been raised by ethicists [ 10 ]. The risks include, but are not limited to, a lack of transparency, limited data regarding the confidence of the findings, and the risk of reducing patient autonomy in choice by relying on data in ways that may foster a more paternalistic model of healthcare. These are all important and valid concerns, and therefore the use of machine learning for patient care must meet the highest standards to ensure that shared, not simply informed, evidence-based decision making is supported by these methods.

A systematic literature review was published in 2018 that evaluated the statistical methods that have been used to enable large, real-world databases to inform care at the patient-provider level [ 11 ]. Briefly, this study identified a total of 115 articles, most of which used logistic regression (n = 52, 45.2%), Cox regression (n = 24, 20.9%) or linear regression (n = 17, 14.8%). However, several studies were observed to utilize novel statistical approaches such as machine learning, recursive partitioning and the development of mathematical algorithms to predict patient outcomes. More recently, publications have emerged describing the use of Individualized Treatment Recommendation algorithms and Outcome Weighted Learning for personalized medicine using large observational databases [ 12 , 13 ]. Therefore, this systematic literature review was designed to pursue this observation further, to more comprehensively evaluate the use of machine learning methods to support patient-provider decision making, and to critically evaluate the strengths and weaknesses of these methods. For the purposes of this work, data supporting patient-provider decision making were defined as data that provided information specifically on a treatment or intervention choice; while both population-based and risk-estimator data are certainly valuable for patient care and decision making, this study was designed to evaluate data that would specifically inform a choice made by the patient with the provider. The overarching goal is to provide evidence of how large datasets can be used to inform decisions at the patient level using machine learning-based methods, and to evaluate the quality of such work to support informed decision making.

This study originated from a systematic literature review that was conducted in MEDLINE and PsychInfo; a refreshed search was conducted in September 2020 to obtain newer publications (Table 1). Eligible studies were those that analyzed prospective or retrospective observational data, reported quantitative results, and described statistical methods specifically applicable to patient-level decision making. Specifically, patient-level decision making referred to studies that provided data for or against a particular intervention at the patient level, so that the data could be used to inform decision making at the patient-provider level. Studies did not meet this criterion if only population-based estimates, mortality risk predictors or satisfaction with care were evaluated. Additionally, studies designed to improve diagnostic tools and those evaluating health care system quality indicators did not meet the patient-provider decision-making criterion. Eligible statistical methods for this study were limited to machine learning-based approaches. Eligibility was assessed by two reviewers and any discrepancies were discussed; a third reviewer was available to serve as a tie breaker in case of differing opinions. The final set of eligible publications was then abstracted into a Microsoft Excel document. Study quality was evaluated using a modified Luo scale, which was developed specifically as a tool to standardize high-quality publication of machine learning models [ 14 ]. A modified version of this tool was utilized for this study; specifically, the optional items were removed, and three terms were clarified: item 6 (define the prediction problem) was redefined as “define the model,” item 7 (prepare data for model building) was renamed “model building and validation,” and item 8 (build the predictive model) was renamed “model selection” to more succinctly state what was being evaluated under each criterion. Data were abstracted, and both the extracted data and the Luo checklist items were reviewed and verified by a second reviewer to ensure data comprehensiveness and quality. In all cases of differences in eligibility assessment or data entry, the reviewers met and ensured agreement on the final set of data to be included in the database for data synthesis, with a third reviewer utilized as a tie breaker in case of discrepancies. Data were summarized descriptively and qualitatively, based on the following categories: publication and study characteristics; patient characteristics; statistical methodologies used, including statistical software packages; strengths and weaknesses; and interpretation of findings.

The search strategy was run on September 1, 2020 and identified a total of 34 publications that utilized machine learning methods for individual patient-level decision making (Fig. 1). The most common reason for study exclusion, as expected, was the study not meeting the patient-level decision-making criterion. A summary of the characteristics of eligible studies and the patient data is included in Table 2. Most of the real-world data sources were retrospective databases or designs (n = 27, 79.4%), primarily utilizing electronic health records. Six analyses utilized prospective cohort studies and one utilized data from a cross-sectional study.

Fig. 1 PRISMA diagram of screening and study identification

General approaches to machine learning

The types of classification or prediction machine learning algorithms are reported in Table 2. These included decision tree/random forest analyses (19 studies) [ 15 , 16 , 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 ] and neural networks (19 studies) [ 24 , 25 , 26 , 27 , 28 , 29 , 30 , 32 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 , 42 , 43 , 44 ]. Other approaches included latent growth mixture modeling [ 45 ], support vector machine classifiers [ 46 ], LASSO regression [ 47 ], boosting methods [ 23 ] and a novel Bayesian approach [ 26 , 40 , 48 ]. Within the analytical approaches to support machine learning, a variety of methods were used to evaluate model fit, such as the Akaike information criterion, the Bayesian information criterion and the Lo-Mendell-Rubin likelihood ratio test [ 22 , 45 , 47 ]. While most studies reported the area under the curve (AUC) of receiver operating characteristic (ROC) curves (Table 3), analyses also included sensitivity/specificity [ 16 , 19 , 24 , 30 , 41 , 42 , 43 ], positive predictive value [ 21 , 26 , 32 , 38 , 40 , 41 , 42 , 43 ] and a variety of less common approaches, such as the geometric mean [ 16 ], the Matthews correlation coefficient (which ranges from -1.0, completely erroneous prediction, to +1.0, perfect prediction) [ 46 ], defining true/false negatives/positives by means of a confusion matrix [ 17 ], calculating the root mean square error of the predicted versus original outcome profiles [ 37 ], or identifying the model with the best average performance in training and cross-validation [ 36 ].
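For readers unfamiliar with these metrics, the toy sketch below (ours, on made-up labels and scores) shows how AUC, sensitivity, specificity, positive predictive value and the Matthews correlation coefficient can be computed with scikit-learn.

```python
# Toy example: common performance metrics from labels and scores.
from sklearn.metrics import (confusion_matrix, matthews_corrcoef,
                             roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]
y_pred  = [int(s >= 0.5) for s in y_score]  # classify at a 0.5 cut-off

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC:        ", roc_auc_score(y_true, y_score))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV:        ", tp / (tp + fp))
print("MCC:        ", matthews_corrcoef(y_true, y_pred))
```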

Statistical software packages

The statistical programs used to perform machine learning varied widely across these studies, with no consistencies observed (Table 2). As noted above, one study using decision tree analysis used Quinlan’s C5.0 decision tree algorithm [ 15 ], while a second used an earlier version of this program (C4.5) [ 20 ]. Other decision tree analyses utilized various versions of R [ 18 , 19 , 22 , 24 , 27 , 47 ], International Business Machines (IBM) Statistical Package for the Social Sciences (SPSS) [ 16 , 17 , 33 , 47 ], the Azure Machine Learning Platform [ 30 ], or models programmed in Python [ 23 , 25 , 46 ]. Artificial neural network analyses used Neural Designer [ 34 ] or Statistica V10 [ 35 ]. Six studies did not report the software used for analysis [ 21 , 31 , 32 , 37 , 41 , 42 ].

Families of machine learning algorithms

As also summarized in Table 2, more than one third of all publications (n = 13, 38.2%) applied only one family of machine learning algorithm to model development [ 16 , 17 , 18 , 19 , 20 , 34 , 37 , 41 , 42 , 43 , 46 , 48 ], and only four studies utilized five or more methods [ 23 , 25 , 28 , 45 ]. One applied an ensemble of six different algorithms, with the software set to run 200 iterations [ 23 ], and another ran seven algorithms [ 45 ].

Internal and external validation

Evaluation of study publication quality identified the lack of external validation as the most common gap, with external validation conducted by only two studies [ 15 , 20 ]. Seven studies predefined the success criteria for model performance [ 20 , 21 , 23 , 35 , 36 , 46 , 47 ], and five studies discussed the generalizability of the model [ 20 , 23 , 34 , 45 , 48 ]. Six studies [ 17 , 18 , 21 , 22 , 35 , 36 ] discussed the balance between model accuracy and model simplicity or interpretability, which was also a criterion of quality publication in the Luo scale [ 14 ]. The checklist items that were least frequently met are presented in Fig.  2 . The complete quality assessment for each item in the checklist is included in Additional file 1: Table S1.

Figure 2. Least frequently met study quality items, modified Luo scale [ 14 ]

There were a variety of approaches taken to validate the models developed (Table 3). Internal validation was performed in all studies. Cohort splitting into training and testing datasets was conducted in multiple ways: a 2:1 split [ 26 ], a 60/40 split [ 21 , 36 ], a 70/30 split [ 16 , 17 , 22 , 30 , 33 , 35 ], a 75/25 split [ 27 , 40 ], an 80/20 split [ 46 ], a 90/10 split [ 25 , 29 ], splitting the data based on site of care [ 48 ], a 2/1/1 split for training, testing and validation [ 38 ], and a 60/20/20 split, where the third group was used for model selection purposes prior to validation [ 34 ]. Nine studies did not specifically mention the form of splitting approach used [ 15 , 18 , 19 , 20 , 24 , 29 , 39 , 45 , 47 ], but most of those noted the use of k-fold cross-validation. One training set corresponded to 90% of the sample [ 23 ], whereas a second study was less clear, as input data were at the observation level with multiple observations per patient, and 3 of the 15 patients were included in the training set [ 37 ]. The remaining studies did not specifically state that the data were split into testing and validation samples, but most specified that they performed five-fold cross-validation (including one that mentioned cohort splitting only in general terms) [ 18 , 45 ] or ten-fold cross-validation strategies [ 15 , 19 , 20 , 28 ].
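The two internal-validation patterns described above can be sketched as follows; the data, the 70/30 ratio and the choice of a random forest are illustrative assumptions, not a reconstruction of any reviewed study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # hypothetical patient-level features
y = rng.integers(0, 2, size=200)      # hypothetical binary outcome

# Single 70/30 cohort split, as in several of the reviewed studies
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# Five-fold cross-validation as an alternative to a single split
cv = KFold(n_splits=5, shuffle=True, random_state=0)
print("5-fold CV accuracy:",
      cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv).mean())
```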

External validation was conducted by only two studies (5.9%). Hische and colleagues conducted a decision tree analysis designed to identify patients with impaired fasting glucose [ 20 ]. Their model was developed in a cohort of patients from the Berlin Potsdam Cohort Study (n = 1527) and was found to have a positive predictive value of 56.2% and a negative predictive value of 89.1%. The model was then tested on an independent cohort from the Dresden Cohort (n = 1998) with a family history of type II diabetes. In external validation, the positive predictive value was 43.9% and the negative predictive value was 90.4% [ 20 ]. Toussi and colleagues conducted both internal and external validation in their decision tree analysis evaluating individual physician prescribing behaviors, using a database of 463 patient electronic medical records [ 15 ]. For the internal validation step, the cross-validation option of Quinlan’s C5.0 decision tree learning algorithm was used, as the study sample was too small to split into testing and validation samples; external validation was conducted by comparing outcomes to published treatment guidelines. Unfortunately, they found little concordance between physician behavior and guidelines, potentially because the timing of the data did not match the period in which the guidelines were implemented, emphasizing the need for a contemporaneous external control [ 15 ].

Handling of missing values

Missing values were addressed in most studies (n = 21, 61.8%) in this review, but the thirteen remaining studies did not mention whether there were missing data or how they were handled (Table 3). Among the studies that reported methods related to missing data, a wide variety of approaches were used in real-world datasets. Hertroijs and colleagues used the full information maximum likelihood method to estimate model parameters in the presence of missing data during model development, but patients with missing covariate values at baseline were excluded from the validation of the model [ 45 ]. One study included missing covariate values in models as a discrete category [ 48 ]. Four studies removed patients with missing data from the model [ 46 ], resulting in the loss of 16%-41% of samples in three of them [ 17 , 36 , 47 ]. In a study of diabetes, missing data on primary outcome variables were reported for 59% (men) and 70% (women) of participants [ 16 ]; in this study, single imputation was used, with CART (IBM SPSS Modeler V14.2.03) for continuous variables and a weighted k-nearest neighbor approach in RapidMiner (V.5) for categorical variables [ 16 ]. Other studies reported exclusion but not its specific impact on sample size [ 29 , 31 , 38 , 44 ]. Imputation was conducted in a variety of ways in the studies with missing data [ 22 , 25 , 28 , 33 ]. Single imputation was used in the study by Bannister and colleagues, followed by multiple imputation in the final model to evaluate differences in model parameters [ 22 ]. One study imputed with a standard last-observation-carried-forward approach [ 26 ]. Spline techniques were used to impute missing data in the training set of one study [ 37 ]. Alaa et al. largely retained missingness as an informative variable and excluded only variables missing for 85% or more of participants [ 23 ], while Hearn et al. used a combination of imputation and exclusion strategies [ 40 ]. Lastly, missing or incomplete data were imputed using a model-based approach by Toussi et al. [ 15 ] and using an optimal-impute algorithm by Bertsimas et al. [ 21 ].
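The sketch below illustrates three of the missing-data strategies reported above (complete-case deletion, single mean imputation, and weighted k-nearest-neighbour imputation) on an invented two-variable dataset; it is a minimal example, not the pipeline of any reviewed study.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Invented dataset with missing values in both variables
df = pd.DataFrame({"age": [54.0, 61.0, np.nan, 47.0, 70.0],
                   "hba1c": [7.1, np.nan, 6.4, 8.2, np.nan]})

complete_cases = df.dropna()                                     # exclude patients with missing data
mean_imputed = SimpleImputer(strategy="mean").fit_transform(df)  # single (mean) imputation
knn_imputed = KNNImputer(n_neighbors=2,
                         weights="distance").fit_transform(df)   # weighted kNN imputation

print("patients retained after deletion:", len(complete_cases), "of", len(df))
print(np.round(knn_imputed, 2))
```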

Strengths and weaknesses noted by authors

Publications summarized the strengths and weaknesses of the machine learning methods employed. The low complexity and simplicity of machine learning models were noted as strengths of this approach [ 15 , 20 ]. Machine learning approaches were considered both powerful and efficient methods to apply to large datasets [ 19 ]. One study noted that it included parameters that were significant at the patient level even though, under traditional population-level regression model development, they would not have been significant and would therefore have been excluded [ 34 ]. Another publication noted that the value of machine learning is highly dependent on the model selection strategy and parameter optimization, and that machine learning in and of itself will not provide better estimates unless these steps are conducted properly [ 23 ].

Even when properly planned, machine learning approaches are not without issues that deserve attention in future studies employing these techniques. Within the eligible publications, weaknesses included overfitting the model through the inclusion of too much detail [ 15 ]. Additional limitations stem from the data sources used for machine learning, such as the lack of availability of all desired variables and missing data, both of which can affect the development and performance of these models [ 16 , 34 , 36 , 48 ]. The lack of all relevant variables was noted as a particular concern for retrospective database studies, where the investigator is limited to what has been recorded [ 26 , 28 , 29 , 38 , 40 ]. Importantly, the lack of external validation was stated as a limitation within the included studies themselves [ 28 , 30 , 38 , 42 ].

Limitations can also arise on the part of the research team: the development and execution of studies using machine learning-based methodology requires both clinical and statistical expertise, and users are warned against applying these methods blindly [ 22 ]. The importance of having clinical and statistical experts on the research team was noted in one study and highlighted as a strength of that work [ 21 ].

This study systematically reviewed and summarized the methods and approaches used for machine learning as applied to observational datasets that can inform patient-provider decision making. Machine learning methods have been applied much more broadly across observational studies than in the context of individual decision making, so this summary does not necessarily apply to all machine learning-based studies. The focus of this work is an area that remains largely unexplored: how to use large datasets in a manner that can inform and improve patient care and support shared decision making with reliable evidence applicable to the individual patient. Multiple publications cite the limitations of using population-based estimates for individual decisions [ 49 , 50 , 51 ]. Specifically, a summary statistic at the population level does not apply to each person in that cohort. Population estimates represent a point on a potentially wide distribution, and any one patient could fall anywhere within that distribution, far from the point estimate. At the other extreme, case reports or case series provide very specific individual-level data but are not generalizable to other patients [ 52 ]. This review and summary provide guidance and suggestions of best practices to improve, and hopefully increase, the use of these methods to provide data and models that inform patient-provider decision making.

It was common for single modeling strategies to be employed within the identified publications. It has long been known that single estimation algorithms can produce a fair amount of uncertainty and variability [ 53 ]. To overcome this limitation, multiple algorithms and multiple iterations of the models need to be performed. This, combined with the more powerful analytics available in recent years, provides a new standard for machine learning algorithm choice and development. While in some cases a single model may fit the data well and provide an accurate answer, confidence in the model can be supported through novel approaches such as model averaging [ 54 ]. Few studies in this review combined multiple families of modeling strategies with multiple iterations of the models. This should become a best practice in the future and is recommended as an additional criterion for assessing study quality in machine learning-based modeling [ 54 ].
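One simple way to combine multiple families of modeling strategies is soft-voting, which averages predicted probabilities across models. The sketch below is a minimal illustration of that idea only; it is not the frequentist model averaging of reference [ 54 ], and the synthetic data and model choices are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)  # synthetic data

# Three different algorithm families, combined by averaging predicted probabilities
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0)),
                ("nn", MLPClassifier(max_iter=2000, random_state=0))],
    voting="soft")

print("ensemble 5-fold CV accuracy:", cross_val_score(ensemble, X, y, cv=5).mean())
```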

External validation is critical to ensure model accuracy but was rarely conducted in the publications included in this review. The reasons for this could be many, such as a lack of appropriate datasets or a lack of awareness of the importance of external validation [ 55 ]. As model development using machine learning increases, there is a need for external validation prior to the application of models in any patient-provider setting; without it, the generalizability of models is largely unknown. Publications that did not conduct external validation also did not note the need for it to be completed: generalizability was discussed in only five studies, one of which had also conducted external validation. Of the remaining four studies, only one noted generalizability in terms of the need for future external validation [ 48 ]. A review conducted more broadly across machine learning methods similarly found a low rate of external validation (6.6% versus 5.9% in this study), and showed that prediction accuracy was lower under external validation than under cross-validation alone [ 56 ]. The current review, with its focus on machine learning to support decision making at a practical level, suggests external validation is an important gap that should be filled before these models are used for patient-provider decision making.

Luo and others suggest that k-fold validation may be used with proper stratification of the response variable as part of the model selection strategy [ 14 , 55 ]. The studies identified in this review generally conducted 5- or 10-fold cross-validation. There is no formal rule for the selection of the value of k, which is typically based on the size of the dataset; as k increases, bias is reduced, but variance increases in turn. While this tradeoff has to be accounted for, k = 5–10 has been found to be reasonable for most study purposes [ 57 ].
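A minimal sketch of stratified k-fold validation follows; the 10% event rate is an invented example, chosen to show that stratification preserves the outcome prevalence in each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))         # hypothetical features
y = np.array([1] * 10 + [0] * 90)     # imbalanced outcome: 10% events

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for k, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # each test fold keeps roughly the overall 10% event rate
    print(f"fold {k}: event rate in test fold = {y[test_idx].mean():.2f}")
```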

The evidence from the identified publications suggests that ethical concerns about lack of transparency and failure to report confidence in the findings are largely warranted. These limitations can be addressed through the use of multiple modeling approaches (to clarify the ‘black box’ nature of these methods) and by including both external validation and k-fold validation with a sufficiently large k (to demonstrate confidence in the findings). To ensure these methods are used in a manner that improves patient care, the expectations applied to the population-based risk prediction models of the past are no longer sufficient. It is essential that the right data, the right set of models, and appropriate validation are employed to ensure that the resulting models meet standards for high-quality patient care.

This study did not evaluate the quality of the underlying real-world data used to develop, test or validate the algorithms. While not directly part of the evaluation in this review, researchers should be aware that all limitations of real-world data sources apply regardless of the methodology employed. When observational datasets are used for machine learning-based research, the investigator should understand the extent to which the methods depend on the data structure and availability, and should evaluate a proposed data source to ensure it is appropriate for the machine learning project [ 45 ]. Importantly, databases should be evaluated to fully understand the variables included, as well as variables that may have prognostic or predictive value but are not included in the dataset. The lack of important variables remains a concern with the use of retrospective databases for machine learning. Concerns with confounding (particularly unmeasured confounding), bias (including immortal time bias), and the patient selection criteria for inclusion in the database must also be evaluated [ 58 , 59 ]. These factors should be considered before implementing these methods, yet they are not always at the forefront of consideration when applying machine learning approaches. The Luo checklist is a valuable tool to ensure that any machine learning study meets high research standards for patient care; importantly, it includes the evaluation of missing or potentially incorrect data (e.g., outliers) and generalizability [ 14 ]. This should be supplemented by a thorough evaluation of the candidate data source before the modeling work begins, and by ensuring that multiple modeling methods are applied.

This review found a wide variety of approaches, methods, statistical software and validation strategies employed in the application of machine learning methods to inform patient-provider decision making. Based on these findings, multiple modeling approaches should be employed in the development of machine learning-based models for patient care, which requires the highest research standards to reliably support shared evidence-based decision making. Models should be evaluated against clear criteria for model selection, and both internal and external validation are needed before these models are applied to inform patient care. Few studies have yet reached that bar of evidence for informing patient-provider decision making.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

AI: Artificial intelligence

AUC: Area under the curve

CART: Classification and regression trees

LASSO: Logistic least absolute shrinkage and selector operator

References

1. Steyerberg EW, Claggett B. Towards personalized therapy for multiple sclerosis: limitations of observational data. Brain. 2018;141(5):e38.

2. Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, et al. From hype to reality: data science enabling personalized medicine. BMC Med. 2018;16(1):150.

3. Steyerberg EW. Clinical prediction models. Berlin: Springer; 2019.

4. Schnabel RB, Sullivan LM, Levy D, Pencina MJ, Massaro JM, D’Agostino RB Sr, et al. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study. Lancet. 2009;373(9665):739–45.

5. D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994;25(1):40–3.

6. Framingham Heart Study: Risk Functions. 2020. https://www.framinghamheartstudy.org/ .

7. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inf. 2016;35:3–14.

8. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov. 2019;18(6):463–77.

9. Marcus G. Deep learning: a critical appraisal. arXiv preprint arXiv:180100631. 2018.

10. Grote T, Berens P. On the ethics of algorithmic decision-making in healthcare. J Med Ethics. 2020;46(3):205–11.

11. Brnabic A, Hess L, Carter GC, Robinson R, Araujo A, Swindle R. Methods used for the applicability of real-world data sources to individual patient decision making. Value Health. 2018;21:S102.

12. Fu H, Zhou J, Faries DE. Estimating optimal treatment regimes via subgroup identification in randomized control trials and observational studies. Stat Med. 2016;35(19):3285–302.

13. Liang M, Ye T, Fu H. Estimating individualized optimal combination therapies through outcome weighted deep learning algorithms. Stat Med. 2018;37(27):3869–86.

14. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.

15. Toussi M, Lamy J-B, Le Toumelin P, Venot A. Using data mining techniques to explore physicians’ therapeutic decisions when clinical guidelines do not provide recommendations: methods and example for type 2 diabetes. BMC Med Inform Decis Mak. 2009;9(1):28.

16. Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open. 2016;6(12):e013336.

17. Pei D, Zhang C, Quan Y, Guo Q. Identification of potential type II diabetes in a Chinese population with a sensitive decision tree approach. J Diabetes Res. 2019;2019:4248218.

18. Neefjes EC, van der Vorst MJ, Verdegaal BA, Beekman AT, Berkhof J, Verheul HM. Identification of patients with cancer with a high risk to develop delirium. Cancer Med. 2017;6(8):1861–70.

19. Mubeen AM, Asaei A, Bachman AH, Sidtis JJ, Ardekani BA, Alzheimer’s Disease Neuroimaging Initiative. A six-month longitudinal evaluation significantly improves accuracy of predicting incipient Alzheimer’s disease in mild cognitive impairment. J Neuroradiol. 2017;44(6):381–7.

20. Hische M, Luis-Dominguez O, Pfeiffer AF, Schwarz PE, Selbig J, Spranger J. Decision trees as a simple-to-use and reliable tool to identify individuals with impaired glucose metabolism or type 2 diabetes mellitus. Eur J Endocrinol. 2010;163(4):565.

21. Bertsimas D, Dunn J, Pawlowski C, Silberholz J, Weinstein A, Zhuo YD, et al. Applied informatics decision support tool for mortality predictions in patients with cancer. JCO Clin Cancer Inform. 2018;2:1–11.

22. Bannister CA, Halcox JP, Currie CJ, Preece A, Spasic I. A genetic programming approach to development of clinical prediction models: a case study in symptomatic cardiovascular disease. PLoS ONE. 2018;13(9):e0202685.

23. Alaa AM, Bolton T, Di Angelantonio E, Rudd JHF, van der Schaar M. Cardiovascular disease risk prediction using automated machine learning: a prospective study of 423,604 UK Biobank participants. PLoS ONE. 2019;14(5):e0213653.

24. Baxter SL, Marks C, Kuo TT, Ohno-Machado L, Weinreb RN. Machine learning-based predictive modeling of surgical intervention in glaucoma using systemic data from electronic health records. Am J Ophthalmol. 2019;208:30–40.

25. Dong Y, Xu L, Fan Y, Xiang P, Gao X, Chen Y, et al. A novel surgical predictive model for Chinese Crohn’s disease patients. Medicine (Baltimore). 2019;98(46):e17510.

26. Hill NR, Ayoubkhani D, McEwan P, Sugrue DM, Farooqui U, Lister S, et al. Predicting atrial fibrillation in primary care using machine learning. PLoS ONE. 2019;14(11):e0224582.

27. Kang AR, Lee J, Jung W, Lee M, Park SY, Woo J, et al. Development of a prediction model for hypotension after induction of anesthesia using machine learning. PLoS ONE. 2020;15(4):e0231172.

28. Karhade AV, Ogink PT, Thio Q, Cha TD, Gormley WB, Hershman SH, et al. Development of machine learning algorithms for prediction of prolonged opioid prescription after surgery for lumbar disc herniation. Spine J. 2019;19(11):1764–71.

29. Kebede M, Zegeye DT, Zeleke BM. Predicting CD4 count changes among patients on antiretroviral treatment: application of data mining techniques. Comput Methods Programs Biomed. 2017;152:149–57.

30. Kim I, Choi HJ, Ryu JM, Lee SK, Yu JH, Kim SW, et al. A predictive model for high/low risk group according to oncotype DX recurrence score using machine learning. Eur J Surg Oncol. 2019;45(2):134–40.

31. Kwon JM, Jeon KH, Kim HM, Kim MJ, Lim S, Kim KH, et al. Deep-learning-based out-of-hospital cardiac arrest prognostic system to predict clinical outcomes. Resuscitation. 2019;139:84–91.

32. Kwon JM, Lee Y, Lee Y, Lee S, Park J. An algorithm based on deep learning for predicting in-hospital cardiac arrest. J Am Heart Assoc. 2018;7(13):26.

33. Scheer JK, Smith JS, Schwab F, Lafage V, Shaffrey CI, Bess S, et al. Development of a preoperative predictive model for major complications following adult spinal deformity surgery. J Neurosurg Spine. 2017;26(6):736–43.

34. Lopez-de-Andres A, Hernandez-Barrera V, Lopez R, Martin-Junco P, Jimenez-Trujillo I, Alvaro-Meca A, et al. Predictors of in-hospital mortality following major lower extremity amputations in type 2 diabetic patients using artificial neural networks. BMC Med Res Methodol. 2016;16(1):160.

35. Rau H-H, Hsu C-Y, Lin Y-A, Atique S, Fuad A, Wei L-M, et al. Development of a web-based liver cancer prediction model for type II diabetes patients by using an artificial neural network. Comput Methods Programs Biomed. 2016;125:58–65.

36. Ng T, Chew L, Yap CW. A clinical decision support tool to predict survival in cancer patients beyond 120 days after palliative chemotherapy. J Palliat Med. 2012;15(8):863–9.

37. Pérez-Gandía C, Facchinetti A, Sparacino G, Cobelli C, Gómez E, Rigla M, et al. Artificial neural network algorithm for online glucose prediction from continuous glucose monitoring. Diabetes Technol Therapeut. 2010;12(1):81–8.

38. Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S. Use of artificial neural networks to decision making in patients with lumbar spinal canal stenosis. J Neurosurg Sci. 2017;61(6):603–11.

39. Bowman A, Rudolfer S, Weller P, Bland JDP. A prognostic model for the patient-reported outcome of surgical treatment of carpal tunnel syndrome. Muscle Nerve. 2018;58(6):784–9.

40. Hearn J, Ross HJ, Mueller B, Fan CP, Crowdy E, Duhamel J, et al. Neural networks for prognostication of patients with heart failure. Circ Heart Fail. 2018;11(8):e005193.

41. Isma’eel HA, Cremer PC, Khalaf S, Almedawar MM, Elhajj IH, Sakr GE, et al. Artificial neural network modeling enhances risk stratification and can reduce downstream testing for patients with suspected acute coronary syndromes, negative cardiac biomarkers, and normal ECGs. Int J Cardiovasc Imaging. 2016;32(4):687–96.

42. Isma’eel HA, Sakr GE, Serhan M, Lamaa N, Hakim A, Cremer PC, et al. Artificial neural network-based model enhances risk stratification and reduces non-invasive cardiac stress imaging compared to Diamond-Forrester and Morise risk assessment models: a prospective study. J Nucl Cardiol. 2018;25(5):1601–9.

43. Jovanovic P, Salkic NN, Zerem E. Artificial neural network predicts the need for therapeutic ERCP in patients with suspected choledocholithiasis. Gastrointest Endosc. 2014;80(2):260–8.

44. Zhou HF, Huang M, Ji JS, Zhu HD, Lu J, Guo JH, et al. Risk prediction for early biliary infection after percutaneous transhepatic biliary stent placement in malignant biliary obstruction. J Vasc Interv Radiol. 2019;30(8):1233–41.e1.

45. Hertroijs DF, Elissen AM, Brouwers MC, Schaper NC, Köhler S, Popa MC, et al. A risk score including body mass index, glycated haemoglobin and triglycerides predicts future glycaemic control in people with type 2 diabetes. Diabetes Obes Metab. 2018;20(3):681–8.

46. Oviedo S, Contreras I, Quiros C, Gimenez M, Conget I, Vehi J. Risk-based postprandial hypoglycemia forecasting using supervised learning. Int J Med Inf. 2019;126:1–8.

47. Khanji C, Lalonde L, Bareil C, Lussier MT, Perreault S, Schnitzer ME. Lasso regression for the prediction of intermediate outcomes related to cardiovascular disease prevention using the TRANSIT quality indicators. Med Care. 2019;57(1):63–72.

48. Anderson JP, Parikh JR, Shenfeld DK, Ivanov V, Marks C, Church BW, et al. Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records. J Diabetes Sci Technol. 2016;10(1):6–18.

49. Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues Clin Neurosci. 2011;13(2):217–24.

50. Lu CY. Observational studies: a review of study designs, challenges and strategies to reduce confounding. Int J Clin Pract. 2009;63(5):691–7.

51. Morgenstern H. Ecologic studies in epidemiology: concepts, principles, and methods. Annu Rev Public Health. 1995;16(1):61–81.

52. Vandenbroucke JP. In defense of case reports and case series. Ann Intern Med. 2001;134(4):330–4.

53. Buckland ST, Burnham KP, Augustin NH. Model selection: an integral part of inference. Biometrics. 1997;53:603–18.

54. Zagar A, Kadziola Z, Lipkovich I, Madigan D, Faries D. Evaluating bias control strategies in observational studies using frequentist model averaging. 2020 (submitted).

55. Kang J, Schwartz R, Flickinger J, Beriwal S. Machine learning approaches for predicting radiation therapy outcomes: a clinician’s perspective. Int J Radiat Oncol Biol Phys. 2015;93(5):1127–35.

56. Scott IM, Lin W, Liakata M, Wood J, Vermeer CP, Allaway D, et al. Merits of random forests emerge in evaluation of chemometric classifiers by external validation. Anal Chim Acta. 2013;801:22–33.

57. Kuhn M, Johnson K. Applied predictive modeling. Berlin: Springer; 2013.

58. Hess L, Winfree K, Muehlenbein C, Zhu Y, Oton A, Princic N. Debunking myths while understanding limitations. Am J Public Health. 2020;110(5):E2.

59. Thesmar D, Sraer D, Pinheiro L, Dadson N, Veliche R, Greenberg P. Combining the power of artificial intelligence with the richness of healthcare claims data: opportunities and challenges. PharmacoEconomics. 2019;37(6):745–52.


Acknowledgements

Not applicable.

Funding

No funding was received for the conduct of this study.

Author information

Authors and Affiliations

Eli Lilly and Company, Sydney, NSW, Australia

Alan Brnabic

Eli Lilly and Company, Indianapolis, IN, USA

Lisa M. Hess


Contributions

AB and LMH contributed to the design, implementation, analysis and interpretation of the data included in this study. AB and LMH wrote, revised and finalized the manuscript for submission. AB and LMH have both read and approved the final manuscript.

Corresponding author

Correspondence to Lisa M. Hess .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Authors are employees of Eli Lilly and Company and receive salary support in that role.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Study quality of eligible publications, modified Luo scale [14].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Brnabic, A., Hess, L.M. Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making. BMC Med Inform Decis Mak 21 , 54 (2021). https://doi.org/10.1186/s12911-021-01403-2


Received : 07 July 2020

Accepted : 20 January 2021

Published : 15 February 2021

DOI : https://doi.org/10.1186/s12911-021-01403-2


Keywords

  • Machine learning
  • Decision making
  • Decision tree
  • Random forest
  • Automated neural network


Gravitational Search Algorithm

Today, many metaheuristic algorithms inspired by physical phenomena or the behaviors of natural creatures have been developed and are very effective in solving complex engineering optimization problems. These algorithms seek optimal solutions in various engineering problems such as machine learning, image processing, pattern matching and decision making, and they can be applied to both single-objective and multi-objective optimization problems. Among the most famous techniques in this category are the ant colony optimization (ACO) algorithm, which is designed based on the navigation of ants; particle swarm optimization (PSO), which is based on the movement of organisms in a bird flock; and the gravitational search algorithm (GSA). GSA is one of the most popular metaheuristic methods. It is inspired by the law of gravity and motion, and it creates an appropriate balance between exploration and exploitation capabilities. In this chapter, we introduce the basic GSA and review its newer versions. First, we introduce the original GSA, which was proposed for continuous problems. We then discuss other versions of GSA for various optimization problems, analyze the convergence properties of the algorithm, and review GSA in various engineering applications.
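To make the update rules concrete, the following is a compact sketch of the original continuous GSA (masses derived from fitness, a decaying gravitational constant, and a shrinking Kbest set), written from the standard formulation; the sphere objective and all parameter values are illustrative assumptions, not part of the chapter itself.

```python
import numpy as np

def gsa(fitness, dim=5, n_agents=20, iters=200, g0=100.0, alpha=20.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(n_agents, dim))   # agent positions
    V = np.zeros_like(X)                           # agent velocities
    best_x, best_f = None, np.inf
    for t in range(iters):
        f = np.array([fitness(x) for x in X])
        if f.min() < best_f:
            best_f, best_x = f.min(), X[f.argmin()].copy()
        # masses: fitness normalized between worst and best (minimization)
        m = (f - f.max()) / (f.min() - f.max() + 1e-12)
        M = m / (m.sum() + 1e-12)
        G = g0 * np.exp(-alpha * t / iters)          # decaying gravitational constant
        k = max(1, int(n_agents * (1 - t / iters)))  # Kbest shrinks over time
        kbest = np.argsort(f)[:k]                    # indices of the k fittest agents
        A = np.zeros_like(X)                         # accelerations
        for i in range(n_agents):
            for j in kbest:
                if j == i:
                    continue
                diff = X[j] - X[i]
                r = np.linalg.norm(diff) + 1e-12
                # force divided by M[i] cancels, so acceleration uses M[j] only
                A[i] += rng.random() * G * M[j] * diff / r
        V = rng.random(size=V.shape) * V + A         # stochastic velocity update
        X = X + V
    return best_x, best_f

sphere = lambda x: float(np.sum(x**2))               # toy objective to minimize
x_star, f_star = gsa(sphere)
print(f_star)
```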


Indian J Anaesth. 2016 Sep;60(9)

Literature search for research planning and identification of research problem

Anju Grewal

Department of Anaesthesiology, Dayanand Medical College and Hospital, Ludhiana, Punjab, India

Hanish Kataria

1 Department of Surgery, Government Medical College and Hospital, Chandigarh, India

2 Department of Cardiac Anaesthesia, All India Institute of Medical Sciences, New Delhi, India

Literature search is a key step in performing good authentic research. It helps in formulating a research question and planning the study. The available published data are enormous; therefore, choosing the appropriate articles relevant to the study in question is an art. It can be time-consuming and tiring, and can lead to disinterest or even abandonment of the search midway if not carried out in a step-wise manner. Various databases are available for performing literature search. This article primarily focuses on how to formulate a research question and on the various types and sources for literature search, which will help make your search specific and time-saving.

INTRODUCTION

Literature search is a systematic and well-organised search of already published data to identify a breadth of good-quality references on a specific topic.[ 1 ] The reasons for conducting a literature search are numerous and include drawing information for making evidence-based guidelines, a step in the research method and part of academic assessment.[ 2 ] However, the main purpose of a thorough literature search is to formulate a research question by evaluating the available literature with an eye on gaps still amenable to further research.

A research problem[ 3 ] is typically a topic of interest and of some familiarity to the researcher. It needs to be channelised by focussing on information yet to be explored. Once we have narrowed down the problem, seeking and analysing the existing literature may further sharpen the research approach.

A research hypothesis[ 4 ] is a carefully created statement of how you expect the research to proceed. It is one of the most important tools that aids in answering the research question. It should be apt, contain the necessary components, and raise a question that can be tested and investigated.

The literature search can be exhaustive and time-consuming, but there are some simple steps which can help you plan and manage the process. The most important are formulating the research questions and planning your search.

FORMULATING THE RESEARCH QUESTION

Literature search is done to identify the appropriate methodology and design of the study, the population sampled and sampling methods, the methods of measuring concepts and the techniques of analysis. It also helps in determining extraneous variables affecting the outcome and in identifying faults or lacunae that could be avoided.

Formulating a well-focused question is a critical step for facilitating good clinical research.[ 5 ] There can be general questions or patient-oriented questions that arise from clinical issues. Patient-oriented questions can involve the effect of therapy or disease or examine advantage versus disadvantage for a group of patients.[ 6 ]

For example, we want to evaluate the effect of a particular drug (e.g., dexmedetomidine) for procedural sedation in day care surgery patients. While formulating a research question, one should consider certain criteria, referred to as the ‘FINER’ (F-Feasible, I-Interesting, N-Novel, E-Ethical, R-Relevant) criteria.[ 5 ] The idea should be interesting and relevant to clinical research. It should either confirm, refute or add information to research work already done. One should also keep in mind the patient population under study and the resources available in a given set-up. The entire research process should also conform to the ethical principles of research.

The patient or study population, intervention, comparison or control arm, primary outcome, timing of measurement of outcome (PICOT) is a well-known approach for framing a leading research question.[ 7 , 8 ] Dividing the questions into key components makes it easy and searchable. In this case scenario:

  • Patients (P) – What is the important group of patients? for example, day care surgery
  • Intervention (I) – What is the important intervention? for example, intravenous dexmedetomidine
  • Comparison (C) – What is the important intervention of comparison? for example, intravenous ketamine
  • Outcome (O) – What is the effect of intervention? for example, analgesic efficacy, procedural awareness, drug side effects
  • Time (T) – Time interval for measuring the outcome: Hourly for first 4 h then 4 hourly till 24 h post-procedure.

Multiple questions can be formulated from a patient's problem and concerns. A well-focused question should be chosen for research according to its significance for patient interest and relevance to our knowledge. Good research questions address the lacunae in the available literature with an aim to impact clinical practice in a constructive manner. In India, outcome research and relevant resources, for example, electronic database systems, databases and hospital information systems, are limited. Even when these factors are available, data about existing resources are not widely accessible.[ 9 ]

TYPES OF MEDICAL LITERATURE

(Further details in chapter ‘Types of studies and research design’ in this issue).

Primary literature

Primary sources are the authentic publication of an expert's new evidence, conclusions and proposals (case reports, clinical trials, etc) and are usually published in a peer-reviewed journal. Preliminary reports, congress papers and preprints also constitute primary literature.[ 2 ]

Secondary literature

Secondary sources are systematic review articles or meta-analyses in which material derived from the primary literature is inferred and evaluated.[ 2 ]

Tertiary literature

Tertiary literature consists of collections that compile information from primary or secondary literature (e.g., reference books).[ 2 ]

METHODS OF LITERATURE SEARCH

There are various methods of literature search that are used alone or in combination [ Table 1 ]. For the past few decades, searching the local as well as national library for books, journals, etc., was the usual practice, and physical literature exploration is still an important component of any systematic review search process.[ 10 , 11 ] With the advancement of technology, the Internet is now the gateway to the maze of vast medical literature.[ 12 ] Conducting a literature review involves web-based search engines, i.e., Google, Google Scholar, etc. [ Table 2 ], or using various electronic research databases to identify materials that describe the research topic or those homologous to it.[ 13 , 14 ]

Table 1. Methods of literature search

Table 2. Web-based methods of literature search

The various databases available for literature search include databases for original published articles in journals [ Table 2 ] and evidence-based databases for integrated information available as systematic reviews and abstracts [ Table 3 ].[ 12 , 14 ] Most of these are not freely available to the individual user. PubMed ( http://www.ncbi.nlm.nih.gov/pubmed/ ) has been the largest available resource since 1996; however, a large number of sources now provide free access to literature in the biomedical field.[ 15 ] More than 26 million citations from Medline, life science journals and online books are included in PubMed. Links to the full-text material are included in citations from PubMed Central and publisher web sites.[ 16 ] The choice of databases depends on the subject of interest and its potential coverage by the different databases. The Education Resources Information Center is a free online digital library of education research and information sponsored by the Institute of Education Sciences of the U.S. Department of Education, available at http://eric.ed.gov/ . No single database can search all the medical literature, so several different databases need to be searched. At a minimum, PubMed or Medline, Embase and the Cochrane Central Register of Controlled Trials need to be searched. When searching these databases, emphasis should be given to meta-analyses, systematic reviews, randomised controlled trials and landmark studies.

Table 3. Electronic sources of evidence-based databases

The time allocated to the search needs attention, as exploring and selecting data are early steps in the research method, and research conducted as part of an academic assessment has a narrow timeframe. In the Indian scenario, limited outcome research and limited accessibility to data lead to a less thorough knowledge of the nature of the research problem. This results in the formulation of an inappropriate research question and increases the time needed for the literature search.

TYPES OF SEARCH

Type of search can be described in different forms according to the subject of interest. It increases the chances of retrieving relevant information from a search.

Translating research question to keywords

This will provide results based on any of the words specified; hence, keywords are the cornerstone of an effective search. Synonyms/alternate terms should be considered to elicit further information, i.e., barbiturates in place of thiopentone. Spellings should also be taken into account, i.e., anesthesia in place of anaesthesia (American and British). Most databases use controlled vocabularies to establish common search terms (or keywords). Some of these alternative keywords can be looked up in the database thesaurus.[ 4 ] Another strategy is combining keywords with Boolean operators. It is important to keep a note of the keywords and methods used in exploring the literature, as these will need to be described later in the design of the search process.

‘Medical Subject Heading (MeSH) is the National Library of Medicine's controlled hierarchical vocabulary that is used for indexing articles in PubMed, with more specific terms organised underneath more general terms’.[ 17 ] This provides a reliable way to retrieve citations that use different terminology for identical ideas, as it indexes articles based on content. Two features of PubMed that can increase yield of specific articles are ‘Automatic term mapping’ and ‘automatic term explosion’.[ 4 ]

For example, if the search keyword is heart attack, this term will match with MeSH transcription table heading and then explode into various subheadings. This helps to construct the search by adding and selecting MeSH subheadings and families of MeSH by use of hyperlinks.[ 4 ]

We can set limits to a clinical trial for retrieving higher level of evidence (i.e., randomised controlled clinical trial). Furthermore, one can browse through the link entitled ‘Related Articles’. This PubMed feature searches for similar citations using an intricate algorithm that scans titles, abstracts and MeSH terms.[ 4 ]

Phrase search

This will provide pages with only the words typed in the phrase, in that exact order and with no words in between them.

Boolean operators

AND, OR and NOT are the three Boolean operators, named after the mathematician George Boole.[ 18 ] Combining two words using ‘AND’ will fetch articles that mention both words. Using ‘OR’ will widen the search and fetch more articles that mention either subject. Using ‘NOT’ to combine words will fetch articles containing the first word but not the second, thus narrowing the search.

Filters can also be used to refine the search, for example, article types, text availability, language, age, sex and journal categories.
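As an illustration, the sketch below combines MeSH terms, Boolean operators and a publication-type filter into a single PubMed query (built from the dexmedetomidine example above) and submits it through NCBI's public E-utilities esearch endpoint; the query itself is hypothetical and only meant to show the syntax.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical query: MeSH terms and free-text terms combined with Boolean operators,
# restricted to randomized controlled trials
query = ('("dexmedetomidine"[MeSH Terms] OR dexmedetomidine[Title/Abstract]) '
         'AND (ketamine[Title/Abstract]) '
         'AND ("ambulatory surgical procedures"[MeSH Terms]) '
         'AND randomized controlled trial[Publication Type]')

url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
       + urllib.parse.urlencode({"db": "pubmed", "term": query,
                                 "retmax": 20, "retmode": "json"}))

with urllib.request.urlopen(url) as response:
    result = json.load(response)

print(result["esearchresult"]["count"])    # number of matching citations
print(result["esearchresult"]["idlist"])   # first 20 PubMed IDs
```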

Overall, the recommendations for the methodology of a literature search can be summarised as below (Creswell):[ 19 ]

  • Identify keywords and use them to search articles from library and internet resources as described above
  • Search several databases to search articles related to your topic
  • Use thesaurus to identify terms to locate your articles
  • Find an article that is similar to your topic; then look at the terms used to describe it, and use them for your search
  • Use databases that provide full-text articles (free through academic libraries, Internet or for a fee) as much as possible so that you can save time searching for your articles
  • If you are examining a topic for the first time and unaware of the research on it, start with broad syntheses of the literature, such as overviews, summaries of the literature on your topic or review articles
  • Start with the most recent issues of the journals, and look for studies about your topic and then work backward in time. Follow-up on references at the end of the articles for more sources to examine
  • Refer books on a single topic by a single author or group of authors or books that contain chapters written by different authors
  • Next look for recent conference papers. Often, conference papers report the latest research developments. Contact authors of pertinent studies. Write or phone them, asking if they know of studies related to your area of interest
  • The easy access and ability to capture entire articles from the web make it attractive. However, check these articles carefully for authenticity and quality and be cautious about whether they represent systematic research.

The whole process of literature search[ 20 ] is summarised in Figure 1 .

Figure 1. Process of literature search

Literature search provides not only an opportunity to learn more about a given topic but also insight into how the topic was studied by previous analysts. It helps to interpret ideas, detect shortcomings and recognise opportunities. In short, a systematic and well-organised literature search may help in designing novel research.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.


Title: Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits

Abstract: While algorithm audits are growing rapidly in commonality and public importance, relatively little scholarly work has gone toward synthesizing prior work and strategizing future research in the area. This systematic literature review aims to do just that, following PRISMA guidelines in a review of over 500 English articles that yielded 62 algorithm audit studies. The studies are synthesized and organized primarily by behavior (discrimination, distortion, exploitation, and misjudgement), with codes also provided for domain (e.g. search, vision, advertising, etc.), organization (e.g. Google, Facebook, Amazon, etc.), and audit method (e.g. sock puppet, direct scrape, crowdsourcing, etc.). The review shows how previous audit studies have exposed public-facing algorithms exhibiting problematic behavior, such as search algorithms culpable of distortion and advertising algorithms culpable of discrimination. Based on the studies reviewed, it also suggests some behaviors (e.g. discrimination on the basis of intersectional identities), domains (e.g. advertising algorithms), methods (e.g. code auditing), and organizations (e.g. Twitter, TikTok, LinkedIn) that call for future audit attention. The paper concludes by offering the common ingredients of successful audits, and discussing algorithm auditing in the context of broader research working toward algorithmic justice.
Comments: To Appear in the Proceedings of the ACM (PACM) Human-Computer Interaction, CSCW '21
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)


The use of reinforcement learning algorithms in object tracking: A systematic literature review

David J. Barrientos R, Marie Chantelle C. Medina, [+1 author], Pablo V.A. Barros. Published in Neurocomputing, 1 June 2024. DOI: 10.1016/j.neucom.2024.127954


Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here .

PLOS ONE 

June 4, 2024

PLOS ONE 

An inclusive journal community working together to advance science by making all rigorous research accessible without barriers

Calling all experts!

Plos one is seeking talented individuals to join our editorial board. .

Cancer Epidemiology

Impact of aging on acute myeloid leukemia epidemiology and survival outcomes: A real-world, population-based longitudinal cohort study

Han and colleagues report an association between aging and incidence of acute myeloid leukemia diagnoses in South Korea, with recommendations to expand treatment options for older patients.

Image credit: Couple by Mabel Amber, Pixabay



The pregnancy outcomes among women receiving individualized algorithm dosing with follitropin delta: a systematic review of randomized controlled trials

  • Assisted Reproduction Technologies
  • Open access
  • Published: 29 May 2024


  • Bogdan Doroftei   ORCID: orcid.org/0000-0002-6618-141X 1 , 2 , 3 ,
  • Ovidiu-Dumitru Ilie   ORCID: orcid.org/0000-0002-4023-1765 1 ,
  • Ana-Maria Dabuleanu 2 , 3 ,
  • Theodora Armeanu 1 , 2 , 3 &
  • Radu Maftei 1 , 2 , 3  

423 Accesses


To investigate whether ovarian stimulation with follitropin delta, dosed in an individualized, algorithm-based manner, is inferior to conventional dosing with the recombinant human follicle-stimulating hormones follitropin alfa or follitropin beta with regard to a series of established primary endpoints.

We conducted a registered systematic review (CRD42024512792) of PubMed-MEDLINE, Web of Science™, the Cochrane Database of Systematic Reviews, and Scopus. Our search was designed to cover all relevant literature, particularly randomized controlled trials. We critically and comparatively analyzed the outcomes for each primary endpoint by intervention: positive βhCG test, clinical pregnancy, vital pregnancy, ongoing pregnancy, live birth, live birth at 4 weeks, and multiple pregnancies.

Six randomized controlled trials were included in the quality assessment as priority manuscripts, revealing an 83.3% low risk of bias. Differences between follitropin delta and conventional dosing were non-significant for every parameter of interest: positive βhCG tests (691, 53.44% vs. 602, 46.55%), ongoing pregnancies (603, 53.79% vs. 518, 46.20%), clinical and vital pregnancies combined (1,073, 52.80% vs. 959, 47.19%), live births and live births at 4 weeks (595, 54.14% vs. 504, 45.85%, with only 2 losses), and multiple pregnancies (8, 66.66% vs. 4, 33.33%). In addition, follitropin delta was well tolerated among hypo- and hyper-responders, without a significant risk of ovarian hyperstimulation syndrome and/or need for preventive interventions, in contrast with follitropin alfa or follitropin beta.

Individualized algorithm-based dosing with follitropin delta is non-inferior to conventional follitropin alfa or follitropin beta dosing: it is as effective in promoting a similar response in women, without significantly different adverse effects.


Introduction

The pioneering work on gonadotropins [1, 2], together with their applications in translational medicine [3], paved the way for synthesizing novel recombinant human follicle-stimulating hormones (r-hFSHs) of high purity through different biological processes [4]. These preparations [5] became essential components of ovarian stimulation and response in in vitro fertilization (IVF)/intracytoplasmic sperm injection (ICSI) within current assisted reproductive technology (ART) protocols [3].

Given the inter-individual heterogeneity and the variability across ethnic populations [6, 7, 8, 9], the need for predictive factors suitable for patient stratification, so as to acquire high-quality embryos for transfer, was widely accepted. This gave rise to the idea of shifting from standardized to individualized dosing to reduce the risks of cycle cancelation, poor ovarian stimulation, and ovarian hyperstimulation syndrome (OHSS) [10, 11, 12], and it ultimately led to overcoming the initial barriers to implementation in clinical practice [10, 12, 13, 14].

Follitropin delta (Rekovelle®; Ferring Pharmaceuticals) is the latest r-hFSH used for ovarian stimulation and is uniquely derived from a cell line of human fetal retinal origin (PER.C6; Crucell) [3, 15, 16], which distinguishes it from the other r-hFSHs of Chinese hamster ovary (CHO) cell line origin [3, 4, 16, 17]. The drug retains an identical amino acid sequence in the α and β subunits, whose post-translational modifications resemble the glycosylation profile of native human FSH [15, 16], integrating a high degree of tri- and tetra-sialylated glycans and α2,3- and α2,6-linked sialic acid [16]. Because this glycosylation limits clearance through the asialoglycoprotein receptor (ASGPR) in the liver [4], follitropin delta shows lower clearance, a similar half-life and bioavailability [15, 16], and stronger ovarian stimulation than other r-hFSHs [15].

While conventional r-hFSHs are dosed in international units (IU) calibrated against the international reference standard with the Steelman–Pohley bioassay, follitropin delta is dosed by mass (µg) because of its specific bioactivity, with a dosing algorithm established through a pharmacokinetic/pharmacodynamic modeling exercise [15]. In line with the European Society of Human Reproduction and Embryology (ESHRE) guideline recommendations [14], the approved dosing algorithm incorporates both body weight and the anti-Müllerian hormone (AMH) level, which affect drug exposure, distribution volume, and ovarian response [18, 19, 20, 21].
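As a rough illustration of how such an algorithm can combine the two inputs, the sketch below encodes only the published boundary conditions (a fixed 12 µg/day below 15 pmol/L AMH and a 6–12 µg/day weight-based range above it [18, 19, 20, 21]); the linear fall from 0.19 to 0.10 µg/kg with rising AMH is our simplifying assumption, not the approved banded dosing table.

```python
def follitropin_delta_daily_dose(amh_pmol_l: float, weight_kg: float) -> float:
    """Illustrative AMH- and weight-based dosing rule.

    NOT the approved Rekovelle label algorithm: only the fixed 12 ug/day
    below 15 pmol/L AMH and the 6-12 ug/day bounds follow the published
    descriptions; the smooth fall from 0.19 to 0.10 ug/kg with rising AMH
    is a simplifying assumption replacing the label's banded table.
    """
    if amh_pmol_l < 15:
        return 12.0  # fixed maximum daily dose, irrespective of body weight
    # Assumed: the per-kg factor falls linearly from 0.19 ug/kg (AMH = 15)
    # to 0.10 ug/kg (AMH >= 35), then is weight-scaled and clamped.
    fraction = min(max((amh_pmol_l - 15.0) / 20.0, 0.0), 1.0)
    factor = 0.19 - 0.09 * fraction
    return min(max(factor * weight_kg, 6.0), 12.0)


print(follitropin_delta_daily_dose(10.0, 60.0))  # -> 12.0
print(follitropin_delta_daily_dose(40.0, 60.0))  # -> 6.0 (0.10 ug/kg x 60 kg)
```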

Although randomized controlled trials (RCTs) represent the best approach for comparing interventional treatments, their generalizability is limited by strict inclusion criteria enrolling patients under specific conditions [22]. With this in mind, the purpose of this systematic review is to shed light on the non-inferiority potential by comparing pregnancy outcomes, as captured by the established primary endpoints (positive βhCG test, clinical pregnancy, vital pregnancy, ongoing pregnancy, live birth, live birth at 4 weeks, and multiple pregnancies), between women undergoing conventional ovarian stimulation with follitropin alfa/follitropin beta and those dosed with the individualized follitropin delta algorithm.

Materials and methods

Methodology and registration

This protocol was designed to adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [ 23 ] and registered in the International Prospective Register of Systematic Reviews database (PROSPERO) (CRD42024512792).

Ethical approval

This systematic review did not require Institutional Review Board (IRB) approval or evaluation by any other expert panel, as all research data were extracted from published studies.

Database search

Searches in four distinct bibliographic academic databases, namely PubMed-MEDLINE (United States National Library of Medicine, NLM, 1996), Web of Science™ (WOS; Clarivate Analytics, 1997), the Cochrane Database of Systematic Reviews (CDSR; Cochrane Library, 1993), and Scopus (Elsevier, 2004) [24], were performed to identify, rank, and analyze potentially suitable studies using MeSH (Medical Subject Headings) terms. Searches were restricted to December 12, 2016, through January 21, 2024, the interval between authorization by the European Medicines Agency (EMA) and the current state of knowledge. We used synonyms ranging from the marketing name "Rekovelle" to "Follitropin Delta" and "FE 999049", as well as the abbreviations of the two trial programs "The Evidence-based Stimulation Trial with Human rFSH in Europe and Rest of World" ("ESTHER-1") and "ESTHER-2". According to the official NLM website, "Follitropin Delta" and "FE 999049" are listed as [Supplementary Concept] rather than as Major Topics [Majr], introduced on June 11, 2017 and July 8, 2016, respectively; both can be identified by their MeSH Unique ID and Registry Number, C000620228-076WHW89TW and C000608977.

Strategy and strings

We used dedicated terminology for Search #1, which relies on the sole vocabulary components "Follitropin Delta" OR "FE 999049" OR "Rekovelle", followed by "ESTHER-1" OR "ESTHER-2", and more complex clusters combined with Boolean operators ("AND", "OR") for Search #2. The complete string sets for each database are available in Supplementary File 1.
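For readers who want to reproduce such a date-restricted PubMed query programmatically, a minimal sketch using Biopython's Entrez interface is shown below. The query is an abridged rendering of Search #1 (the full per-database strings are in Supplementary File 1), and the e-mail address is a placeholder.

```python
from Bio import Entrez  # Biopython

Entrez.email = "reviewer@example.org"  # placeholder; NCBI asks for a real address

# Abridged version of Search #1; the complete per-database strings are in
# Supplementary File 1.
query = '"Follitropin Delta" OR "FE 999049" OR "Rekovelle" OR "ESTHER-1" OR "ESTHER-2"'

handle = Entrez.esearch(
    db="pubmed",
    term=query,
    datetype="pdat",        # restrict by publication date
    mindate="2016/12/12",   # EMA authorization of follitropin delta
    maxdate="2024/01/21",   # search cut-off used in the review
    retmax=200,
)
record = Entrez.read(handle)
handle.close()
print(record["Count"], record["IdList"][:5])
```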

Study selection

Retrieved references were imported into Mendeley Reference Management Software (v. 1.19.8; Elsevier, 2013) and de-duplicated using the "check for duplicates" function, followed by a second, manual screening for accuracy. O.-D.I. and T.A. assessed the title and, where available, the abstract of each record for relevance to the scope; the full text of the articles still under consideration was then assessed. Divergent opinions were resolved by consensus among the authors. A tabular overview of the retrieved records can be consulted in Supplementary File 2.
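The mechanical part of this de-duplication step can be approximated in a few lines. The sketch below assumes a hypothetical CSV export with title and source_db columns and reproduces only the automated first pass, not the manual screening.

```python
import pandas as pd

# Hypothetical merged export of the database searches; the column names
# (title, source_db) are assumptions for illustration.
refs = pd.read_csv("retrieved_records.csv")

# Normalize titles before matching, since punctuation and casing differ
# between databases; only the mechanical first pass is reproduced here.
refs["title_key"] = (
    refs["title"]
    .str.lower()
    .str.replace(r"[^a-z0-9]+", " ", regex=True)
    .str.strip()
)
deduped = refs.drop_duplicates(subset="title_key", keep="first")
print(f"{len(refs) - len(deduped)} duplicates removed; {len(deduped)} records kept")
```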

The main aim of this manuscript is to answer whether the third r-hFSH is superior to follitropin alfa/beta for one or more of the primary endpoints of interest: positive βhCG test, clinical pregnancy, vital pregnancy, ongoing pregnancy, live birth, live birth at 4 weeks, and multiple pregnancies. We designed a Patient (P), Intervention (I), Comparator (C), and Outcome (O) (PICO) structure to frame the main research question and the inclusion and exclusion criteria; the adopted PICO format is presented in Supplementary File 3.
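A PICO frame is essentially a four-field record. The sketch below expresses it as a small data structure, with field values paraphrased from this review's framing (the exact wording lives in Supplementary File 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PICO:
    population: str
    intervention: str
    comparator: str
    outcome: str


# Field values paraphrase the review's framing; the exact wording is in
# Supplementary File 3.
question = PICO(
    population="Women undergoing ovarian stimulation for IVF/ICSI",
    intervention="Individualized algorithm-based follitropin delta dosing",
    comparator="Conventional follitropin alfa or follitropin beta dosing",
    outcome=("Positive βhCG test, clinical/vital/ongoing pregnancy, "
             "live birth (and at 4 weeks), multiple pregnancies"),
)
```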

Data extraction

Evidence was independently extracted by B.D., O.-D.I., and R.M. into a tabular format in Microsoft Excel 2010 (Microsoft Corporation, Redmond, WA, USA) for sorting and coding, using a standardized form developed to characterize the included studies. The form captured methodological data: first author, year of publication, journal, country or countries, participants' age, study design and population, and outcome measures reported as number and percentage (%).

Quality assessment

B.D. and O.-D.I. independently evaluated the quality of each included study using the Revised Cochrane risk-of-bias tool for randomized trials (RoB 2) [25], accessed via https://sites.google.com/site/riskofbiastool/ (28 January 2024), which provides guidance and different packages depending on the type of study. The tool classifies biases into five domains: (1) bias arising from the randomization process; (2) bias due to deviations from intended interventions; (3) bias due to missing outcome data; (4) bias in measurement of the outcome; and (5) bias in selection of the reported result. Answers to a series of signalling questions range over "yes," "probably yes," "probably no," "no," and "no information," and the calculated overall risk is categorized as "low risk of bias," "some concerns," or "high risk of bias." For transparency of the RoB 2 evaluation, we employed the Risk-Of-Bias VISualization tool (robvis) [26] in the present systematic review.
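The mapping from the five domain judgements to an overall rating can be mimicked with a simplified rule, sketched below. Note that the real RoB 2 algorithm can also escalate several "some concerns" judgements to high risk, a nuance this sketch deliberately omits.

```python
DOMAINS = [
    "D1 randomization process",
    "D2 deviations from intended interventions",
    "D3 missing outcome data",
    "D4 measurement of the outcome",
    "D5 selection of the reported result",
]


def overall_rob(judgements: dict) -> str:
    """Simplified overall judgement: any 'high' dominates, then any
    'some concerns'; RoB 2's escalation of multiple 'some concerns'
    to high risk is intentionally left out of this sketch."""
    levels = set(judgements.values())
    if "high" in levels:
        return "high risk of bias"
    if "some concerns" in levels:
        return "some concerns"
    return "low risk of bias"


# The one trial flagged in this review had concerns only in D1 [31].
sanchez_2022 = {domain: "low" for domain in DOMAINS}
sanchez_2022["D1 randomization process"] = "some concerns"
print(overall_rob(sanchez_2022))  # -> "some concerns"
```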

Inclusion/exclusion criteria

Manuscripts had to be structured in the IMRaD format, written in English, report original data, and be published in a peer-reviewed journal. Accordingly, primary research from RCTs was considered if available as full-length articles.

Studies reviewed

Of the 55 papers identified between December 12, 2016, and January 21, 2024, including duplicates and studies with no relevance to the primary research question, 14 were removed as duplicates. Of the remaining 41 records, 32 manuscripts were excluded during the first screening phase and another 3 during full-text assessment, as detailed in the PRISMA flowchart in Fig. 1. The complete individual and overall numbers of publications per year and the lists of records can be consulted in Supplementary File 4. We therefore retained six multinational RCTs [27, 28, 29, 30, 31, 32], conducted primarily in Asia (n = 4) and Europe (n = 2) and published in 2017 (n = 1), 2021 (n = 3), and 2022 (n = 2).

Fig. 1: PRISMA flow diagram of the systematic review.
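The flow counts reported above can be reconstructed arithmetically, as in the short sketch below (55 identified, 14 duplicates, 32 plus 3 exclusions, 6 retained).

```python
# Reconstructing the PRISMA 2020 flow counts reported in the text.
identified = 55             # records retrieved across the four databases
duplicates = 14             # removed after automatic plus manual checking
screened = identified - duplicates              # 41 records
excluded_phase_1 = 32       # excluded at screening
excluded_phase_2 = 3        # excluded at full-text assessment
included = screened - excluded_phase_1 - excluded_phase_2
assert included == 6        # the six RCTs retained [27-32]
```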

All six RCTs were previously registered at https://clinicaltrials.gov/ (accessed on January 30, 2024) under the following ID numbers: Andersen et al. [27], NCT01956110, 000004, 2013–001669-17 (EudraCT Number), U1111-1147–6826 (Other Identifier, WHO); Ishihara et al. [28], NCT02309671, 000124; Qiao et al. [29], NCT03296527, 000145; Ishihara et al. [30], NCT03228680, 000273; Sánchez et al. [31], NCT03564509, 2017–003810-13 (EudraCT Number), 000289; Yang et al. [32], NCT03296527, 000145. Except for NCT03564509 and NCT02309671, which were phase 2 trials enrolling n = 779 women, the remaining trials (NCT03296527, NCT03228680, and NCT01956110) were phase 3 studies recruiting 3,446 women, giving a total of 4,225 randomized participants. An overview of the RCTs with the results for the relevant parameters of interest is given in Table 1 and Table 2.
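Registry entries like these can also be fetched programmatically. The sketch below queries the five unique NCT IDs (NCT03296527 covers both [29] and [32]) through ClinicalTrials.gov's public v2 REST endpoint; the JSON field paths are assumptions based on the openly documented v2 schema and may need adjusting.

```python
import requests

# Unique registry IDs of the six trials (NCT03296527 is shared by two RCTs).
NCT_IDS = ["NCT01956110", "NCT02309671", "NCT03296527",
           "NCT03228680", "NCT03564509"]

for nct in NCT_IDS:
    # Public v2 REST endpoint of ClinicalTrials.gov.
    resp = requests.get(f"https://clinicaltrials.gov/api/v2/studies/{nct}",
                        timeout=30)
    resp.raise_for_status()
    proto = resp.json().get("protocolSection", {})
    title = proto.get("identificationModule", {}).get("briefTitle", "?")
    phases = proto.get("designModule", {}).get("phases", [])
    print(nct, phases, title[:60])
```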

Risk of bias assessment

The overall risk of bias was low (green), with 83.3% of trials rated low risk for assignment to the intervention (intention-to-treat, ITT); the remaining 16.7%, corresponding to one study [31], raised some concerns (yellow) in domain D1, "Bias arising from the randomization process," owing to the scarcity of baseline data for comparing differences between the groups and the enrollment of underweight, overweight, and obese participants (Fig. 2).

Fig. 2: Overall quality assessment of the RCTs based on the RoB 2 tool.

Positive βhCG

Positive serum βhCG at 13–15 days after blastocyst transfer was reported in five [27, 28, 29, 31, 32] of the six eligible RCTs (not in [30]), totaling 1,293 confirmed tests. Specifically, 691 (53.44%) came from women who underwent ovarian stimulation with individualized follitropin delta at doses ranging from 1 µg to 12 µg/day guided by the AMH concentration, while 602 (46.55%) resulted from administration of the two conventional r-hFSHs, of which 584 (97.01%) followed follitropin alfa and 18 (2.99%) follitropin beta. Of note, the study by Sánchez et al. [31], which added choriogonadotropin beta (CG beta), observed a negative response relative to placebo (odds ratio, OR, vs. placebo 0.53; 95% confidence interval, CI, 0.30–0.93; P = 0.0264). Although Ishihara et al. [30] conducted a similar study with the same primary endpoints as their earlier trial [28], they did not report the corresponding data; in that earlier trial, elevated βhCG levels were recorded in 68 participants (34/34 per started cycle and per cycle with transfer) allocated to 6 µg/day (10/10), 9 µg/day (10/10), and 12 µg/day (14/14), versus 18 (9/9) women with a positive serum βhCG in the follitropin beta arm at 150 IU/day.
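The pooled percentages quoted throughout this section follow directly from the raw counts; for example, the βhCG split can be verified as below (the 46.55% in the text appears to truncate rather than round 46.56%).

```python
# Pooled positive-βhCG counts reported across the five RCTs assessing it.
delta, conventional = 691, 602
total = delta + conventional                        # 1293 confirmed tests
print(f"follitropin delta: {delta / total:.2%}")           # -> 53.44%
print(f"conventional r-hFSH: {conventional / total:.2%}")  # -> 46.56%
```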

Ongoing pregnancy

Ongoing pregnancy, defined as at least one viable intrauterine fetus 10–11 weeks after blastocyst transfer [27, 28, 29, 30, 31, 32], accounted for 1,121 pregnancies overall, of which 603 (53.79%) were associated with follitropin delta. The remaining 518 (46.20%) corresponded to conventional dosing: 438 (84.55%) with follitropin alfa and 80 (15.44%) with follitropin beta. For the latter, the proportion was 40 (6/34) per started cycle and 40 (6/34) per cycle with transfer, whereas the pre-defined fixed doses yielded 6 µg/day (6/6), 9 µg/day (7/7), and 12 µg/day (10/10) [28, 30]. The maximum allowed dose of 12 µg/day of follitropin delta promoted the highest response, with 20 (10/10) ongoing pregnancies [28], but showed a dose-treatment effect (OR vs. placebo 0.58; 95% CI 0.33–1.03; P = 0.0650) [31], decreasing with the dose concentration.

Vital and clinical pregnancy

In the context of vital and clinical pregnancies, defined as at least one intrauterine gestational sac with a fetal heartbeat 5–6 weeks after transfer, three RCTs [27, 28, 32] investigated both parameters; the remainder reported either clinical [29, 30] or vital pregnancies [31]. Around 2,032 pregnancies were recorded, specifically 1,312 clinical and 720 vital pregnancies, irrespective of the intervention drug. Ovarian stimulation with follitropin delta accounted for 1,073 (52.80%) combined cases, and conventional dosing for 959 (47.19%), of which follitropin alfa contributed 847 (41.68%) and follitropin beta 112 (5.51%). In the investigations by Ishihara et al. [28, 30], 308 pregnancies corresponded to a ratio of 154 (98/56)/154 (98/56) cases per started cycle and per cycle with transfer for follitropin delta and follitropin beta, respectively. The ovarian response by dose for both parameters was as follows: 6 µg/day, 16 (9/7); 9 µg/day, 16 (8/8); 12 µg/day, 23 (13/10); and 150 IU/day, 14 (8/6). The findings of Sánchez et al. [31] showed a non-significant difference following 4 µg, which had the largest effect compared with placebo (OR vs. placebo 0.93; 95% CI 0.53–1.64; P = 0.8116).

Live birth and at 4 weeks

Defining live birth as at least one live neonate, and live birth at 4 weeks as that neonate reaching 4 weeks after birth, five teams [27, 28, 29, 30, 32] calculated these rates; one study did not [31]. Of a cumulative 1,099 births, 595 (54.14%) were attributed to follitropin delta and 504 (45.85%) to the two conventional r-hFSHs, specifically 426 (38.76%) to follitropin alfa and 78 (7.09%) to follitropin beta. The number of live neonates at 4 weeks was equivalent, apart from two women with losses for which no further details were given [27]. As detailed earlier, the ratio per started cycle and per cycle with transfer was identical: 6 µg/day (6/6), 9 µg/day (7/7), 12 µg/day (9/9), and 150 IU/day (6/6) [28]. Dose adjustments of follitropin beta did not differ from follitropin delta in this respect, 80 (40/40) versus 66 (33/33) [30].

Multiple pregnancies

Andersen et al. [27] were the only investigators to monitor the likelihood of multiple pregnancies, observing 12 following ovarian stimulation: 8 (66.66%) in the follitropin alfa group and 4 (33.33%) in the follitropin delta group.

Discussion

In this first systematic review, we aimed to summarize the evidence on the non-inferiority of individualized, algorithm-based follitropin delta administration, in terms of pregnancy outcomes, compared with conventional dosing. A daily dose of 10.0 µg follitropin delta has been estimated to correspond to 150 IU/day [33], and the drug is well tolerated at up to 12–24 µg in Chinese women [34], with broadly similar percentages for pregnancy outcomes [35, 36].

Considering the low immunogenicity among women undergoing multiple stimulation cycles [35], post hoc analyses emphasized a similar number of oocytes yielded and retrieved [33, 35, 37], irrespective of ovarian reserve, although, interestingly, this was contradicted on further examination [36]. Notwithstanding a reported mean retrieval of > 10 oocytes [38, 39, 40], except in one instance [41], and the fact that > 40% of women achieved the established optimal range of 8–14 oocytes [38, 40, 41], the pregnancy outcomes did not reveal significant discrepancies between the groups, whether cumulative or per fresh embryo transfer (ET) [38, 39, 40, 41, 42].

Of note, Asian women exhibit a higher risk of OHSS than European women, owing to ethnicity-related differences in weight [36, 37, 43], but in a predictable dose-dependent and dose-exposure-proportional manner [44]. This observation runs counter to additional evidence of a reduced risk of moderate/severe OHSS and of preventive interventions in participants who underwent ovarian stimulation with follitropin delta [43, 45]. OHSS is a common iatrogenic complication reported in clinical trials; most cases ranged from mild to moderate [38, 39, 46] or were isolated severe events [38], and in circumstances of any-grade OHSS the incidence was higher than in ESTHER-1, though with fewer moderate or severe cases [47].

Cumulative data highlight the safety profile of follitropin delta irrespective of ovarian reserve, as it has been tested in patients categorized as normal, low, and high responders [36, 37] according to AMH values [43], even though the response appears to vary with AMH in a dose-response manner through the interconnection between r-hFSH, endocrine parameters, and follicular development [19]. However, AMH variability across the stages of the menstrual cycle, assessed through multiple measurements, suggests a limited impact: about ± 1 oocyte when AMH < 15 pmol/L and dose adjustments of ± 1.5 µg when AMH ≥ 15 pmol/L [48]. It is worth noting that commercial AMH assays have shown intercycle variation exceeding 20% and reaching 163% [49, 50, 51], which was associated with the lack of standardized AMH tests [52, 53, 54]. This initially limited the comparison of results, but the concern has since been resolved, as current assays display concordance for gonadotrophin prescribing [55].

The MARCS trial results, regarding a higher mean number of retrieved oocytes and good-quality blastocysts [47], conflict with the report of another study whose authors describe a lower proportion of good- and intermediate-quality day 3 blastocysts [56]. Moreover, the PROFILE [39] and DELTA [38] trials, in parallel with studies that preceded them [42] or compared results against the ESTHER programs [40, 41, 46], reached congruent conclusions on primary and secondary outcomes. Whether the ET was fresh or frozen during a woman's first stimulation cycle, the rate of major congenital disorders remained low throughout the first 4 weeks after birth [42].

More recent studies by members of the ESTHER group [57], or conducted in specific populations from Europe [58] and North America [59, 60], sometimes with concomitant administration of a long gonadotropin-releasing hormone (GnRH) agonist [61], added further proof of applicability in the real world and extrapolation to clinical practice. Follitropin delta combined with menotropin [61] led to changes in the endocrine and reproductive profile [57], whereas alone it had no such effect [58], and neither approach yielded notable differences in clinical outcomes [57, 61]. Even the combined approach appears to necessitate OHSS preventive measures [61], with OHSS consistently reported, varying from early to late onset and graded from moderate to severe [57].

Fertility nurses rated the GONAL-f pen injector as less prone to handling errors than other injectors [62]. In one isolated situation, women inadvertently exposed themselves for a short term to 72 µg of follitropin delta for three consecutive days, which, interestingly, caused no major adverse events (AEs) [63].

Strengths and limitations of the study

This is the first systematic review conducted on this topic, and the quality of the included RCTs indicates a low risk of bias. However, the overall number of studies is relatively low, and they have predominantly been conducted by the same team members.

Conclusions

Based on all the aspects covered in this systematic review, it can be concluded that follitropin delta provides a response at least as consistent as that of follitropin alfa or beta and is thus non-inferior. Follitropin delta proved to be a reliable r-hFSH dedicated to ovarian stimulation. This manuscript consolidates the current spectrum of knowledge and may assist clinicians and researchers in future studies to translate these findings into clinical practice.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Lunenfeld B, Bilger W, Longobardi S, Alam V, D’Hooghe T, Sunkara SK. The Development of Gonadotropins for Clinical Use in the Treatment of Infertility [Internet]. Front Endocrinol. 2019;10:429. https://doi.org/10.3389/fendo.2019.00429 .

Lunenfeld B. Gonadotropin stimulation: past, present and future. Reprod Med Biol [Internet]. 2012;11:11–25. https://doi.org/10.1007/s12522-011-0097-2 .

Niederberger C, Pellicer A, Cohen J, Gardner DK, Palermo GD, O’Neill CL, et al. Forty years of IVF. Fertil Steril [Internet]. 2018;110:185-324.e5. https://doi.org/10.1016/j.fertnstert.2018.06.005 .

Dias J, Ulloa-Aguirre A. New human follitropin preparations: how glycan structural differences may affect biochemical and biological function and clinical effect. Front Endocrinol (Lausanne). 2021;12:636038.

De Leo V, Musacchio MC, Di Sabatino A, Tosti C, Morgante G, Petraglia F. Present and future of recombinant gonadotropins in reproductive medicine [Internet]. Curr Pharm Biotechnol. 2012;13(3):379–91. http://www.eurekaselect.com/node/76486/article . Accessed 1 Mar 2024.

Sharara FI, McClamrock HD. Differences in in vitro fertilization (IVF) outcome between white and black women in an inner-city, university-based IVF program. Fertil Steril. 2000;73:1170–3.

Purcell K, Schembri M, Frazier LM, Rall MJ, Shen S, Croughan M, et al. Asian ethnicity is associated with reduced pregnancy outcomes after assisted reproductive technology. Fertil Steril. 2007;87:297–302.

Huddleston HG, Rosen MP, Lamb JD, Modan A, Cedars MI, Fujimoto VY. Asian ethnicity in anonymous oocyte donors is associated with increased estradiol levels but comparable recipient pregnancy rates compared with Caucasians. Fertil Steril. 2010;94:2059–63.

Tabbalat AM, Pereira N, Klauck D, Melhem C, Elias RT, Rosenwaks Z. Arabian Peninsula ethnicity is associated with lower ovarian reserve and ovarian response in women undergoing fresh ICSI cycles. J Assist Reprod Genet. 2018;35:331–7.

Fauser BCJM, Diedrich K, Devroey P. Predictors of ovarian response: progress towards individualized treatment in ovulation induction and ovarian stimulation. Hum Reprod Update. 2008;14:1–14.

Nelson SM. Biomarkers of ovarian response: current and future applications. Fertil Steril [Internet]. 2013;99:963–9. https://doi.org/10.1016/j.fertnstert.2012.11.051 .

La Marca A, Sunkara SK. Individualization of controlled ovarian stimulation in IVF using ovarian reserve markers: from theory to practice. Hum Reprod Update [Internet]. 2014;20:124–40. https://doi.org/10.1093/humupd/dmt037 .

Broekmans FJ, Kwee J, Hendriks DJ, Mol BW, Lambalk CB. A systematic review of tests predicting ovarian reserve and IVF outcome. Hum Reprod Update. 2006;12:685–718.

ESHRE Guidelines. Ovarian stimulation for IVF/ICSI. Guideline of the European Society of Human Reproduction and Embryology. 2019. Available at: https://www.eshre.eu. Accessed 4 Mar 2024.

Koechling W, Plaksin D, Croston GE, Jeppesen J V, Macklon KT, Andersen CY. Comparative pharmacology of a new recombinant FSH expressed by a human cell line. Endocr Connect [Internet]. 2017;6:297–305. https://www.ncbi.nlm.nih.gov/pubmed/28450423 . Accessed 1 Mar 2024.

Olsson H, Sandström R, Grundemar L. Different pharmacokinetic and pharmacodynamic properties of recombinant follicle-stimulating hormone (rFSH) derived from a human cell line compared with rFSH from a non-human cell line. J Clin Pharmacol [Internet]. 2014;54:1299–307. https://doi.org/10.1002/jcph.328 .

Howles CM. Genetic engineering of human FSH (Gonal-F®). Hum Reprod Update [Internet]. 1996;2:172–91. https://doi.org/10.1093/humupd/2.2.172 .

Arce J-C, Klein BM, Erichsen L. Using AMH for determining a stratified gonadotropin dosing regimen for IVF/ICSI and optimizing outcomes. In: Seifer DB, Tal R, editors. Anti-Müllerian hormone: biology, role in ovarian function and clinical significance. 1st ed. Nova Science Publishers, Inc; 2016. pp. 83–102.

Bosch E, Nyboe Andersen A, Barri P, García-Velasco JA, de Sutter P, Fernández-Sánchez M, et al. Follicular and endocrine dose responses according to anti-Müllerian hormone levels in IVF patients treated with a novel human recombinant FSH (FE 999049). Clin Endocrinol (Oxf) [Internet]. 2015;83:902–12. https://doi.org/10.1111/cen.12864 .

Rose TH, Röshammar D, Erichsen L, Grundemar L, Ottesen JT. Characterisation of Population Pharmacokinetics and Endogenous Follicle-Stimulating Hormone (FSH) Levels After Multiple Dosing of a Recombinant Human FSH (FE 999049) in Healthy Women. Drugs R D [Internet]. 2016;16:165–72. https://doi.org/10.1007/s40268-016-0126-z .

Bergandi L, Canosa S, Carosso AR, Paschero C, Gennarelli G, Silvagno F, et al. Human recombinant FSH and its biosimilars: clinical efficacy, safety, and cost-effectiveness in controlled ovarian stimulation for in vitro fertilization. Pharmaceuticals (Basel). 2020;13:136.

Suvarna V. Phase IV of drug development. Perspect Clin Res. 2010;1:57–60.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ [Internet]. 2021;372:n71. http://www.bmj.com/content/372/bmj.n71.abstract . Accessed 4 Mar 2024.

Falagas ME, Pitsouni EI, Malietzis GA, Pappas G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB J Off Publ Fed Am Soc Exp Biol. 2008;22:338–42.

Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ [Internet]. 2019;366:l4898. http://www.bmj.com/content/366/bmj.l4898.abstract . Accessed 4 Mar 2024.

McGuinness LA, Higgins JPT. Risk-of-bias VISualization (robvis): An R package and Shiny web app for visualizing risk-of-bias assessments. Res Synth Methods [Internet]. 2020;12:55–61. https://doi.org/10.1002/jrsm.1411 .

Nyboe Andersen A, Nelson SM, Fauser BCJM, García-Velasco JA, Klein BM, Arce J-C, et al. Individualized versus conventional ovarian stimulation for in vitro fertilization: a multicenter, randomized, controlled, assessor-blinded, phase 3 noninferiority trial. Fertil Steril [Internet]. 2017;107:387-396.e4. https://doi.org/10.1016/j.fertnstert.2016.10.033 .

Ishihara O, Klein BM, Arce J-C, Kuramoto T, Yokota Y, Mukaida T, et al. Randomized, assessor-blind, antimüllerian hormone-stratified, dose-response trial in Japanese in vitro fertilization/intracytoplasmic sperm injection patients undergoing controlled ovarian stimulation with follitropin delta. Fertil Steril [Internet]. 2021;115:1478–86. https://doi.org/10.1016/j.fertnstert.2020.10.059 .

Qiao J, Zhang Y, Liang X, Ho T, Huang H-Y, Kim S-H, et al. A randomised controlled trial to clinically validate follitropin delta in its individualised dosing regimen for ovarian stimulation in Asian IVF/ICSI patients. Hum Reprod [Internet]. 2021;36:2452–62. https://doi.org/10.1093/humrep/deab155 .

Ishihara O, Arce J-C. Individualized follitropin delta dosing reduces OHSS risk in Japanese IVF/ICSI patients: a randomized controlled trial. Reprod Biomed Online [Internet]. 2021;42:909–18. https://doi.org/10.1016/j.rbmo.2021.01.023 .

Fernández Sánchez M, Višnová H, Larsson P, Yding Andersen C, Filicori M, Blockeel C, et al. A randomized, controlled, first-in-patient trial of choriogonadotropin beta added to follitropin delta in women undergoing ovarian stimulation in a long GnRH agonist protocol. Hum Reprod [Internet]. 2022;37:1161–74. https://doi.org/10.1093/humrep/deac061 .

Yang R, Zhang Y, Liang X, Song X, Wei Z, Liu J, et al. Comparative clinical outcome following individualized follitropin delta dosing in Chinese women undergoing ovarian stimulation for in vitro fertilization /intracytoplasmic sperm injection. Reprod Biol Endocrinol. 2022;20:147.

Arce J-C, Larsson P, García-Velasco JA. Establishing the follitropin delta dose that provides a comparable ovarian response to 150 IU/day follitropin alfa. Reprod Biomed Online [Internet]. 2020;41:616–22. https://www.sciencedirect.com/science/article/pii/S1472648320303771 . Accessed 1 Mar 2024.

Shao F, Jiang Y, Ding S, Larsson P, Pinton P, Jonker DM. Pharmacokinetics and Safety of Follitropin Delta in Gonadotropin Down-Regulated Healthy Chinese Women. Clin Drug Investig. 2023;43:37–44.

Bosch E, Havelock J, Martin FS, Rasmussen BB, Klein BM, Mannaerts B, et al. Follitropin delta in repeated ovarian stimulation for IVF: a controlled, assessor-blind Phase 3 safety trial. Reprod Biomed Online [Internet]. 2019;38:195–205. https://doi.org/10.1016/j.rbmo.2018.10.012 .

Višnová H, Papaleo E, Martin FS, Koziol K, Klein BM, Mannaerts B. Clinical outcomes of potential high responders after individualized FSH dosing based on anti-Müllerian hormone and body weight. Reprod Biomed Online [Internet]. 2021;43:1019–26. https://doi.org/10.1016/j.rbmo.2021.08.024 .

Ishihara O, Nelson SM, Arce J-C. Comparison of ovarian response to follitropin delta in Japanese and White IVF/ICSI patients. Reprod Biomed Online [Internet]. 2022;44:177–84. https://doi.org/10.1016/j.rbmo.2021.09.014 .

Porcu-Buisson G, Maignien C, Swierkowski-Blanchard N, Rongières C, Ranisavljevic N, Oger P, et al. Prospective multicenter observational real-world study to assess the use, efficacy and safety profile of follitropin delta during IVF/ICSI procedures (DELTA Study). Eur J Obstet Gynecol Reprod Biol. 2024;293:21–6.

Blockeel C, Griesinger G, Rago R, Larsson P, Sonderegger YLY, Rivière S, et al. Prospective multicenter non-interventional real-world study to assess the patterns of use, effectiveness and safety of follitropin delta in routine clinical practice (the PROFILE study). Front Endocrinol (Lausanne). 2022;13:992677.

Bachmann A, Kissler S, Laubert I, Mehrle P, Mempel A, Reissmann C, et al. An eight centre, retrospective, clinical practice data analysis of algorithm-based treatment with follitropin delta. Reprod Biomed Online [Internet]. 2022;44:853–7. https://doi.org/10.1016/j.rbmo.2021.12.013 .

Kovacs P, Jayakumaran J, Lu Y, Lindheim SR. Comparing pregnancy rates following ovarian stimulation with follitropin-Δ to follitropin -α in routine IVF: A retrospective analysis. Eur J Obstet Gynecol Reprod Biol [Internet]. 2023;280:22–7. https://doi.org/10.1016/j.ejogrb.2022.11.006 .

Havelock J, Aaris Henningsen A-K, Mannaerts B, Arce J-C, Groups E-1 and E-2 T. Pregnancy and neonatal outcomes in fresh and frozen cycles using blastocysts derived from ovarian stimulation with follitropin delta. J Assist Reprod Genet [Internet]. 2021;38:2651–61. https://doi.org/10.1007/s10815-021-02271-5

Fernández-Sánchez M, Visnova H, Yuzpe A, Klein BM, Mannaerts B, Arce J-C. Individualization of the starting dose of follitropin delta reduces the overall OHSS risk and/or the need for additional preventive interventions: cumulative data over three stimulation cycles. Reprod Biomed Online [Internet]. 2019;38:528–37. https://doi.org/10.1016/j.rbmo.2018.12.032 .

Olsson H, Sandström R, Bagger Y. Dose-exposure proportionality of a novel recombinant follicle-stimulating hormone (rFSH), FE 999049, derived from a human cell line, with comparison between Caucasian and Japanese women after subcutaneous administration. Clin Drug Investig [Internet]. 2015;35:247–53. https://doi.org/10.1007/s40261-015-0276-8 .

Višnová H, Papaleo E, Martin FS, Koziol K, Klein BM, Mannaerts B. Clinical outcomes of potential high responders after individualized FSH dosing based on anti-Müllerian hormone and body weight. Reprod Biomed Online. 2021;43:1019–26.

Doroftei B, Ilie O-D, Dabuleanu A-M, Diaconu R, Maftei R, Simionescu G, et al. Follitropin delta as a state-of-the-art incorporated companion for assisted reproductive procedures: A two year observational study. Medicina. 2021;57:379.

Bissonnette F, MinanoMasip J, Kadoch I-J, Librach C, Sampalis J, Yuzpe A. Individualized ovarian stimulation for in vitro fertilization: a multicenter, open label, exploratory study with a mixed protocol of follitropin delta and highly purified human menopausal gonadotropin. Fertil Steril [Internet]. 2021;115:991–1000. https://doi.org/10.1016/j.fertnstert.2020.09.158 .

Nelson SM, Larsson P, Mannaerts BMJL, Nyboe Andersen A, Fauser BCJM. Anti-Müllerian hormone variability and its implications for the number of oocytes retrieved following individualized dosing with follitropin delta. Clin Endocrinol (Oxf) [Internet]. 2019;90:719–26. https://doi.org/10.1111/cen.13956 .

Bungum L, Tagevi J, Jokubkiene L, Bungum M, Giwercman A, Macklon N, et al. The impact of the biological variability or assay performance on amh measurements: A prospective cohort study with AMH tested on three analytical assay-platforms. Front Endocrinol (Lausanne) [Internet]. 2018;9:603. https://pubmed.ncbi.nlm.nih.gov/30459709 . Accessed 1 Mar 2024.

Gorkem U, Togrul C. Is There a Need to Alter the Timing of Anti-Müllerian Hormone Measurement During the Menstrual Cycle? Geburtshilfe Frauenheilkd. 2019;79:731–7.

Melado L, Lawrenz B, Sibal J, Abu E, Coughlan C, Navarro AT, et al. Anti-müllerian Hormone During Natural Cycle Presents Significant Intra and Intercycle Variations When Measured With Fully Automated Assay. Front Endocrinol (Lausanne). 2018;9:686.

Iliodromiti S, Salje B, Dewailly D, Fairburn C, Fanchin R, Fleming R, et al. Non-equivalence of anti-Müllerian hormone automated assays-clinical implications for use as a companion diagnostic for individualised gonadotrophin dosing. Hum Reprod [Internet]. 2017;32:1710–5. https://pubmed.ncbi.nlm.nih.gov/28854583 . Accessed 1 Mar 2024.

Magnusson Å, Oleröd G, Thurin-Kjellberg A, Bergh C. The correlation between AMH assays differs depending on actual AMH levels. Hum Reprod Open. 2017;2017:hox026.

ACOG Committee Opinion No. 773 Summary: The Use of Antimüllerian Hormone in Women Not Seeking Fertility Care. Obstet Gynecol. 2019;133:840–1.

La Marca A, Tolani AD, Capuzzo M. The interchangeability of two assays for the measurement of anti-Müllerian hormone when personalizing the dose of FSH in in-vitro fertilization cycles. Gynecol Endocrinol Off J Int Soc Gynecol Endocrinol. 2021;37:372–6.

Haakman O, Liang T, Murray K, Vilos A, Vilos G, Bates C, et al. In vitro fertilization cycles stimulated with follitropin delta result in similar embryo development and quality when compared with cycles stimulated with follitropin alfa or follitropin beta. F&S Reports [Internet]. 2021;2:30–5. https://doi.org/10.1016/j.xfre.2020.12.002 .

Sánchez MF, Larsson P, Serrano MF, Bosch E, Velasco JAG, López ES, et al. Live birth rates following individualized dosing algorithm of follitropin delta in a long GnRH agonist protocol. Reprod Biol Endocrinol. 2023;21:45.

Gazzo I, Bovis F, Colia D, Sozzi F, Costa M, Anserini P, et al. Algorithm vs clinical experience: controlled ovarian stimulations with follitropin delta and individualised doses of follitropin alpha/beta. Reprod Fertil [Internet]. 2024;5:e230045. https://raf.bioscientifica.com/view/journals/raf/5/1/RAF-23-0045.xml . Accessed 29 Feb 2024.

Arab S, Frank R, Ruiter J, Dahan MH. How to dose follitropin delta for the first insemination cycle according to the ESHRE and ASRM guidelines; a retrospective cohort study. J Ovarian Res [Internet]. 2023;16:24. https://doi.org/10.1186/s13048-022-01079-w .

Yacoub S, Cadesky K, Casper RF. Low risk of OHSS with follitropin delta use in women with different polycystic ovary syndrome phenotypes: a retrospective case series. J Ovarian Res [Internet]. 2021;14:31. https://doi.org/10.1186/s13048-021-00773-5 .

Duarte-Filho OB, Miyadahira EH, Matsumoto L, Yamakami LYS, Tomioka RB, Podgaec S. Follitropin delta combined with menotropin in patients at risk for poor ovarian response during in vitro fertilization cycles: a prospective controlled clinical study. Reprod Biol Endocrinol. 2024;22:7.

Longobardi S, Seidler A, Martins J, Beckers F, MacGillivray W, D’Hooghe T. An evaluation of the use and handling errors of currently available recombinant human follicle-stimulating hormone pen injectors by women with infertility and fertility nurses. Expert Opin Drug Deliv [Internet]. 2019;16:1003–14. https://doi.org/10.1080/17425247.2019.1651290 .

Baldini GM, Mastrorocco A, Sciorio R, Palini S, Dellino M, Cascardi E, et al. Inadvertent administration of 72 µg of Follitropin-Δ for three consecutive days does not appear to be dangerous for poor responders: a case series. J Clin Med. 2023;12:5202.

Author information

Authors and Affiliations

Department of Mother and Child, Faculty of Medicine, University of Medicine and Pharmacy “Grigore T. Popa”, University Street No. 16, 700115, Iasi, Romania

Bogdan Doroftei, Ovidiu-Dumitru Ilie, Theodora Armeanu & Radu Maftei

Clinical Hospital of Obstetrics and Gynecology “Cuza Voda”, Cuza Voda Street No. 34, 700038, Iasi, Romania

Bogdan Doroftei, Ana-Maria Dabuleanu, Theodora Armeanu & Radu Maftei

Origyn Fertility Center, Palace Street No. 3C, 700032, Iasi, Romania

Contributions

Conceptualization, data curation, investigation, formal analysis, methodology, software, visualization, writing—original draft: B.D., O.-D.I., T.A., and R.M. Validation, project administration, writing—review and editing: B.D., O.-D.I., and R.M. Visualization; writing, review and editing: A.-M.D. All authors have read and agreed with this version of the manuscript.

Corresponding author

Correspondence to Ovidiu-Dumitru Ilie .

Ethics declarations

Institutional review board

Not applicable.

Informed consent

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 26.5 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Doroftei, B., Ilie, OD., Dabuleanu, AM. et al. The pregnancy outcomes among women receiving individualized algorithm dosing with follitropin delta: a systematic review of randomized controlled trials. J Assist Reprod Genet (2024). https://doi.org/10.1007/s10815-024-03146-1

Download citation

Received : 13 March 2024

Accepted : 15 May 2024

Published : 29 May 2024

DOI : https://doi.org/10.1007/s10815-024-03146-1


Keywords

  • Follitropin delta
  • Ovarian stimulation
  • Infertility
