
  • Open access
  • Published: 01 February 2021

An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot   ORCID: orcid.org/0000-0001-7736-2091 1 ,
  • Jonathan de Bruin   ORCID: orcid.org/0000-0002-4297-0502 2 ,
  • Raoul Schram 2 ,
  • Parisa Zahedi   ORCID: orcid.org/0000-0002-1610-3149 2 ,
  • Jan de Boer   ORCID: orcid.org/0000-0002-0531-3888 3 ,
  • Felix Weijdema   ORCID: orcid.org/0000-0001-5150-1102 3 ,
  • Bianca Kramer   ORCID: orcid.org/0000-0002-5965-6560 3 ,
  • Martijn Huijts   ORCID: orcid.org/0000-0002-8353-0853 4 ,
  • Maarten Hoogerwerf   ORCID: orcid.org/0000-0003-1498-2052 2 ,
  • Gerbrich Ferdinands   ORCID: orcid.org/0000-0002-4998-3293 1 ,
  • Albert Harkema   ORCID: orcid.org/0000-0002-7091-1147 1 ,
  • Joukje Willemsen   ORCID: orcid.org/0000-0002-7260-0828 1 ,
  • Yongchao Ma   ORCID: orcid.org/0000-0003-4100-5468 1 ,
  • Qixiang Fang   ORCID: orcid.org/0000-0003-2689-6653 1 ,
  • Sybren Hindriks 1 ,
  • Lars Tummers   ORCID: orcid.org/0000-0001-9940-9874 5 &
  • Daniel L. Oberski   ORCID: orcid.org/0000-0001-7467-2297 1 , 6  

Nature Machine Intelligence volume 3, pages 125–133 (2021)


  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.


With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often conduct systematic reviews and meta-analyses to create comprehensive overviews of the relevant topics 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also needs to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; and (3) the use case requires a reproducible workflow and complete transparency 22 .

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners to get an overview of the most relevant records for their work as efficiently as possible while being transparent in the process. The open, free and ready-to-use software ASReview addresses all of the concerns mentioned above: it is open source, uses active learning and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although we focus this paper on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers doing a comprehensive search in multiple databases 24 , using free text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. Next, the researcher downloads a file of records containing the text to be screened into a reference manager; in the case of systematic reviewing, these records contain the titles and abstracts (and potentially other metadata such as the authors' names, journal name and DOI) of potentially relevant references. Ideally, two or more researchers then screen the records' titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will ultimately be included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belong to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of more than 95% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used to train the first model and to present the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and at least one to exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Fig. 1: The symbols indicate whether the action is taken by a human, a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record-containing text (feature space) on the basis of prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user's binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a user-specified stopping criterion has been reached. The user then has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probable to be relevant as predicted by the current model. This set-up helps the user to move through a large database much more quickly than in the manual process, while the decision process simultaneously remains transparent.
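The cycle just described can be written down compactly. The sketch below is a minimal illustration built on scikit-learn rather than on ASReview's own API; the function names, the ask_oracle callback and the stopping rule are assumptions made for the example, although the TF–IDF features, naive Bayes classifier and certainty-based ('max') query strategy mirror the defaults described later in this paper.

```python
# Minimal sketch of the researcher-in-the-loop active learning cycle.
# Illustration only: this uses scikit-learn directly, not the ASReview API.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen(texts, prior_labels, ask_oracle, max_queries=500):
    """texts: list of title+abstract strings.
    prior_labels: dict {record_index: 0/1} with at least one relevant (1)
    and one irrelevant (0) record as prior knowledge.
    ask_oracle: callable that shows a record to the human and returns 0/1."""
    X = TfidfVectorizer(lowercase=True, stop_words=None).fit_transform(texts)
    labels = dict(prior_labels)

    for _ in range(max_queries):
        train_idx = list(labels)
        clf = MultinomialNB().fit(X[train_idx], [labels[i] for i in train_idx])

        pool = [i for i in range(len(texts)) if i not in labels]
        if not pool:
            break
        proba = clf.predict_proba(X[pool])[:, 1]

        # Certainty-based ('max') sampling: present the record that is
        # most likely to be relevant according to the current model.
        next_idx = pool[int(np.argmax(proba))]
        labels[next_idx] = ask_oracle(next_idx)  # the human screens this title/abstract

    # Remaining records, ordered from most to least probably relevant.
    pool = [i for i in range(len(texts)) if i not in labels]
    order = np.argsort(-clf.predict_proba(X[pool])[:, 1]) if pool else []
    return labels, [pool[j] for j in order]
```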

Software implementation for ASReview

The source code 27 of ASReview is available under an Apache 2.0 license, including documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user; the simulation mode is used to simulate the performance of ASReview on existing, fully labelled datasets; and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who classifies them. Multiple file formats are supported: (1) RIS files, as used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect; the citation managers Mendeley, RefWorks, Zotero and EndNote also support the RIS format. (2) Tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined column labels in line with those used in RIS files. Each record in the dataset should hold the metadata on, for example, a scientific publication. The mandatory metadata is text, for example the titles or abstracts of scientific papers. If both are available, both are used to train the model, but at least one is needed. An advanced option splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional and not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge from a previous selection of relevant papers before entering the active learning cycle. If such a column is unavailable, the user has to select at least one relevant record, which can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records or presents random records, which are most likely to be irrelevant given the extremely imbalanced data.
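As a concrete illustration of these input requirements, the sketch below performs a pre-flight check on a tabular dataset. The column names used here ('title', 'abstract', 'included') are assumptions for the purpose of the example; the exact column labels recognized by ASReview are listed in its documentation.

```python
# Sketch of a pre-flight check on a tabular dataset of records.
# Column names are assumed for illustration; see the ASReview documentation
# for the exact labels the software recognizes.
import pandas as pd

def check_dataset(path):
    # CSV files should be comma separated and UTF-8 encoded.
    df = pd.read_csv(path, encoding="utf-8")

    # Titles and/or abstracts are the mandatory text; both are used if present.
    text_cols = [c for c in ("title", "abstract") if c in df.columns]
    if not text_cols:
        raise ValueError("Dataset needs a 'title' and/or 'abstract' column.")
    has_text = df[text_cols].fillna("").astype(str).apply(
        lambda row: any(cell.strip() for cell in row), axis=1)
    print(f"{int(has_text.sum())} of {len(df)} records contain screenable text")

    # Optional binary column with (historical) labelling decisions: required in
    # simulation/exploration mode, usable as prior knowledge in oracle mode.
    if "included" in df.columns:
        labelled = df["included"].dropna()
        assert set(labelled.unique()) <= {0, 1}, "labels must be binary (0/1)"
        print(f"{int(labelled.sum())} relevant out of {len(labelled)} labelled records")

    return df
```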

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n-grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.
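These query strategies can be expressed as simple functions over the model's predicted probabilities of relevance for the unlabelled pool. The sketch below illustrates the underlying idea and is not ASReview's internal implementation; the epsilon parameter of the mixed strategy is an assumption made for the example.

```python
# Illustrative query strategies over predicted relevance probabilities
# for the unlabelled pool (not ASReview's internal code).
import numpy as np

def random_query(proba, rng):
    # (1) Random selection: ignore the model-assigned probabilities.
    return int(rng.integers(len(proba)))

def uncertainty_query(proba):
    # (2) Uncertainty-based sampling: the record closest to 0.5 probability.
    return int(np.argmin(np.abs(proba - 0.5)))

def certainty_query(proba):
    # (3) Certainty-based ('max') sampling: the record most likely to be included.
    return int(np.argmax(proba))

def mixed_query(proba, rng, epsilon=0.05):
    # (4) Mixed sampling: mostly certainty-based, occasionally random
    # (the 5% random share is an arbitrary choice for this example).
    return random_query(proba, rng) if rng.random() < epsilon else certainty_query(proba)
```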

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data are typically extremely imbalanced. We have therefore implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations, but is dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details on all of the described algorithms can be found in the code and documentation referred to above.
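A simplified sketch of the dynamic resampling idea is given below. The exact schedule that ASReview uses for the relevant-to-irrelevant ratio is described in ref. 31 and the documentation; the share_relevant formula in this sketch is a placeholder assumption, and only the general mechanism (undersample irrelevant records, duplicate relevant ones, keep the training size roughly constant) follows the description above.

```python
# Simplified sketch of dynamic resampling (illustration only; the actual
# ratio schedule is described in ref. 31 and the ASReview documentation).
import numpy as np

def dynamic_resample(X_train, y_train, n_total, rng=None):
    rng = rng or np.random.default_rng()
    y_train = np.asarray(y_train)
    relevant = np.flatnonzero(y_train == 1)
    irrelevant = np.flatnonzero(y_train == 0)
    n_train = len(y_train)

    # Placeholder schedule: the desired share of relevant records grows as more
    # records are labelled relative to the full dataset (not the real formula).
    share_relevant = min(0.5, 0.1 + 0.4 * n_train / n_total)

    n_rel = max(1, int(share_relevant * n_train))       # duplicate relevant records
    n_irr = min(n_train - n_rel, len(irrelevant))       # undersample irrelevant ones
    idx = np.concatenate([rng.choice(relevant, size=n_rel, replace=True),
                          rng.choice(irrelevant, size=n_irr, replace=False)])
    idx = rng.permutation(idx)
    return X_train[idx], y_train[idx]   # roughly the same size, but rebalanced
```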

By default, ASReview converts the records' texts into a document-term matrix; terms are converted to lowercase and no stop words are removed by default (but this can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file, and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer's memory). ASReview can run on the user's local computer or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user's computer; data ownership and confidentiality are crucial and no data are processed or used in any way by third parties. This is unique in comparison with some of the existing systems, as shown in the last column of Table 1 .

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated in classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including the selection of databases 24 and the construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or by omitting limitations from the search query), resulting in a higher number of initial papers and thereby limiting the risk of missing relevant papers during the search step (that is, putting more focus on recall instead of precision).

Furthermore, many reviewers nowadays move towards meta-reviews when analysing very large literature streams, that is, systematic reviews of systematic reviews 37 . This can be problematic as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Owing to the efficiency of ASReview, scholars using the tool could instead analyse the primary studies directly rather than relying on the published systematic reviews. Furthermore, ASReview supports the rapid updating of a systematic review: the included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, consider the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19, and manually finding relevant papers (for example, to develop treatment guidelines) is very time consuming. This is especially problematic as urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently: the researcher selects key papers that match their (COVID-19) research question in the first step, which starts the active learning cycle and leads to the most relevant COVID-19 papers for that research question being presented next. A plug-in was therefore developed for ASReview 39 , containing three databases that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 database, developed by the Allen Institute for AI, with publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI, and the plug-in therefore also updates it daily. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after December 1st, 2019 to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19-related preprints, containing metadata of preprints from more than 15 preprint servers across disciplines published since January 1st, 2020 41 . The preprint dataset is updated weekly by its maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provides added value to researchers interested in COVID-19 research, especially those who want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , whereas the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .
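Conceptually, the simulation mode replays a fully labelled dataset through the active learning cycle, with the known labels standing in for the human screener, and records the order in which records are presented. The sketch below illustrates that idea by reusing the screen function from the earlier sketch; it is not the ASReview simulation code itself, which is run from the command line as described in the documentation.

```python
# Conceptual sketch of simulation mode: replay known labels through an
# active-learning screening loop and record the screening order.
# (Illustration only; ASReview's simulation mode is invoked via its CLI.)
def simulate(texts, true_labels, prior_included, prior_excluded, screen):
    """true_labels: known inclusion decisions (1 = relevant, 0 = irrelevant).
    prior_included/prior_excluded: record indices used as prior knowledge.
    screen: an active-learning loop such as the earlier sketch."""
    prior = {i: 1 for i in prior_included}
    prior.update({i: 0 for i in prior_excluded})

    screening_order = []
    def oracle(idx):                 # the 'human' is replaced by the known label
        screening_order.append(idx)
        return true_labels[idx]

    screen(texts, prior, oracle, max_queries=len(texts))
    return screening_order           # order in which records were presented
```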

First, we analysed the performance for a study that systematically described studies performing viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase ( n  = 1,806), Medline ( n  = 1,384), Cochrane Central ( n  = 1), Web of Science ( n  = 977) and Google Scholar ( n  = 200, the top relevant references). After deduplication, the initial search yielded 2,481 studies, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results of a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from the ACM Digital Library, IEEE Xplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, yielding a total of 8,911 publications, of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsychInfo and Scopus and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results of a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset of all MEDLINE records from 1994 through 2003, which allows for replicability of the results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. We first assess the work saved over sampling (WSS), which is the percentage reduction in the number of records needed to screen that is achieved by using active learning instead of screening records at random. WSS is measured at a given level of recall of relevant records, for example 95%, indicating the reduction in screening effort at the cost of failing to detect 5% of the relevant records. For some researchers it is essential that all relevant literature on the topic is retrieved; this entails that the recall should be 100% (that is, WSS@100%). We also propose the proportion of relevant references found after screening the first 10% of the records (RRF@10%). This is a useful metric for getting a quick overview of the relevant literature.
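Both metrics can be computed directly from the order in which records were screened in a simulation. The sketch below follows the verbal definitions above and is a simplified illustration, not the exact computation behind the reported figures; it assumes the screening order covers all records.

```python
# Simplified computation of WSS@95% and RRF@10% from a screening order
# (consistent with the definitions above; not the exact evaluation code).
import numpy as np

def wss(screening_order, true_labels, recall=0.95):
    labels = np.asarray(true_labels)
    n_total, n_relevant = len(labels), int(labels.sum())
    found = np.cumsum(labels[np.asarray(screening_order)])  # relevant records found so far
    # Number of records screened when the target recall is first reached.
    n_screened = int(np.argmax(found >= np.ceil(recall * n_relevant))) + 1
    return recall - n_screened / n_total     # work saved relative to random screening

def rrf(screening_order, true_labels, fraction=0.10):
    labels = np.asarray(true_labels)
    n_first = int(np.ceil(fraction * len(labels)))
    found = labels[np.asarray(screening_order[:n_first])].sum()
    return found / labels.sum()              # share of relevant records already found
```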

For every dataset, 15 runs were performed with one random inclusion and one random exclusion (see Fig. 2 ). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83%, ranging from 67% to 92% across datasets. Hence, 95% of the eligible studies will be found after screening only 8% to 33% of the studies. Furthermore, the proportion of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Fig. 2: a–d, Results of the simulation study for a study systematically reviewing studies that performed viral metagenomic next-generation sequencing in common livestock ( a ), for a systematic review of studies on fault prediction in software engineering ( b ), for longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure ( c ), and for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors ( d ). Fifteen runs (shown with separate lines) were performed for every dataset, with only one random inclusion and one random exclusion. The classical review performances with randomly found inclusions are shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test—carried out in December 2019—was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. It was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to provide feedback on installing the software and testing the performance on their own data. After these sessions we prioritized the feedback in a meeting with the ASReview team, which resulted in the release of v.0.4 and v.0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held between February and March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback concerned how to upload datasets and how to select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: inexperienced users and experienced users who had already used ASReview. Owing to the COVID-19 lockdown, the usability tests were conducted via video calling, in which one person gave instructions to the participant and one person observed, a set-up known as human-moderated remote testing 49 . During the tests, one person (S.H.) asked the questions and helped the participant with the tasks, while the other person, a user experience professional at the IT department of Utrecht University (M.H.), observed and made notes.

The notes were analysed using thematic analysis, a method that divides the information into themes that each carry a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong, the text was coded as 'showstopper'; when something did not go smoothly, it was coded as 'doubtful'; and when something went well, it was coded as 'superb'. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and were submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants ( N  = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 2 ). The inexperienced users on average rated the tool with an 8.0 (s.d. = 1.1, N  = 6); the experienced users on average rated the tool with a 7.8 (s.d. = 0.9, N  = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the new releases v0.10 and v0.10.1 and in v0.11, a major revision of the graphical user interface. The documentation was upgraded to make installing and launching ASReview more straightforward. We made setting up a project, selecting a dataset and finding prior knowledge more intuitive and flexible, and we added a project dashboard with information on progress and advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are flexible. We provide an open source software implementation, ASReview, and compared it with state-of-the-art systems across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview provides default parameter settings that exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on the performance of identifying full text articles with different document length and domain-specific terminologies or even other types of text, such as newspaper articles and court cases. When the selection of past knowledge is not possible based on expert knowledge, alternative methods could be explored. For example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these are easily added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such a benchmark, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvement over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66 , 2215–2222 (2015).


Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).

Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).

Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62 , e1–e34 (2009).

Boaz, A. et al. Systematic Reviews: What have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice London, 2002).

Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).

Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322 , 98–101 (2001).

Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds. Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6 .

Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2 , 119–125 (2011).

Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15 , e0227742 (2020).

Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8 , 163 (2019).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20 , 7 (2020).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 5 (2015).

Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11 , 55 (2010).

Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13 , 206–219 (2006).

Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4 , 313–326 (2014).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660

Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3 , 119–131 (2016).

Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207

Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477 , 15–29 (2019).

Nosek, B. A. et al. Promoting an open research culture. Science 348 , 1422–1425 (2015).

Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16 , 25–31 (2009).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7 , e012545 (2017).

de Vries, H., Bekkers, V. & Tummers, L. Innovation in the Public Sector: a systematic review and future research agenda. Public Adm. 94 , 146–166 (2016).

Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592

De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120

ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/

Docker container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview

Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg

Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).

Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).

Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).


Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks Preprint at https://arxiv.org/abs/1908.10084 (2019).

Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11 , 15 (2011).

Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J . 369 , 1328 (2020).

Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020). https://doi.org/10.5281/zenodo.3891420 .

Lu Wang, L. et al. CORD-19: The COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).

Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18

Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122

Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6

Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12 , 107 (2020).

Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 , 1276–1304 (2012).

van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24 , 451–467 (2017).


van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53 , 267–291 (2018).

Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).

Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).

Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).

NVivo v. 12 (QSR International Pty, 2019).

Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview - June 2020. OSF https://doi.org/10.17605/OSF.IO/7PQNM (2020).

Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23 , 193–201 (2016).

Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).

Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).

Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29 , 709–730 (2020).

Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets

Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).

Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32 , 762–764 (2018).

Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng . 23 , 3161–3186 (2018).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

ASReview: Active learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview


Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and affiliations

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski


Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer, software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts is leading the UX tests and was supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S, D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information: Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7

Download citation

Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.

automation tools for literature review

  • Open access
  • Published: 11 July 2019

Toward systematic review automation: a practical guide to using machine learning tools in research synthesis

  • Iain J. Marshall   ORCID: orcid.org/0000-0003-2594-2654 1 &
  • Byron C. Wallace 2  

Systematic Reviews volume 8, Article number: 163 (2019)


Technologies and methods to speed up the production of systematic reviews by reducing the manual labour involved have recently emerged. Automation has been proposed or used to expedite most steps of the systematic review process, including search, screening, and data extraction. However, how these technologies work in practice and when (and when not) to use them is often not clear to practitioners. In this practical guide, we provide an overview of current machine learning methods that have been proposed to expedite evidence synthesis. We also offer guidance on which of these are ready for use, their strengths and weaknesses, and how a systematic review team might go about using them in practice.


Evidence-based medicine (EBM) is predicated on the idea of harnessing the entirety of the available evidence to inform patient care. Unfortunately, this is a challenging aim to realize in practice, for a few reasons. First, relevant evidence is primarily disseminated in unstructured, natural language articles describing the conduct and results of clinical trials. Second, the set of such articles is already massive and continues to expand rapidly [ 1 ].

A now outdated estimate from 1999 suggests that conducting a single review requires in excess of 1000 h of (highly skilled) manual labour [ 2 ]. More recent work estimates that conducting a review currently takes, on average, 67 weeks from registration to publication [ 3 ]. Clearly, existing processes are not sustainable: reviews of current evidence cannot be produced efficiently and in any case often go out of date quickly once they are published [ 4 ]. The fundamental problem is that current EBM methods, while rigorous, simply do not scale to meet the demands imposed by the voluminous scale of the (unstructured) evidence base. This problem has been discussed at length elsewhere [ 5 , 6 , 7 , 8 ].

Research on methods for semi-automating systematic reviews via machine learning and natural language processing now constitutes its own (small) subfield, with an accompanying body of work. In this survey, we aim to provide a gentle introduction to automation technologies for the non-computer scientist. We describe the current state of the science and provide practical guidance on which methods we believe are ready for use. We also discuss how a systematic review team might go about using them, and the strengths and limitations of each. We do not attempt an exhaustive review of research in this burgeoning field. Perhaps unsurprisingly, multiple systematic reviews of such efforts already exist [ 9 , 10 ].

Instead, we identified machine learning systems that are available for use in practice at the time of writing, through manual screening of records in SR Toolbox (footnote 1) on January 3, 2019, to identify all systematic review tools that incorporate machine learning [ 11 ]. SR Toolbox is a publicly available online catalogue of software tools to aid systematic review production and is updated regularly via literature surveillance plus direct submissions from tool developers and via social media. We have not described machine learning methods from academic papers unless a system to enact them has been made available; we likewise have not described (the very large number of) software tools for facilitating systematic reviews unless they make use of machine learning.

Box 1 Glossary of terms used in systematic review automation

Machine learning: computer algorithms which ‘learn’ to perform a specific task through statistical modelling of (typically large amounts of) data

Natural language processing: computational methods for automatically processing and analysing ‘natural’ (i.e. human) language texts

Text classification: automated categorization of documents into groups of interest

Data extraction: the task of identifying key bits of structured information from texts

Crowd-sourcing: decomposing work into micro-tasks to be performed by distributed workers

Micro-tasks: discrete units of work that together complete a larger undertaking

Semi-automation: using machine learning to expedite tasks, rather than complete them

Human-in-the-loop: workflows in which humans remain involved, rather than being replaced

Supervised learning: estimating model parameters using manually labelled data

Distantly supervised: learning from pseudo, noisy ‘labels’ derived automatically by applying rules to existing databases or other structured data

Unsupervised: learning without any labels (e.g. clustering data)

Machine learning and natural language processing methods: an introduction

Text classification and data extraction: the key tasks for reviewers.

The core natural language processing (NLP) technologies used in systematic reviews are text classification and data extraction . Text classification concerns models that can automatically sort documents (here, article abstracts, full texts, or pieces of text within these) into predefined categories of interest (e.g. report of RCT vs. not ). Data extraction models attempt to identify snippets of text or individual words/numbers that correspond to a particular variable of interest (e.g. extracting the number of people randomized from a clinical trial report).

The most prominent example of text classification in the review pipeline is abstract screening: determining whether individual articles within a candidate set meet the inclusion criteria for a particular review on the basis of their abstracts (and later full texts). In practice, many machine learning systems can additionally estimate a probability that a document should be included (rather than a binary include/exclude decision). These probabilities can be used to automatically rank documents from most to least relevant, thus potentially allowing the human reviewer to identify the studies to include much earlier in the screening process.

Following the screening, reviewers extract the data elements that are relevant to their review. These are naturally viewed as individual data extraction tasks. Data of interest may include numerical data such as study sample sizes and odds ratios, as well as textual data, e.g. snippets of text describing the study randomization procedure or the study population.

Risk of bias assessment is interesting in that it entails both a data extraction task (identifying snippets of text in the article as relevant for bias assessment) and a final classification of an article as being at high or low risk for each type of bias assessed [ 12 ].

State-of-the-art methods for both text classification and data extraction use machine learning (ML) techniques, rather than, e.g. rule-based methods. In ML, one writes programs that specify parameterized models to perform particular tasks; these parameters are then estimated using (ideally large) datasets. In practice, ML methods resemble statistical models used in epidemiological research (e.g. logistic regression is a common method in both disciplines).

We show a simple example of how machine learning could be used to automate the classification of articles as being RCTs or not in Fig. 1 . First, a training set of documents is obtained. This set will be manually labelled for the variable of interest (e.g. as an ‘included study’ or ‘excluded study’).

Figure 1. Classifying text using machine learning, in this example logistic regression with a 'bag of words' representation of the texts. The system is 'trained', learning a coefficient (or weight) for each unique word in a manually labelled set of documents (typically in the 1000s). In use, the learned coefficients are used to predict a probability for an unknown document.

Next, documents are vectorized , i.e. transformed into high-dimensional points that are represented by sequences of numbers. A simple, common representation is known as a bag of words (see Fig. 2 ). In this approach, a matrix is constructed in which rows are documents and each column corresponds to a unique word. Documents may then be represented in rows by 1’s and 0’s, indicating the presence or absence of each word, respectively. Footnote 2 The resultant matrix will be sparse (i.e. consist mostly of 0’s and relatively few 1’s), as any individual document will contain a small fraction of the full vocabulary. Footnote 3

Figure 2. Bag of words modelling for classifying RCTs. Top left: example of bag of words for three articles; each column represents a unique word in the corpus (a real example would likely contain columns for 10,000s of words). Top right: document labels, where 1 = relevant and 0 = irrelevant. Bottom: coefficients (or weights) are estimated for each word (in this example using logistic regression). Large positive weights will increase the predicted probability that an unseen article is an RCT where it contains the words 'random' or 'randomized'. The presence of the word 'systematic' (with a large negative weight) would reduce the predicted probability that an unseen document is an RCT.

Next, weights (or coefficients) for each word are ‘learned’ (estimated) from the training set. Intuitively for this task, we want to learn which words make a document more, or less, likely to be an RCT. Words which lower the likelihood of being an RCT should have negative weights; those which increase the likelihood (such as ‘random’ or ‘randomly’) should have positive weights. In our running example, the model coefficients correspond to the parameters of a logistic regression model. These are typically estimated (‘learned’) via gradient descent-based methods.

Once the coefficients are learned, they can easily be applied to a new, unlabelled document to predict the label. The new document is vectorized in an identical way to the training documents. The document vector is then multiplied Footnote 4 by the previously learned coefficients, and transformed to a probability via the sigmoid function.
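The workflow just described can be sketched in a few lines of Python with scikit-learn. The tiny training set, labels, and test sentence below are invented purely for illustration; a real system would be trained on thousands of manually labelled abstracts.

```python
# A minimal sketch of the workflow described above: bag-of-words vectorization
# plus logistic regression, using scikit-learn. The tiny training set and test
# sentence are invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "patients were randomized to intervention or placebo",      # RCT
    "we randomly assigned participants to two trial arms",      # RCT
    "a systematic review and meta-analysis of cohort studies",  # not an RCT
    "a retrospective case series of surgical outcomes",         # not an RCT
]
train_labels = [1, 1, 0, 0]  # 1 = RCT, 0 = not an RCT

# Vectorize: each document becomes a sparse row of 1s/0s over the vocabulary.
vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_texts)

# 'Learn' a coefficient (weight) for each unique word.
clf = LogisticRegression()
clf.fit(X_train, train_labels)

# Apply to a new, unlabelled document: vectorize identically, multiply by the
# learned coefficients, and pass the result through the sigmoid function.
new_doc = ["participants were randomized to receive the drug or placebo"]
prob_rct = clf.predict_proba(vectorizer.transform(new_doc))[0, 1]
print(f"Predicted probability of being an RCT: {prob_rct:.2f}")
```

Inspecting clf.coef_ shows which words push a prediction towards or away from the RCT label, mirroring the weights illustrated in Fig. 2.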

Many state-of-the-art systems use more complex models than logistic regression (and in particular more sophisticated methods for representing documents [ 13 ], obtaining coefficients [ 14 ], or both [ 15 ]). Neural network-based approaches in particular have re-emerged as the dominant model class. Such models are composed of multiple layers , each with its own set of parameters. We do not describe these methods in detail here, Footnote 5 but the general principle is the same: patterns are learned from numerical representations of documents with known labels, and then, these patterns can be applied to new documents to predict the label. In general, these more complex methods achieve (often modest) improvements in predictive accuracy compared with logistic regression, at the expense of computational and methodological complexity.

Methods for automating (or semi-automating) data extraction have been well explored, but for practical use remain less mature than automated screening technologies. Such systems typically operate over either abstracts or full-text articles and aim to extract a defined set of variables from the document.

At its most basic, data extraction can be seen as a type of text classification problem, in which individual words (known as tokens) are classified as relevant or not within a document. Rather than translating the full document into a vector, a data extraction system might encode the word itself, plus additional contextual information (for example, nearby surrounding words and position in the document).

Given such a vector representation of the word at position t in document x (notated as x t ), an extraction system should output a label that indicates whether or not this word belongs to a data type of interest (i.e. something to be extracted). For example, we may want to extract study sample sizes. Doing so may entail converting numbers written in English to numerals and then labelling (or ‘tagging’) all numbers on the basis of feature vectors that encode properties that might be useful for making this prediction (e.g. the value of the number, words that precede and follow it, and so on). This is depicted in Fig. 3 . Here, the ‘target’ token (‘100’) is labelled as 1, and others as 0.

Figure 3. Schematic of a typical data extraction process. The illustration concerns the example task of extracting the study sample size. In general, these tasks involve labelling individual words. The word (or 'token') at position t is represented by a vector. This representation may encode which word is at this position and likely also communicates additional features, e.g. whether the word is capitalized or if the word is (inferred to be) a noun. Models for these kinds of tasks attempt to assign labels to all T words in a document, and for some tasks will attempt to maximize the joint likelihood of these labels to capitalize on correlations between adjacent labels.
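To make the token-labelling idea concrete, the following minimal sketch builds a feature dictionary for each word in an invented sentence and marks the sample-size token as the target. The sentence and the feature names are illustrative assumptions, not the features of any particular system.

```python
# Illustrative sketch of encoding tokens with contextual features for sample
# size extraction. Sentence and feature names are invented for illustration;
# real systems use much richer feature sets or learned representations.
def token_features(tokens, t):
    word = tokens[t]
    return {
        "word": word.lower(),
        "is_number": word.isdigit(),
        "prev_word": tokens[t - 1].lower() if t > 0 else "<START>",
        "next_word": tokens[t + 1].lower() if t < len(tokens) - 1 else "<END>",
        "is_capitalized": word[0].isupper(),
    }

sentence = "We enrolled 100 participants across three sites".split()
features = [token_features(sentence, t) for t in range(len(sentence))]
labels = [1 if word == "100" else 0 for word in sentence]  # 1 = target token

for feats, label in zip(features, labels):
    print(label, feats)
```

Any standard classifier (such as the logistic regression above) could then be trained on such feature dictionaries, although, as explained next, labelling each token independently ignores dependencies between neighbouring labels.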

Such a token by token classification approach often fails to capitalize on the inherently structured nature of language and documents. For example, consider a model for extracting snippets of text that describe the study population, intervention/comparators, and outcomes (i.e. PICO elements), respectively. Labelling words independently of one another would fail to take into account the observation that adjacent words will have a tendency to share designations: if the word at position t is part of a description of the study population, that substantially raises the odds that the word at position t + 1 is as well.

In ML nomenclature, this is referred to as a structured classification problem. More specifically, assigning the words in a text to categories is an instance of sequence tagging . Many models for problems with this structure have been developed. The conditional random field (CRF) is amongst the most prominent of these [ 18 ]. Current state-of-the-art models are based on neural networks, and specifically recurrent neural networks, or RNNs. Long short-term memory networks (LSTMs) [ 19 ] combined with CRFs (LSTM-CRFs) [ 19 , 20 , 21 ] have in particular shown compelling performance on such tasks generally, for extraction of data from RCTs specifically [ 22 , 23 ].
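As a rough illustration of sequence tagging, the sketch below trains a linear-chain CRF with the sklearn-crfsuite package on two invented sentences, using a simplified scheme that tags population-describing words as 'POP'. The sentences, features, and label scheme are assumptions for demonstration only; published systems use far richer features or neural architectures such as the LSTM-CRFs cited above.

```python
# Rough sketch of sequence tagging with a linear-chain CRF (sklearn-crfsuite).
# The two toy sentences and the simplified scheme (POP = word describing the
# study population, O = other) are invented for illustration only.
import sklearn_crfsuite

def word2features(tokens, t):
    return {
        "word": tokens[t].lower(),
        "prev_word": tokens[t - 1].lower() if t > 0 else "<START>",
        "next_word": tokens[t + 1].lower() if t < len(tokens) - 1 else "<END>",
        "is_number": tokens[t].isdigit(),
    }

sentences = [
    "We enrolled adults with type 2 diabetes".split(),
    "The trial randomized children with asthma".split(),
]
labels = [
    ["O", "O", "POP", "POP", "POP", "POP", "POP"],
    ["O", "O", "O", "POP", "POP", "POP"],
]

X = [[word2features(s, t) for t in range(len(s))] for s in sentences]

# Unlike independent per-token classification, the CRF models correlations
# between adjacent labels (POP tokens tend to occur in runs).
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)

test = "We recruited pregnant women attending antenatal clinics".split()
print(crf.predict([[word2features(test, t) for t in range(len(test))]]))
```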

Machine learning tools available for use in practice

The rapidly expanding biomedical literature has made search an appealing target for automation. Two key areas have been investigated to date: filtering articles by study design and automatically finding relevant articles by topic. Text classification systems for identifying RCTs are the most mature, and we regard them as ready for use in practice. Machine learning for identifying RCTs has already been deployed in Cochrane; Cochrane authors may access this technology via the Cochrane Register of Studies [ 24 ]. Footnote 6

Two validated systems are freely available for general use [ 16 , 25 ]. Cohen and colleagues have released RCT tagger, Footnote 7 a system which estimates the probability that PubMed articles are RCTs [ 25 ]. The team validated the performance on a withheld portion of the same dataset, finding the system discriminated accurately between RCTs and non-RCTs (area under the receiver operating characteristics curve (AUROC) = 0.973). A search portal is available freely at their website, which allows the user to select a confidence threshold for their search.

Our own team has produced RobotSearch Footnote 8, which aims to replace keyword-based study filtering. The system uses neural networks and support vector machines, and was trained on a large set of articles with crowd-sourced labels by Cochrane Crowd [ 16 ]. In validation, the system achieved state-of-the-art discriminative performance (AUROC = 0.987), reducing the number of irrelevant articles retrieved by roughly half compared with the keyword-based Cochrane Highly Sensitive Search Strategy, without losing any additional RCTs. The system may be freely used by uploading an RIS file to our website; a filtered file containing only the RCTs is then returned.

Study design classification is appealing for machine learning because it is a single, generalizable task: filtering RCTs is common across many systematic reviews. However, finding articles which meet other topic-specific inclusion criteria is review-specific and thus much more difficult. Consider that it is unlikely that a systematic review with identical inclusion criteria has been performed before, and even where it has, it might yield up to several dozen articles to use as training data, compared with the thousands needed in a typical machine learning system. We discuss below how a small set of relevant articles (typically obtained through screening a proportion of abstracts retrieved by a particular search) can seed a machine learning system to identify other relevant articles.

A further application of machine learning in search is as a method for producing a semantic search engine, i.e. one in which the user can search by concept rather than by keyword. Such a system is akin to searching PubMed by MeSH terms (index terms from a standardized vocabulary, which have traditionally been applied manually by PubMed staff). However, such a manual approach has the obvious drawback of requiring extensive and ongoing manual annotation effort, especially in light of the exponentially increasing volume of articles to index. Even putting costs aside, manual annotation delays the indexing process, meaning the most recent articles may not be retrievable. Thalia is a machine learning system (based on CRFs, reviewed above) that automatically indexes new PubMed articles daily for chemicals, diseases, drugs, genes, metabolites, proteins, species, and anatomical entities. This allows the indexes to be updated daily and provides a user interface to interact with the concepts identified [ 26 ].

Indeed, as of October 2018, PubMed itself has adopted a hybrid approach, where some articles are assigned MeSH terms automatically using their Medical Text Indexer (MTI) system [ 27 ], which uses a combination of machine learning and manually crafted rules to assign terms without human intervention [ 28 ].

Machine learning systems for abstract screening have reached maturity; several such systems with high levels of accuracy are available for reviewers to use. In all of the available systems, human reviewers first need to screen a set of abstracts and then review the system recommendations. Such systems are thus semi-automatic, i.e. keep humans ‘in-the-loop’. We show a typical workflow in Fig. 4 .

Figure 4. Typical workflow for semi-automated abstract screening. The asterisk indicates that with uncertainty sampling, the articles which are predicted with least certainty are presented first. This aims to improve the model accuracy more efficiently.

After conducting a conventional search, retrieved abstracts are uploaded into the system (e.g. using the common RIS citation format). Next, a human reviewer manually screens a sample (often random) of the retrieved set. This continues until a 'sufficient' number of relevant articles have been identified such that a text classifier can be trained. (Exactly how many positive examples will suffice to achieve good predictive performance is an empirical question, but a conservative heuristic is about half of the retrieved set.) The system uses this classifier to predict the relevance of all unscreened abstracts, and these are reordered by rank. The human reviewer is hence presented with the most relevant articles first. This cycle then continues, with the documents being repeatedly re-ranked as additional abstracts are screened manually, until the human reviewer is satisfied that no further relevant articles are being found.

This is a variant of active learning (AL) [ 29 ]. In AL approaches, the model selects which instances are to be labelled next, with the aim of maximizing predictive performance with minimal human supervision. Here, we have outlined a certainty-based AL criterion, in which the model prioritizes for labelling the citations that it believes to be relevant (under its current model parameters). This approach is appropriate for the systematic review scenario, in light of the relatively small number of relevant abstracts that will exist in a given set under consideration. However, a more standard, general approach is uncertainty sampling, wherein the model asks the human to label the instances it is least certain about.
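The screening loop in Fig. 4 with a certainty-based criterion can be sketched schematically as follows. The helper names (screen_with_active_learning, human_decision), the seed size, and the crude stopping rule are illustrative assumptions rather than the behaviour of any specific tool.

```python
# Schematic sketch of the semi-automated screening loop in Fig. 4 with a
# certainty-based active learning criterion. Names, seed size, and the
# stopping rule are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def screen_with_active_learning(abstracts, human_decision, seed_size=50):
    """abstracts: list of strings; human_decision(i) returns 1 (include) or 0."""
    X = TfidfVectorizer().fit_transform(abstracts)
    decisions = {}  # abstract index -> 0/1, in the order screened

    # Step 1: the reviewer manually screens an initial sample.
    # (Assumes this seed contains both relevant and irrelevant abstracts.)
    for i in range(min(seed_size, len(abstracts))):
        decisions[i] = human_decision(i)

    # Steps 2-4: train a classifier, rank the unscreened abstracts, present
    # the most relevant one next, and repeat.
    while len(decisions) < len(abstracts):
        screened = list(decisions)
        clf = LogisticRegression().fit(X[screened],
                                       [decisions[i] for i in screened])
        unscreened = [i for i in range(len(abstracts)) if i not in decisions]
        scores = clf.predict_proba(X[unscreened])[:, 1]
        nxt = unscreened[int(np.argmax(scores))]  # certainty-based selection
        decisions[nxt] = human_decision(nxt)

        # Crude stopping rule: 20 consecutive exclusions after the seed phase.
        recent = list(decisions.values())[-20:]
        if len(decisions) > seed_size + 20 and 1 not in recent:
            break
    return decisions
```

Choosing when to stop such a loop is the difficult part in practice, as discussed next.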

The key limitation of automated abstract screening is that it is not clear at which point it is 'safe' for the reviewer to stop manual screening. Moreover, this point will vary across reviews. Screening systems tend to rank articles by the likelihood of relevance, rather than simply providing definitive, dichotomized classifications. However, even low-ranking articles have some non-zero probability of being relevant, and there remains the possibility of missing a relevant article by stopping too early. (It is worth noting that whatever initial search strategy is used to retrieve the candidate pool of articles implicitly assigns zero probability to every article it does not retrieve; this strong and arguably unwarranted assumption is often overlooked.) Empirical studies have found the optimal stopping point can vary substantially between different reviews; unfortunately, the optimal stopping point can only be determined definitively in retrospect once all abstracts have been screened. Currently available systems include Abstrackr [ 30 ], SWIFT-Review, Footnote 9 EPPI reviewer [ 31 ], and RobotAnalyst [ 32 ] (see Table 1).

Data extraction

There have now been many applications of data extraction to support systematic reviews; for a relatively recent survey of these, see [ 9 ]. Yet despite advances, extraction technologies remain in formative stages and are not readily accessible by practitioners. For systematic reviews of RCTs, there exist only a few prototype platforms that make such technologies available (ExaCT [ 33 ] and RobotReviewer [ 12 , 34 , 35 ] being among these). For systematic reviews in the basic sciences, the UK National Centre for Text Mining (NaCTeM) has created a number of systems which use structured models to automatically extract concepts including genes and proteins, yeasts, and anatomical entities [ 36 ], amongst other ML-based text mining tools. Footnote 10

ExaCT and RobotReviewer function in a similar way. The systems are trained on full-text articles, with sentences being manually labelled Footnote 11 as being relevant (or not) to the characteristics of the studies. In practice, both systems over-retrieve candidate sentences (e.g. ExaCT retrieves the five sentences predicted most likely, when the relevant information will generally reside in only one of them). The purpose of this behaviour is to maximize the likelihood that at least one of the sentences will be relevant. Thus, in practice, both systems would likely be used semi-automatically by a human reviewer. The reviewer would read the candidate sentences, choose those which were relevant, or consult the full-text paper where no relevant text was identified.

ExaCT uses RCT reports in HTML format and is designed to retrieve 21 characteristics relating to study design and reporting based on the CONSORT criteria. ExaCT additionally contains a set of rules to identify the words or phrase within a sentence which describe the characteristic of interest. In their evaluation, the ExaCT team found their system had very high recall (72% to 100% for the different variables collected) when the 5 most likely sentences were retrieved.

RobotReviewer takes RCT reports in PDF format and automatically retrieves sentences which describe the PICO (the population, intervention, comparator, and outcomes), and also text describing trial conduct relevant to biases (including the adequacy of the random sequence generation, the allocation concealment, and blinding, using the domains from the Cochrane Risk of Bias tool). RobotReviewer additionally classifies the article as being at 'low' risk of bias or not for each bias domain.

Validation studies of RobotReviewer have found that the article bias classifications (i.e. 'low' versus 'high/unclear' risk of bias) are reasonable but less accurate than those in published Cochrane reviews [ 12 , 15 ]. However, the sentences identified were found to be similarly relevant to bias decisions as those in Cochrane reviews [ 12 ]. We therefore recommend that the system be used with manual input, that is, that its output be treated as a suggestion rather than as the final bias assessment. A webtool is available which highlights the text describing biases and suggests a bias decision, aiming to expedite the process compared with fully manual bias assessment.

One obstacle to better models for data extraction has been a dearth of training data for the task. Recall from above that ML systems rely on manual labels to estimate model parameters. Obtaining labels on individual words within documents to train extraction models is an expensive exercise. ExaCT, for example, was trained on a small set (132 total) of full-text articles. RobotReviewer was trained using a much larger dataset, but the 'labels' were induced semi-automatically, using a strategy known as 'distant supervision' [ 35 ]. This means the annotations used for training were imperfect, thus introducing noise to the model. Recently, Nye et al. released the EBM-NLP dataset [ 23 ], which comprises ~5000 abstracts of RCT reports manually annotated in detail. This may provide training data helpful for moving automated extraction models forward.

Although software tools that support the data synthesis component of reviews have long existed (especially for performing meta-analysis), methods for automating this are beyond the capabilities of currently available ML and NLP tools. Nonetheless, research into these areas continues rapidly, and computational methods may allow new forms of synthesis unachievable manually, particularly around visualization [ 37 , 38 ] and automatic summarization [ 39 , 40 ] of large volumes of research evidence.

Conclusions

The torrential volume of unstructured published evidence has rendered existing (rigorous, but manual) approaches to evidence synthesis increasingly costly and impractical. Consequently, researchers have developed methods that aim to semi-automate different steps of the evidence synthesis pipeline via machine learning. This remains an important research direction and has the potential to dramatically reduce the time required to produce standard evidence synthesis products.

At the time of writing, research into machine learning for systematic reviews has begun to mature, but many barriers to its practical use remain. Systematic reviews require very high accuracy in their methods, which may be difficult for automation to attain. Yet accuracy is not the only barrier to full automation. In areas with a degree of subjectivity (e.g. determining whether a trial is at risk of bias), readers are more likely to be reassured by the subjective but considered opinion of an expert human versus a machine. For these reasons, full automation remains a distant goal at present. The majority of the tools we present are designed as 'human-in-the-loop' systems: their user interfaces allow human reviewers to have the final say.

Most of the tools we encountered were written by academic groups involved in research into evidence synthesis and machine learning. Very often, these groups have produced prototype software to demonstrate a method. However, such prototypes do not age well: we commonly encountered broken web links, difficult to understand and slow user interfaces, and server errors.

For the research field, moving from the research prototypes currently available (e.g. RobotReviewer, ExaCT) to professionally maintained platforms remains an important problem to overcome. In our own experience as an academic team in this area, the resources needed for maintaining professional grade software (including bug fixes, server maintenance, and providing technical support) are difficult to obtain from fixed term academic grant funding, and the lifespan of software is typically many times longer than a grant funding period. Yet commercial software companies are unlikely to dedicate their own resources to adopting these machine learning methods unless there was a substantial demand from users.

Nonetheless, for the pioneering systematic review team, many of the methods described can be used now. Users should expect to remain fully involved in each step of the review and to deal with some rough edges of the software. Searching technologies that expedite retrieval of relevant articles (e.g. by screening out non-RCTs) are the most fully realized of the ML models reviewed here and are more accurate than conventional search filters. Tools for screening are accessible via usable software platforms (Abstrackr, RobotAnalyst, and EPPI reviewer) and could safely be used now as a second screener [ 31 ] or to prioritize abstracts for manual review. Data extraction tools are designed to assist the manual process, e.g. drawing the user’s attention to relevant text or making suggestions to the user that they may validate, or change if needed. Piloting of some of these technologies by early adopters (with appropriate methodological caution) is likely the key next step toward gaining acceptance by the community.

Availability of data and materials

Not applicable.

Footnotes

1. http://systematicreviewtools.com/

2. Variants of this approach include using word counts (i.e. the presence of the word 'trial' three times in a document would result in a number 3 in the associated column) or affording greater weight to more discriminative words (known as term frequency–inverse document frequency, or tf-idf).

3. We note that while they remain relatively common, bag of words representations have been largely supplanted by dense 'embeddings' learned by neural networks.

4. This is a dot product.

5. We refer the interested reader to our brief overview of these methods [ 16 ] for classification and to Bishop [ 17 ] for a comprehensive, technical take.

6. http://crsweb.cochrane.org

7. http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi

8. https://robotsearch.vortext.systems/

9. https://www.sciome.com/swift-review/

10. http://www.nactem.ac.uk/

11. More precisely, RobotReviewer generated labels that comprised our training data algorithmically.

References

1. Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7:e1000326.

2. Allen IE, Olkin I. Estimating time to conduct a meta-analysis from number of citations retrieved. JAMA. 1999;282:634–5.

3. Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open. 2017;7:e012545.

4. Johnston E. How quickly do systematic reviews go out of date? A survival analysis. J Emerg Med. 2008;34:231.

5. Tsafnat G, Dunn A, Glasziou P, Coiera E. The automation of systematic reviews. BMJ. 2013;346:f139.

6. O'Connor AM, Tsafnat G, Gilbert SB, Thayer KA, Wolfe MS. Moving toward the automation of the systematic review process: a summary of discussions at the second meeting of International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018;7:3.

7. Thomas J, Noel-Storr A, Marshall I, Wallace B, McDonald S, Mavergames C, et al. Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol. 2017;91:31–7.

8. Wallace BC, Dahabreh IJ, Schmid CH, Lau J, Trikalinos TA. Modernizing evidence synthesis for evidence-based medicine. Clinical Decision Support; 2014. p. 339–61.

9. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev. 2015;4:78.

10. O'Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4:5.

11. Marshall C, Brereton P. Systematic review toolbox: a catalogue of tools to support systematic reviews. In: Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering: ACM; 2015. p. 23.

12. Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Inform Assoc. 2016;23:193–201.

13. Goldberg Y, Levy O. word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method; 2014. p. 1–5.

14. Joachims T. Text categorization with support vector machines: learning with many relevant features. In: Nédellec C, Rouveirol C, editors. Machine learning: ECML-98. Berlin, Heidelberg: Springer Berlin Heidelberg; 1998.

15. Zhang Y, Marshall I, Wallace BC. Rationale-augmented convolutional neural networks for text classification. Proc Conf Empir Methods Nat Lang Process. 2016;2016:795–804.

16. Marshall IJ, Noel-Storr A, Kuiper J, Thomas J, Wallace BC. Machine learning for identifying randomized controlled trials: an evaluation and practitioner's guide. Res Synth Methods. 2018. https://doi.org/10.1002/jrsm.1287.

17. Bishop CM. Pattern recognition and machine learning. Springer New York; 2016.

18. Sutton C, McCallum A. An introduction to conditional random fields. Now Publishers; 2012.

19. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–80.

20. Ma X, Hovy E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2016. http://dx.doi.org/10.18653/v1/p16-1101.

21. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2016. http://dx.doi.org/10.18653/v1/n16-1030.

22. Patel R, Yang Y, Marshall I, Nenkova A, Wallace BC. Syntactic patterns improve information extraction for medical search. Proc Conf. 2018;2018:371–7.

23. Nye B, Jessy Li J, Patel R, Yang Y, Marshall IJ, Nenkova A, et al. A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature. Proc Conf Assoc Comput Linguist Meet. 2018;2018:197–207.

24. Wallace BC, Noel-Storr A, Marshall IJ, Cohen AM, Smalheiser NR, Thomas J. Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. J Am Med Inform Assoc. 2017;24:1165–8.

25. Cohen AM, Smalheiser NR, McDonagh MS, Yu C, Adams CE, Davis JM, et al. Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc. 2015;22:707–17.

26. Soto AJ, Przybyła P, Ananiadou S. Thalia: semantic search engine for biomedical abstracts. Bioinformatics. 2019;35(10):1799–801.

27. Incorporating values for indexing method in MEDLINE/PubMed XML. NLM Technical Bulletin. U.S. National Library of Medicine; 2018. https://www.nlm.nih.gov/pubs/techbull/ja18/ja18_indexing_method.html.

28. Mork J, Aronson A, Demner-Fushman D. 12 years on - is the NLM medical text indexer still useful and relevant? J Biomed Semantics. 2017;8:8.

29. Settles B. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. 2012;6:1–114.

30. Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium. New York: ACM; 2012. p. 819–24.

31. Shemilt I, Khan N, Park S, Thomas J. Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews. Syst Rev. 2016;5:140.

32. Przybyła P, Brockmeier AJ, Kontonatsios G, Le Pogam M-A, McNaught J, von Elm E, et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res Synth Methods. 2018;9:470–88.

33. Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak. 2010;10:56.

34. Marshall IJ, Kuiper J, Banner E, Wallace BC. Automating biomedical evidence synthesis: RobotReviewer. Proc Conf Assoc Comput Linguist Meet. 2017;2017:7–12.

35. Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ. Extracting PICO sentences from clinical trial reports using supervised distant supervision. J Mach Learn Res. 2016;17:1–25.

36. Pyysalo S, Ananiadou S. Anatomical entity mention recognition at literature scale. Bioinformatics. 2014;30:868–75.

37. Mo Y, Kontonatsios G, Ananiadou S. Supporting systematic reviews using LDA-based document representations. Syst Rev. 2015;4:172.

38. Mu T, Goulermas YJ, Ananiadou S. Data visualization with structural control of global cohort and local data neighborhoods. IEEE Trans Pattern Anal Mach Intell. 2017. http://dx.doi.org/10.1109/TPAMI.2017.2715806.

39. Sarker A, Mollá D, Paris C. Query-oriented evidence extraction to support evidence-based medicine practice. J Biomed Inform. 2016;59:169–84.

40. Mollá D, Santiago-Martínez ME. Creation of a corpus for evidence based medicine summarisation. Australas Med J. 2012;5:503–6.

Funding

UK Medical Research Council (MRC), through its Skills Development Fellowship program, grant MR/N015185/1 (IJM); National Library of Medicine, grant R01-LM012086-01A1 (both IJM and BCW).

Author information

Authors and affiliations.

School of Population Health & Environmental Sciences, Faculty of Life Sciences and Medicine, King’s College London, 3rd Floor, Addison House, Guy’s Campus, London, SE1 1UL, UK

Iain J. Marshall

Khoury College of Computer Sciences, Northeastern University, 202 WVH, 360 Huntington Avenue, Boston, MA, 02115, USA

Byron C. Wallace


Contributions

The authors contributed equally to the conception and writing of the manuscript. All authors read and approved the final manuscript.

Authors’ information

Iain Marshall is a Clinical Academic Fellow in the School of Population Health & Environmental Sciences, Faculty of Life Sciences and Medicine, King’s College London, 3rd Floor, Addison House, Guy's Campus, London SE1 1UL. Email: [email protected]

Byron Wallace is faculty in the College of Computer and Information Science, Northeastern University, 440 Huntington Ave #202, Boston, MA 02115. Email: [email protected]

Corresponding author

Correspondence to Iain J. Marshall .

Ethics declarations

Ethics approval and consent to participate; consent for publication

We consent.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Marshall, I.J., Wallace, B.C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst Rev 8 , 163 (2019). https://doi.org/10.1186/s13643-019-1074-9


Received : 18 January 2019

Accepted : 24 June 2019

Published : 11 July 2019

DOI : https://doi.org/10.1186/s13643-019-1074-9


Keywords

  • Machine learning
  • Natural language processing
  • Evidence synthesis



5 software tools to support your systematic review processes

By Dr. Mina Kalantar on 19-Jan-2021 13:01:01


Systematic reviews are a reassessment of scholarly literature to facilitate decision making. This methodical approach of re-evaluating evidence was initially applied in healthcare, to set policies, create guidelines and answer medical questions.

Systematic reviews are large, complex projects and, depending on the purpose, they can be quite expensive to conduct. A team of researchers, data analysts and experts from various fields may collaborate to review and examine incredibly large numbers of research articles for evidence synthesis. Depending on the scope, systematic reviews often take at least 6 months, and sometimes upwards of 18 months, to complete.

The main principles of transparency and reproducibility require a pragmatic approach in the organisation of the required research activities and detailed documentation of the outcomes. As a result, many software tools have been developed to help researchers with some of the tedious tasks required as part of the systematic review process.


The first generation of these software tools were produced to accommodate and manage collaborations, but gradually developed to help with screening literature and reporting outcomes. Some of these software packages were initially designed for medical and healthcare studies and have specific protocols and customised steps integrated for various types of systematic reviews. However, some are designed for general processing, and by extending the application of the systematic review approach to other fields, they are being increasingly adopted and used in software engineering, health-related nutrition, agriculture, environmental science, social sciences and education.

Software tools

There are various free and subscription-based tools to help with conducting a systematic review. Many of these tools are designed to assist with the key stages of the process, including title and abstract screening, data synthesis, and critical appraisal. Some are designed to facilitate the entire process of review, including protocol development, reporting of the outcomes and help with fast project completion.

As time goes on, more functions are being integrated into such software tools. Technological advancement has allowed for more sophisticated and user-friendly features, including visual graphics for pattern recognition and linking multiple concepts. The idea is to digitalise the cumbersome parts of the process to increase efficiency, thus allowing researchers to focus their time and efforts on assessing the rigorousness and robustness of the research articles.

This article introduces commonly used systematic review tools that are relevant to food research and related disciplines, which can be used in a similar context to the process in healthcare disciplines.

These reviews are based on IFIS' internal research, thus are unbiased and not affiliated with the companies.

Covidence

This online platform is a core component of the Cochrane toolkit, supporting parts of the systematic review process, including title/abstract and full-text screening, documentation, and reporting.

The Covidence platform enables collaboration of the entire systematic reviews team and is suitable for researchers and students at all levels of experience.

From a user perspective, the interface is intuitive, and the citation screening is directed step-by-step through a well-defined workflow. Imports and exports are straightforward, with easy export options to Excel and CSV.

Access is free for Cochrane authors (a single reviewer), and Cochrane provides a free trial to other researchers in healthcare. Universities can also subscribe on an institutional basis.

Rayyan is a free and open access web-based platform funded by the Qatar Foundation, a non-profit organisation supporting education and community development initiatives. Rayyan is used to screen and code literature through a systematic review process.

Unlike Covidence, Rayyan does not follow a standard SR workflow and simply helps with citation screening. It is accessible through a mobile application with compatibility for offline screening. The web-based platform is known for its accessible user interface, with easy and clear export options.

Function comparison of 5 software tools to support the systematic review process

The comparison covers Covidence, Rayyan, EPPI-Reviewer, CADIMA, and DistillerSR across protocol development, database integration, ease of import and export, duplicate removal, article screening (title and abstract only in Rayyan; including full text in the other four tools), critical appraisal, assistance with reporting, meta-analysis, and cost (Rayyan and CADIMA are free; Covidence, EPPI-Reviewer, and DistillerSR are subscription-based).

EPPI-Reviewer

EPPI-Reviewer is a web-based software programme developed by the Evidence for Policy and Practice Information and Co-ordinating Centre (EPPI) at the UCL Institute for Education, London.

It provides comprehensive functionalities for coding and screening. Users can create different levels of coding in a code set tool for clustering, screening, and administration of documents. EPPI-Reviewer allows direct search and import from PubMed. The import of search results from other databases is feasible in different formats. It stores, references, identifies and removes duplicates automatically. EPPI-Reviewer allows full-text screening, text mining, meta-analysis and the export of data into different types of reports.

There is no limit for concurrent use of the software and the number of articles being reviewed. Cochrane reviewers can access EPPI reviews using their Cochrane subscription details.

EPPI-Centre has other tools for facilitating the systematic review process, including coding guidelines and data management tools.

CADIMA is a free, online, open access review management tool, developed to facilitate research synthesis and structure documentation of the outcomes.

The Julius Institute and the Collaboration for Environmental Evidence established the software programme to support and guide users through the entire systematic review process, including protocol development, literature searching, study selection, critical appraisal, and documentation of the outcomes. The flexibility in choosing the steps also makes CADIMA suitable for conducting systematic mapping and rapid reviews.

CADIMA was initially developed for research questions in agriculture and environment but it is not limited to these, and as such, can be used for managing review processes in other disciplines. It enables users to export files and work offline.

The software allows for statistical analysis of the collated data using the R statistical software. Unlike EPPI-Reviewer, CADIMA does not have a built-in search engine to allow for searching in literature databases like PubMed.

DistillerSR

DistillerSR is an online software platform maintained by the Canadian company Evidence Partners, which specialises in literature review automation. DistillerSR provides a collaborative platform for every stage of literature review management. The framework is flexible and can accommodate literature reviews of different sizes. It is configurable to different data curation procedures, workflows and reporting standards. The platform integrates the necessary features for screening, quality assessment, data extraction and reporting. The software uses artificial intelligence (AI)-enabled technologies in priority screening to shorten the screening process by reranking the most relevant references nearer to the top. It can also use AI, as a second reviewer, in quality control checks of studies screened by human reviewers. DistillerSR is used to manage systematic reviews in various medical disciplines, surveillance, pharmacovigilance and public health reviews, including food and nutrition topics. The software does not support statistical analyses. It provides configurable forms in standard formats for data extraction.

DistillerSR allows direct search and import of references from PubMed. It provides an add-on feature called LitConnect, which can be set to automatically import newly published references from data providers to keep reviews up to date during their progress.

The Systematic Review Toolbox is a web-based catalogue of various tools, including software packages which can assist with single or multiple tasks within the evidence synthesis process. Researchers can run a quick search or tailor a more sophisticated search by choosing their approach, budget, discipline, and preferred support features, to find the right tools for their research.

If you enjoyed this blog post, you may also be interested in our recently published blog post addressing the difference between a systematic review and a systematic literature review.




An Automated Literature Review Tool (LiteRev) for Streamlining and Accelerating Research Using Natural Language Processing and Machine Learning: Descriptive Performance Evaluation Study

1 Institute of Global Health, University of Geneva, Geneva, Switzerland

Iza Ciglenecki

2 Médecins Sans Frontières, Geneva, Switzerland

Amaury Thiabaud

Alexander Temerev, Alexandra Calmy

3 HIV/AIDS Unit, Division of Infectious Diseases, Geneva University Hospital, Geneva, Switzerland

Olivia Keiser

Aziza Merzouki

Associated Data

Query for Embase and Web of Science, along with the hyperparameters definition, range and values for each clustering.

The data sets generated and analyzed during this study are available on the Open Science Framework platform named “Burden and care for acute and early HIV infection in sub-Saharan Africa: a scoping review protocol” [ 41 ]. A CSV file with the metadata of the 654 papers before text processing and the term frequency-inverse document frequency matrix (and term frequency-inverse document frequency vectorizer) in a pickle format have been uploaded and are freely available for benchmarking and further research.

Literature reviews (LRs) identify, evaluate, and synthesize papers relevant to a particular research question to advance understanding and support decision-making. However, LRs, especially traditional systematic reviews, are slow, resource-intensive, and become outdated quickly.

LiteRev is an advanced and enhanced version of an existing automation tool designed to assist researchers in conducting LRs through the implementation of cutting-edge technologies such as natural language processing and machine learning techniques. In this paper, we present a comprehensive explanation of LiteRev's capabilities and methodology, and an evaluation of its accuracy and efficiency compared to a manual LR, highlighting the benefits of using LiteRev.

Based on the user’s query, LiteRev performs an automated search on a wide range of open-access databases and retrieves relevant metadata on the resulting papers, including abstracts or full texts when available. These abstracts (or full texts) are text processed and represented as a term frequency-inverse document frequency matrix. Using dimensionality reduction (pairwise controlled manifold approximation) and clustering (hierarchical density-based spatial clustering of applications with noise) techniques, the corpus is divided into different topics described by a list of the most important keywords. The user can then select one or several topics of interest, enter additional keywords to refine its search, or provide key papers to the research question. Based on these inputs, LiteRev performs a k-nearest neighbor (k-NN) search and suggests a list of potentially interesting papers. By tagging the relevant ones, the user triggers new k-NN searches until no additional paper is suggested for screening. To assess the performance of LiteRev, we ran it in parallel to a manual LR on the burden and care for acute and early HIV infection in sub-Saharan Africa. We assessed the performance of LiteRev using true and false predictive values, recall, and work saved over sampling.

LiteRev extracted, processed, and transformed text into a term frequency-inverse document frequency matrix of 631 unique papers from PubMed. The topic modeling module identified 16 topics and highlighted 2 topics of interest to the research question. Based on 18 key papers, the k-NNs module suggested 193 papers for screening out of 613 papers in total (31.5% of the whole corpus) and correctly identified 64 relevant papers out of the 87 papers found by the manual abstract screening (recall rate of 73.6%). Compared to the manual full text screening, LiteRev identified 42 relevant papers out of the 48 papers found manually (recall rate of 87.5%). This represents a total work saved over sampling of 56%.

Conclusions

We presented the features and functionalities of LiteRev, an automation tool that uses natural language processing and machine learning methods to streamline and accelerate LRs and support researchers in getting quick and in-depth overviews on any topic of interest.

Introduction

Recently, the traditional emphasis of literature reviews (LRs) in identifying, evaluating, and synthesizing all relevant papers to a particular research question has shifted toward mapping research activity and consolidating existing knowledge [ 1 ]. Despite this broader scope, manual LRs are still error-prone, time- and resource-intensive, and have become ever more challenging over the years due to the increasing number of papers published in academic databases. It is estimated that within 2 years of publication, about one-fourth of all LRs are outdated, as reviewers fail to incorporate new papers on their topic of interest [ 2 , 3 ].

To shorten the time to completion, automation tools have been developed to either fully automate or semiautomate one or more specific tasks involved in conducting an LR, such as screening titles and abstracts [ 4 , 5 ], sourcing full texts, or automating data extraction [ 6 ]. In addition, recent advances in natural language processing (NLP) and machine learning (ML) have produced new techniques that can accurately mimic manual LRs faster and at lower costs [ 7 - 9 ]. In Vienna, in 2015, the International Collaboration for the Automation of Systematic Reviews was initiated to establish a set of principles to enable tools to be developed and integrated into toolkits [ 10 ].

In 2020, our group of researchers started developing an automation tool for LRs [ 11 ] in order to obtain a comprehensive overview of the sociobehavioral factors influencing HIV prevalence and incidence in Malawi. In this paper, we propose an updated version of the tool called LiteRev, which overcomes some of the shortcomings of the previous version. While previously restricted to Paperity, PubMed, PubMed Central, JSTOR, and arXiv, the search now includes 2 additional primary preprint services in the field of epidemiology and medical sciences, bioRxiv and medRxiv, and CORE, a large collection of open-access research papers. In addition, in our previous tool, the search was systematically performed on the papers’ full texts, and references were included in the processed text. In LiteRev, the user can choose to focus on the abstract or on the full text and include or exclude the references. In addition, multiple parallel application programming interface (API) connections to each database have been implemented, allowing for faster retrieval of papers. In the last years, NLP and ML have rapidly evolved, and LiteRev makes use of the most recent text processing, embedding, and clustering techniques. Finally, we added a k-nearest neighbor (k-NN) search module that allows the user to find papers of high similarities with key papers to the research question.

To assess the performance of LiteRev, we conducted a manual LR on the burden and care for acute and early HIV infection (AEHI) in sub-Saharan Africa using one open-access database, PubMed, and 2 subscription-based databases, Embase and Web of Science. AEHI contributes to continuous HIV transmission despite global achievements in HIV control [ 12 , 13 ]. Acute HIV infection is a brief period between viral acquisition and appearance of HIV antibodies, characterized by extremely high viral load values, seeding of viral reservoirs, and disproportionally high likelihood of onward transmission [ 14 - 16 ]. Diagnosing acute HIV infection is challenging: the symptoms are often unspecific, the infection cannot be detected with antibody-detecting rapid diagnostic tests, and the tests detecting antigens or viruses are more complex and expensive. Nevertheless, the testing and care for AEHI have been part of guidance and practice of routine HIV care in high-income countries for many years [ 17 - 19 ]; yet in sub-Saharan Africa, the diagnosis and care for AEHI are almost nonexistent, and current WHO testing guidelines provide no guidance [ 20 , 21 ]. The objective of the proposed LR is to summarize the current knowledge on the burden of AEHI in sub-Saharan Africa and existing models of care to inform future public health interventions. All papers available by December 20, 2022, and related to burden and care for AEHI in sub-Saharan Africa were retrieved, and after removing duplicates, unique papers were screened for relevance. After screening, papers from PubMed identified as relevant by the manual LR were compared to the list of suggested papers by LiteRev. We discussed the performance using standard classification metrics such as true and false predictive values, recall, and work saved over sampling (WSS).
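As a reference point for the metrics mentioned above, the snippet below applies the commonly used definition of work saved over sampling (the fraction of records spared from manual screening, penalized by the recall actually achieved). Whether LiteRev computes WSS in exactly this form is an assumption; the numbers plugged in are those reported in the abstract above.

```python
# Work saved over sampling (WSS), in its commonly used form. Treat this as an
# illustrative formula; LiteRev's exact computation is not restated here.
def work_saved_over_sampling(n_total, n_screened, recall):
    return (n_total - n_screened) / n_total - (1 - recall)

# With 613 papers in total, 193 suggested for screening, and a full text
# recall of 87.5%, this reproduces the reported ~56%.
print(round(work_saved_over_sampling(613, 193, 0.875), 2))  # -> 0.56
```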

Metadata Collection and Text Processing

Based on the user’s query, LiteRev performs an automated search, using the corresponding APIs, on 8 different open-access databases: PubMed, PubMed Central, CORE, JSTOR, Paperity, arXiv, bioRxiv, and medRxiv. Available metadata, that is, list of authors and their affiliations, MeSH keywords, digital object identifier, title, abstract, publication date, journal provider, and URL of the PDF version of the full text paper, are retrieved and stored in a PostgreSQL database hosted on the local machine of the user. If the full text is not available as metadata, it is extracted automatically from the available PDF file, then, references, acknowledgments, and other unnecessary terms are removed, and the remaining text is checked to confirm that it still satisfies the search terms. To identify duplicate papers, the tool compares the title and abstract of papers. If duplicates are identified, the metadata of the papers are compared to check for any discrepancies. In cases where there are discrepancies, information is merged from different sources to collect as much information as possible on the same paper. Depending on the user’s needs and requirements, LiteRev can be performed on the abstract or on the full text.
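As an illustration of what such an automated search against one of these databases can look like, the sketch below queries PubMed through the NCBI E-utilities API and collects basic metadata. This is not LiteRev's actual implementation; the query string, retmax value, and selected fields are placeholders.

```python
# Illustrative sketch of an automated PubMed search via the NCBI E-utilities
# API. Query, retmax, and field choices are placeholders, not LiteRev's code.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def search_pubmed(query, retmax=100):
    # esearch: get the PubMed IDs matching the user's query.
    r = requests.get(f"{EUTILS}/esearch.fcgi",
                     params={"db": "pubmed", "term": query,
                             "retmode": "json", "retmax": retmax})
    ids = r.json()["esearchresult"]["idlist"]
    if not ids:
        return []
    # esummary: retrieve basic metadata (title, journal, publication date, ...).
    r = requests.get(f"{EUTILS}/esummary.fcgi",
                     params={"db": "pubmed", "id": ",".join(ids),
                             "retmode": "json"})
    result = r.json()["result"]
    return [result[pmid] for pmid in ids]

papers = search_pubmed('"acute HIV infection" AND "sub-Saharan Africa"', retmax=20)
for p in papers[:3]:
    print(p.get("title"), "|", p.get("fulljournalname"))
```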

NLP has evolved rapidly, and, in particular, powerful tools have been developed to process text data much more efficiently. We included such tools in LiteRev (Gensim [ 22 ] and spaCy [ 23 ]). After papers with empty text are removed, email addresses, newline characters, single quotes, internet addresses, and punctuation are stripped from the remaining texts, and papers that are not written in the language (one or multiple) chosen by the user are discarded. Sentences are then split into words, which are lemmatized to collapse as many variations of the same word as possible. Words belonging to a list of stop words (ie, words that are not informative) and words with fewer than 3 characters are also removed. Next, bigrams, trigrams, and four-grams (ie, combinations of 2, 3, and 4 words) are created using a probabilistic measure; in practice, such n-gram models are highly effective in modeling language data. Finally, we remove words that appear in only 1 paper or that occur too often (ie, in more than 60% of the corpus) to carry a significant meaning.
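
For concreteness, a minimal sketch of this processing chain is shown below, using spaCy for lemmatization and stop-word removal and Gensim's Phrases model for probabilistic n-gram detection. The model name and parameter values are illustrative assumptions, not the exact settings used in LiteRev.

```python
# Sketch of the text-processing step: lemmatization, stop-word and short-word removal, bigram detection.
import spacy
from gensim.models.phrases import Phrases

nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])

def tokenize(text):
    """Lemmatize and keep informative tokens only (alphabetic, not a stop word, at least 3 characters)."""
    doc = nlp(text.lower())
    return [t.lemma_ for t in doc if t.is_alpha and not t.is_stop and len(t) >= 3]

def add_ngrams(tokenized_corpus):
    """Detect frequent word pairs with a probabilistic (NPMI) score and merge them into bigrams.
    Repeating this step on the output would yield trigrams and four-grams."""
    bigram_model = Phrases(tokenized_corpus, min_count=5, threshold=0.5, scoring="npmi")
    return [bigram_model[doc] for doc in tokenized_corpus]
```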

Clustering and Topic Modeling

Topic modeling allows organizing documents into clusters based on similarity and identifying abstract topics covered by similar papers. In LiteRev, it allows the user to broaden the search strategy and get a more comprehensive and organized overview of the corpus. It can also help to quickly discard a pool of papers when searching the literature for a specific topic and significantly reduce the amount of text to verify manually.

After abstracts or full texts are processed, each paper’s remaining words (namely, its bag of words) are represented in a term frequency-inverse document frequency (TF-IDF) matrix, which is computed using the Scikit-Learn package [ 24 ]. A TF-IDF matrix is similar to a document (in rows) and word (in columns) co-occurrence matrix normalized by the number of papers in which each word is present; words that are less meaningful because they occur throughout the corpus receive a lower score. Because of the often high dimension of the TF-IDF matrix (size of corpus × size of vocabulary), the matrix is embedded into a lower-dimensional space using the pairwise controlled manifold approximation (PaCMAP) dimensionality reduction technique [ 25 ]. The corpus is then divided into clusters using the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm [ 26 ].
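
The chain can be sketched as below with scikit-learn, the pacmap package, and the hdbscan package. The hyperparameter defaults are placeholders, since in LiteRev they are tuned as described in the next paragraph; the min_df/max_df settings mirror the word-frequency filtering described above.

```python
# Sketch of the embedding and clustering chain: TF-IDF -> PaCMAP -> HDBSCAN (hyperparameters are placeholders).
from sklearn.feature_extraction.text import TfidfVectorizer
import pacmap
import hdbscan

def embed_and_cluster(processed_texts, n_dims=100, n_neighbors=15,
                      min_cluster_size=30, min_samples=7):
    # Document-term matrix weighted by inverse document frequency,
    # dropping words present in a single paper or in more than 60% of the corpus.
    tfidf = TfidfVectorizer(min_df=2, max_df=0.6)
    X = tfidf.fit_transform(processed_texts).toarray()

    # Reduce the (corpus size x vocabulary size) matrix to a lower-dimensional embedding.
    reducer = pacmap.PaCMAP(n_components=n_dims, n_neighbors=n_neighbors)
    embedding = reducer.fit_transform(X)

    # Density-based clustering; the label -1 marks noise points.
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size,
                                min_samples=min_samples,
                                gen_min_span_tree=True)
    labels = clusterer.fit_predict(embedding)
    return embedding, labels, clusterer
```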

PaCMAP and HDBSCAN have several important hyperparameters that need to be determined. Table S1 in Multimedia Appendix 1 lists the 4 hyperparameters involved and the ranges of their possible values. To find the best possible set of hyperparameters, we use the Tree-structured Parzen Estimator algorithm implemented in the Optuna package [ 27 ] and store the results of 500 trials in the previously created PostgreSQL database. The density-based clustering validation (DBCV) score, a weighted sum of the “validity index” values of the clusters [ 28 ], is the performance metric used to compare the different sets. Its value varies between 0 and 1 when used with HDBSCAN, with larger values indicating better clustering solutions. This metric takes noise into account and captures the shape of clusters via densities rather than distances. As a coherency check, another metric is computed, the Silhouette coefficient, which measures cluster cohesiveness and separation with an index between –1 and 1, with larger values indicating better clustering solutions [ 29 ].
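
A sketch of this optimization loop is given below. HDBSCAN's relative_validity_ attribute is used here as a stand-in for the DBCV score, the search ranges are illustrative rather than those of Table S1, and the objective assumes X is the dense TF-IDF matrix from the previous step.

```python
# Sketch of the hyperparameter search with Optuna's Tree-structured Parzen Estimator (TPE).
import optuna
import pacmap
import hdbscan

def objective(trial, X):
    # The four tuned hyperparameters; ranges are placeholders for illustration.
    n_dims = trial.suggest_int("n_dims", 2, 400)
    n_neighbors = trial.suggest_int("n_neighbors", 5, 50)
    min_cluster_size = trial.suggest_int("min_cluster_size", 10, 60)
    min_samples = trial.suggest_int("min_samples", 1, 20)

    embedding = pacmap.PaCMAP(n_components=n_dims, n_neighbors=n_neighbors).fit_transform(X)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size,
                                min_samples=min_samples,
                                gen_min_span_tree=True).fit(embedding)
    return clusterer.relative_validity_  # DBCV-style score, to be maximised

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(lambda trial: objective(trial, X), n_trials=500)
best_params = study.best_params
```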

If, after 500 trials, the DBCV score is below 0.5, another round of 500 trials is performed, and so on until a DBCV score equal to or above 0.5 is reached. Once the values of the hyperparameters that maximize the DBCV score are determined, any resulting cluster that is larger than 25% of the corpus is clustered again with the same procedure described above (starting from the text processing). Once each cluster is smaller than 25% of the corpus, its 10 most important words are extracted using the YAKE package [ 30 ] to ensure interpretability and define topics. This supports the user in getting a quick overview of the corpus and, if desired, selecting one or more topics of interest for further exploration; they can then also enter additional keywords to refine this search.
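
The re-clustering rule and the keyword labeling can be summarized as in the sketch below; the function names are illustrative and the YAKE parameters are assumptions rather than LiteRev's exact configuration.

```python
# Sketch of the topic-labelling step: re-cluster oversized clusters, then label each final cluster
# with its 10 most important YAKE keywords.
import yake

MAX_CLUSTER_FRACTION = 0.25

def needs_reclustering(cluster_size, corpus_size):
    """A cluster larger than 25% of the corpus goes through the whole
    text-processing, embedding, and clustering procedure again."""
    return cluster_size > MAX_CLUSTER_FRACTION * corpus_size

def label_cluster(cluster_texts, top_k=10):
    """Extract the most important keywords from the concatenated texts of one cluster."""
    extractor = yake.KeywordExtractor(lan="en", n=1, top=top_k)
    keywords = extractor.extract_keywords(" ".join(cluster_texts))
    return [word for word, score in keywords]  # lower YAKE score = more important
```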

Nearest Neighbors

LiteRev allows the user to define or add papers in the corpus that are considered as being key to the research question. The key papers for the case study were proposed by one of the coauthors (IC), who is working on the topic of acute HIV infection. The papers were chosen by the coauthor from the previously identified literature based on a nonsystematic search and from the references of key review papers if they fulfilled inclusion criteria (see Manual LR in the Methods section). Using the k-NN algorithm from the Scikit-Learn package [ 24 ], a list of potentially relevant papers is provided to the user. Papers deemed to be relevant are tagged by the user and considered as new key papers. This process is iterated as long as relevant papers are being identified (generally 3 to 4 iterations). The initial value of the hyperparameter k, which represents the number of nearest neighbors to be selected, is equal to the value of the number of neighbors for PaCMAP obtained at the first clustering process. The dimension space is the same as the number of dimensions obtained during the embedding process by PaCMAP.
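
The iterative search can be sketched as follows. The confirm_relevant callable stands in for the manual tagging step performed by the user, and the stopping rule is a simplified reading of the procedure described above, not LiteRev's exact implementation.

```python
# Sketch of the iterative k-NN search over the PaCMAP embedding.
from sklearn.neighbors import NearestNeighbors

def knn_search(embedding, key_indices, k, confirm_relevant, max_iter=10):
    """Iterative nearest-neighbour search seeded with user-confirmed key papers."""
    nn = NearestNeighbors(n_neighbors=k).fit(embedding)
    seeds = list(key_indices)
    seen = set(key_indices)           # papers already used as seeds or already suggested
    suggested, relevant = set(), set()

    for _ in range(max_iter):
        _, neighbor_idx = nn.kneighbors(embedding[seeds])
        candidates = set(neighbor_idx.ravel()) - seen
        if not candidates:
            break
        suggested |= candidates
        seen |= candidates
        # Manual step: the user tags which of the suggested papers are actually relevant.
        newly_relevant = {i for i in candidates if confirm_relevant(i)}
        if not newly_relevant:
            break
        relevant |= newly_relevant
        seeds = list(newly_relevant)  # relevant papers become the new key papers
    return suggested, relevant
```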

The list of relevant papers from the k-NN search or a list of papers about one or more topics can then be exported in a CSV or HTML format, and their PDF retrieved and stored in a zip folder. For visualization and further exploration, a web-based 2D representation of the corpus is available in an HTML format. Every dot, colored according to the cluster it belongs to, represents a paper with the following available information: date, title, 10 most important keywords of the cluster’s topic, and the cluster number. When clicking on a paper (dot), direct access to the full text is provided using the URL. Figure 1 shows the entire process flow of LiteRev.

Figure 1. Diagram of LiteRev process. API: application programming interface; DOI: digital object identifier; HDBSCAN: hierarchical density-based spatial clustering of applications with noise; PaCMAP: pairwise controlled manifold approximation; TF-IDF: term frequency-inverse document frequency.

Manual LR

The manual LR aimed to summarize the current evidence on the burden of and care provided for AEHI in sub-Saharan Africa and to inform future policy, practice, and research by addressing the following questions: What is the prevalence of AEHI in sub-Saharan Africa among people being tested for HIV? What models of care have been used for AEHI diagnosis and care, including treatment, partner notifications, and behavior change? What linkage to care has been reached? And what facilitators of and barriers to AEHI care were identified?

We searched all papers in PubMed, Embase, and Web of Science related to burden and care for AEHI in sub-Saharan Africa that were published from the inception of the databases to December 20, 2022, using the query: "("early hiv" OR "primary hiv" OR "acute hiv" OR "HIV Human immuno deficiency virus" OR ("Window period" AND HIV)) AND ("Africa South of the Sahara" OR "Central Africa" OR "Eastern Africa" OR "Southern Africa" OR "Western Africa" OR "sub-saharan africa" OR "subsaharan africa" OR angola OR benini OR botswana OR "burkina faso" OR burundi OR cameroon OR "cape verde" OR "central africa" OR "central african republic" OR chad OR comoros OR congo OR "cote d ivoire" OR "democratic republic congo" OR djibouti OR "equatorial guinea" OR eritrea OR eswatini OR ethiopia OR gabon OR gambia OR ghana OR guinea OR "guinea-bissau" OR kenya OR lesotho OR liberia OR madagascar OR malawi OR mali OR mayotte OR mozambique OR namibia OR niger OR nigeria OR rwanda OR sahel OR "sao tome and principe" OR senegal OR "sierra leone" OR somalia OR "south africa" OR "south sudan" OR sudan OR tanzania OR togo OR uganda OR zambia OR zimbabwe)". This query is specific to PubMed syntax and is identical for both the manual LR and LiteRev. Syntax-specific queries for the manual LR in Embase and Web of Science are shown in Multimedia Appendix 1. Papers retrieved from Embase and Web of Science have not been used by LiteRev and are not part of the comparison and performance assessment, but their results will be discussed in the Results and Discussion sections.

Studies were included if they described AEHI prevalence among the population tested for HIV or described the diagnostic strategy, model of care, or linkage to care for AEHI, including studies looking at perceptions and barriers among patients and staff. Only studies conducted in sub-Saharan Africa were included. We followed the Joanna Briggs Institute methodology for conducting LRs [ 31 ], and papers identified in the databases were uploaded into Rayyan [ 32 ]. Duplicates were deleted, and the screening process, on titles and abstracts, was conducted independently by 2 reviewers (EO and IC). Selected papers were further manually screened on the full text for eligibility against the inclusion criteria. LiteRev was run in parallel on the abstracts only, but results were compared to both the title or abstract screening phase and the full-text screening phase of the manual LR.

Performance Comparison

To assess the performance of LiteRev, we compared the results from the manual LR to the same review conducted using LiteRev. Relevant and not relevant papers, as identified by the manual LR during the title or abstract screening phase and the full-text screening phase, were defined as true labels. Suggested and not suggested papers by LiteRev were considered as predicted labels. Based on these figures, 2 confusion matrices were produced. Positive and negative predictive values (% of relevant and not relevant papers correctly identified; PPV and NPV), recall (number of relevant papers identified using LiteRev among those identified using manual review), and WSS [ 33 , 34 ] were computed and discussed.

The WSS is computed as WSS = (TN + FN) / N − (1 − recall), where true negatives (TN) are the number of nonrelevant abstracts that were correctly identified as nonrelevant by LiteRev, that is, that were not suggested by LiteRev for screening; false negatives (FN) are the number of relevant abstracts incorrectly classified as nonrelevant by LiteRev; and N is the total number of screened abstracts.
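
As a worked check (not part of LiteRev itself), the metrics later reported for the title or abstract screening phase can be reproduced from the counts implied in the Results section: 64 true positives, 129 false positives, 23 false negatives, and 397 true negatives, implied by 193 suggested papers out of 613 screened.

```python
# Worked check of the screening-phase metrics from the implied confusion matrix counts.
tp, fp, fn, tn = 64, 129, 23, 397
n = tp + fp + fn + tn                  # 613 screened papers

ppv = tp / (tp + fp)                   # 0.332 -> 33.2%
npv = tn / (tn + fn)                   # 0.945 -> 94.5%
recall = tp / (tp + fn)                # 0.736 -> 73.6%
wss = (tn + fn) / n - (1 - recall)     # 0.421 -> 42.1%
print(f"PPV={ppv:.1%}  NPV={npv:.1%}  recall={recall:.1%}  WSS={wss:.1%}")
```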

Ethical Considerations

No ethics approval was sought, as the underlying data are not subject to any approval. The data are publicly available metadata from scientific papers.

Text Processing and Topic Modeling

Based on the search strategy described in the Methods section, we obtained 653 papers with metadata directly from PubMed and added 1 key paper given by the user that was not present in the list of retrieved papers. After removing duplicates (n=3), papers with no abstract available (n=15), papers not in English (n=3), and papers with an empty abstract after text processing (n=2), the 631 remaining unique papers were transformed into a TF-IDF matrix comprising 631 rows representing the corpus and 3136 columns representing the unique words (vocabulary), including n-grams.

For the first embedding and clustering process, a DBCV score of 0.533 was obtained after the first 500 trials with the following best set of hyperparameters: PaCMAP: 310 dimensions and 18 neighbors; and HDBSCAN: minimum cluster size of 30 and minimum samples of 7. This resulted in 5 main clusters composed of, respectively, 203, 193, 169, 35, and 31 papers. The 3 largest main clusters contained more than 25% of the total number of papers in the corpus, which triggered 3 additional text processing, embedding, and clustering processes. The best set of hyperparameters for these additional processes can be found in Table S1 in Multimedia Appendix 1 .

In the end, the pool of 203 papers was split into 5 clusters (with, respectively, 98, 41, 25, 21, and 18 papers), the pool of 193 papers into 7 clusters (with, respectively, 47, 40, 37, 22, 20, 14, and 13 papers), and the pool of 169 papers into 2 clusters (with, respectively, 87 and 82 papers). In total, the corpus of 631 papers was divided into 16 clusters ranging from 13 to 98 papers. Figure 2 shows the 2D map of the corpus with the 16 clusters identified. Table 1 shows the corresponding 16 topics grouped by main topics described by their 10 most important keywords and the number of papers in each.

Figure 2. 2D representation of the corpus with the 16 clusters. Black triangles represent the 18 key papers and red triangles represent the 64 relevant papers correctly identified by LiteRev.

Table 1. The 16 topics grouped by 5 main topics, with the 10 most important keywords, the number of papers, and the number of relevant papers in total (key papers in parentheses).

Topic | Keywords | Papers, n | Relevant papers (key papers), n
0 | Woman, risk, year, incidence, high, man, partner, transmission, sexual, testing | 98 | 9 (1)
1 | Cart, month, initiation, group, treatment, viral, rna, child, infant, week | 18 | 0 (0)
2 | Risk, high, health, day, score, aehi, prevalence, care, diagnosis, population | 41 | 6 (2)
3 | Patient, treatment, care, late, diagnosis, associate, testing, aor, datum, initiation | 21 | 0 (0)
4 | Patient, disease, adult, infect, lymphadenopathy, cell, tuberculous, lymphadenitis, associate, present | 25 | 0 (0)
5 | Antibody, response, neutralize, vaccine, isolate, neutralization, epitope, env, primary, individual | 47 | 0 (0)
6 | Subtype, resistance, drug, sequence, mutation, diversity, strain, primary, patient, recombinant | 40 | 3 (0)
7 | Response, specific, associate, increase, immune, ifn, early, gag, point, level | 37 | 0 (0)
8 | Level, viremia, acute, associate, early, individual, infect, load, cytokine, set | 20 | 1 (0)
9 | Load, early, copy, log, plasma, subtype, woman, time, african, rna | 22 | 5 (1)
10 | Isolate, primary, tropic, individual, derive, clone, strain, infect, dual, sequence | 13 | 0 (0)
11 | Response, immune, phi, specific, control, activation, plasma, individual, acute, cytokine | 14 | 0 (0)
12 | Blood, assay, sample, donor, positive, risk, incidence, antibody, estimate, acute | 82 | 21 (7)
13 | Ahi, care, participant, health, intervention, patient, diagnosis, early, acute, risk | 87 | 37 (7)
14 | Infant, mother, week, transmission, child, month, age, woman, test, infect | 35 | 0 (0)
15 | Child, year, mortality, age, infect, treatment, patient, associate, month, clinical | 31 | 0 (0)

Using the search query described in the Methods section, 1721 records were retrieved, among which 653 were from PubMed and 1067 records from 2 subscription-based databases, namely, Embase and Web of Science. In total, 879 records were excluded after removing duplicates, empty abstracts, and papers that were not written in English. This resulted in 631 unique papers in PubMed and 211 unique papers in Embase and Web of Science. We also removed the 18 key papers from the PubMed corpus before the screening phases. In total, 613 papers in PubMed were screened at the title and abstract level, and 87 of them were relevant to the research question. After the full-text screening phase on these 87 relevant papers, we found 48 papers to be relevant to the manual LR.

Out of the 211 unique papers from Embase and Web of Science, 46 papers were found relevant to the research question after the title or abstract screening phase (ie, 34.6% of the 133 relevant papers), and 19 after the full-text screening phase (ie, 28.4% of the 67 relevant papers; Figure 3). Of these 19 relevant papers, 3 were conference abstracts, and 1 paper was kept based only on its title and abstract as the full text could not be found. These 211 papers were not part of PubMed and hence not available to LiteRev.

Figure 3. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram of the manual literature review related to burden and care for acute and early HIV infection in sub-Saharan Africa.

Nearest Neighbors Search and Performance Comparison

One coauthor (IC) provided a list of 18 key papers. With these 18 key papers, we performed a k-NN search on the corpus, embedded into 310 dimensions, with k=18, the number of the nearest neighbors for PaCMAP that maximized the DBCV score of the first clustering process. The first k-NN search suggested 110 papers, including 45 of the relevant papers identified by the manual LR title or abstract screening (precision of 41%). Based on these 45 relevant papers, the second k-NN iteration suggested 26 additional papers out of which 8 were confirmed as relevant (precision of 31%). The third iteration found 9 more relevant papers out of 38 papers suggested (precision of 24%). The fourth and last iteration suggested 19 papers out of which 1 was relevant (precision of 5%).

In total, 193 of the 613 papers were suggested by LiteRev. The suggested papers included 64 of the 87 papers identified as relevant during the title or abstract screening of the manual LR. Figure 2 maps the key papers (black triangles) and the relevant papers (red triangles) that were identified at the title or abstract screening level of the manual LR and correctly classified as relevant by LiteRev. Table 1 indicates the number of key papers and the number of relevant papers in each topic.

Figure 4 (top panel) summarizes the above results and represents the confusion matrix between LiteRev (predicted labels) and the manual LR (true labels) after the title or abstract screening phase. Based on these numbers, the PPV was 33.2%, the NPV was 94.5%, and the recall was 73.6%, which led to a WSS of 42.1%.

Figure 4. Confusion matrices based on the results of (top panel) the title or abstract screening and (bottom panel) the full-text screening performed during the manual literature review.

The 64 relevant papers found by LiteRev belonged essentially to 2 topics (30 relevant papers in one and 14 relevant papers in the other). The topic that contained 30 relevant papers had 87 papers in total and covered early diagnosis, care seeking, and interventions during the acute HIV infection stage (keywords: ahi, care, participant, health, intervention, patient, diagnosis, early, acute, risk). The topic that contained 14 relevant papers had 82 papers and covered the detection of AEHI by antibody assays and incidence estimate (keywords: blood, assay, sample, donor, positive, risk, incidence, antibody, estimate, acute). Screening 53 additional papers (those not suggested by the k-NN search) from these 2 topics would allow the user to identify 3 additional relevant papers.

After the full-text screening phase of the manual LR, 48 out of the 87 relevant papers from the title and abstract screening phase were deemed relevant to the research question. The 64 papers suggested by LiteRev (based on abstracts only) included 42 out of the 48 papers confirmed as relevant after the full-text screening phase of the manual LR. Figure 4 (bottom panel) summarizes the above results and represents the confusion matrix between LiteRev (predicted labels) and the manual LR (true labels) after the full-text screening phase. Based on these numbers, the PPV was 65.6%, the NPV was 26.1%, and the recall was 87.5%, which led to an additional WSS of 13.9% for an overall WSS of 56% compared to the manual LR.

Processing Time

The processing time represents the overall computation time taken by LiteRev to complete the entire process of metadata retrieval, processing, clustering, and neighbor search. It does not include the time that the user took to check the relevance of the suggested papers. The percentage of time saved by the user is expressed by the WSS metric.

It took 5 minutes for LiteRev to retrieve the metadata of the 653 papers and text process the remaining 631 abstracts and transform it into a TF-IDF matrix. Each trial of the optimization process with a specific set of hyperparameters required on average 1 minute of computation. With 3000 trials in total (500 for the main clustering process, 1000 for the first 2 additional clustering processes, and 500 for the last one) run sequentially, this led to an additional 50 hours, that is, roughly 2 days, to complete the entire optimization process. This computation time can be substantially reduced by running the trials in parallel. Finally, the nearest neighbors are obtained almost instantaneously.
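
As a hedged illustration of the parallelization mentioned above, Optuna can run trials in parallel threads via the n_jobs argument of optimize, or across separate worker processes that share the same PostgreSQL storage. The snippet below reuses the objective function and TF-IDF matrix X from the earlier sketch; the storage URL and study name are hypothetical, and this is not necessarily how LiteRev schedules its trials.

```python
# Illustrative only: parallel Optuna trials, with results persisted in a shared PostgreSQL database.
import optuna

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(),
                            storage="postgresql://localhost/literev",   # hypothetical local database
                            study_name="clustering_hyperparameters",
                            load_if_exists=True)
study.optimize(lambda trial: objective(trial, X), n_trials=500, n_jobs=4)
```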

Principal Results

We presented LiteRev, an automation tool that uses NLP and ML methods to support researchers in different steps of a manual LR. The identification of papers to be included in an LR is a critical and time-intensive process, with the majority of time spent in screening thousands of papers for relevance. By combining text processing, literature mapping, topic modeling, and similarity-based search, LiteRev provides a fast and efficient way to remove duplicates, select papers from specific languages, visualize the corpus on a 2D map, identify the different topics covered when addressing the research question, and suggest a list of potentially relevant papers to the user based on their input (eg, prior knowledge of key papers).

Preliminary usage of LiteRev showed that it substantially reduced the researcher’s workload and the overall time required to perform an LR. Compared to the manual LR, LiteRev correctly identified 87.5% of the 48 relevant papers (recall) by screening only 31.5% (193/613 papers) of the screened corpus, which corresponds to a total WSS of 56% at the end of the full-text screening phase. In addition, the actual time spent on running LiteRev and retrieving the results was relatively short, and the user was free in the meantime to focus on other work. The text processing and the nearest neighbors search took no more than 5 minutes of computation for 631 papers.

With its topic modeling capability, LiteRev aims to summarize current evidence on a specific research question to inform policy, practice, and research. For our use case, LiteRev identified 16 topics, grouped into 5 main topics, related to AEHI in sub-Saharan Africa, allowing the researcher to get an overview of the different perspectives on this research question. Finding 61 of the 105 relevant papers after the title and abstract screening phase (including the key papers) in only 2 topics supports the quality of the clustering.

Limitations

LiteRev is currently limited to open-access databases that provide free APIs to abstracts or full-text papers. Databases often used for LRs, such as Embase or Web of Science, do not provide API access, require a subscription for accessing full-text papers, or do not allow text mining and ML analysis. Hence, 19 relevant papers identified in Embase or Web of Science were not available to LiteRev. In addition, when performed on full texts, LiteRev currently works on digitally generated PDFs but not on image-only (scanned) PDFs.

Another limitation concerns the possibility of sharing the list of potentially relevant papers with other users or reviewers. LiteRev does not offer this functionality yet; hence, double screening of papers and comparison of results are not possible at the moment. To overcome this limitation, the user has the option to export their list of papers into a CSV format, which can be uploaded on Rayyan or other similar software for systematic reviews.

As of today, LiteRev is intended to complement rather than replace full systematic reviews. Finally, as of January 2023, no public web-based user interface is available yet.

Comparisons and Future Work

The Systematic Review Toolbox [ 35 ] maintains a searchable database of tools that can be used to assist in many aspects of LR studies, several of which aim to semiautomate parts of the review process. At the end of February 2022, we identified 14 tools (of which 9 were free) designed to semiautomate searching and screening, with only 4 of them providing text analysis functionalities (scite.ai, SRDB.PRO, StArt, and Sysrev). In addition, since the beginning of 2022, a collaborative team at Utrecht University has maintained a repository that aims to give an overview and comparison of software used for systematically screening large amounts of textual data using ML [ 36 ]. The process of the initial selection of the software tools is described in the Open Science Framework [ 37 ]. Of the 9 software tools listed, 4 were free and 2 were additionally open source (ASReview [ 38 ] and FASTREAD [ 39 ]). Most of them used TF-IDF for feature extraction, with other methods being Word(Doc)2Vec and, in one case (ASReview), Sentence Embeddings Using Siamese BERT Networks. All of them then used classifiers (mainly support vector machines), with or without balancing techniques, with ASReview allowing users to choose between different algorithms. None used a combination of unsupervised learning techniques (PaCMAP and HDBSCAN) in conjunction with a k-NN search. Once the inclusion criteria are fulfilled, we plan to make a pull request and add LiteRev to this overview.

LiteRev is developed iteratively, with continuous integration of feedback from users, and its modules can easily be updated or replaced depending on the needs of the users and on technical evolutions. We are further developing LiteRev by building a web application with a user-friendly interface and by adding more functionality to better automate the different stages of an LR. We are also planning to implement a living review [ 40 ] by retrieving new papers for each research question in our database (eg, "HIV" AND "Africa") on a regular basis (eg, every month); each new paper will be text processed and assigned to the topic it belongs to using a predictive algorithm. Although we compared the performance of LiteRev with only 1 manual LR in this paper, we plan to perform additional comparisons and performance evaluations in the future using other published LRs covering different topics.

We presented LiteRev, an automation tool that uses NLP and ML techniques to support, facilitate, and accelerate the conduct of LRs by providing aid and automation for the different steps involved in this process. Its modules (retrieval of papers’ metadata from open-access databases using a search query, text processing, embedding and clustering, and nearest neighbors search) can easily be updated or replaced depending on the needs of the users and on technical evolutions. As more papers are published every year, LiteRev not only has the potential to simplify and accelerate LRs but can also help the researcher get a quick and in-depth overview of any topic of interest.

Acknowledgments

Ms Mafalda Vieira Burri, librarian at the University of Geneva library, helped define the search queries. We acknowledge the support of the Swiss National Science Foundation (SNF professorship grants 196270 and 202660 to Professor O Keiser), which funded this study. The funder had no role in study design, data collection and analysis, decision to publish, or paper preparation.

Abbreviations

AEHI: acute and early HIV infection
API: application programming interface
DBCV: density-based clustering validation
HDBSCAN: hierarchical density-based spatial clustering of applications with noise
k-NN: k-nearest neighbor
LR: literature review
ML: machine learning
NLP: natural language processing
NPV: negative predictive value
PaCMAP: pairwise controlled manifold approximation
PPV: positive predictive value
TF-IDF: term frequency-inverse document frequency
WSS: work saved over sampling

Multimedia Appendix 1

Data availability.

Authors' Contributions: EO, A Thiabaud, and A Temerev wrote the code in Python. EO and AM obtained and analyzed the results it produced. EO wrote the first draft of the paper. IC conducted the manual literature review of the use case, and IC and EO identified the relevant papers. EO, IC, A Thiabaud, and AM helped write the paper, and EO, AM, OK, AC, and IC reviewed the paper.

Conflicts of Interest: None declared.


Automate your literature review with AI

Shubham Dogra


Traditional methods of literature review can be susceptible to errors, whether it's overcoming human bias or sifting through the incredibly large amount of scientific research being published today. Not to forget all the papers that have already been published in the past 100 years. Putting both together makes a heap of information that is humanly impossible to sift through, at least in an efficient way.

Thanks to artificial intelligence, long and tedious literature reviews are becoming quick and comprehensive. No longer do researchers have to spend endless hours combing through stacks of books and journals.

In this blog post, we'll dive deep into the world of automating your literature review with AI, exploring what a literature review is, why it's so crucial, and how you can harness AI tools to make the process more effective.

What is a literature review?

A literature review is essentially the foundation of a scientific research project, providing a comprehensive overview of existing knowledge on a specific topic. It gives an overview of your chosen topic and summarizes key findings, theories, and methodologies from various sources.

This critical analysis not only showcases the current state of understanding but also identifies gaps and trends in the scientific literature. In addition, it also shows your understanding of your field and can help provide credibility to your research paper .

Types of literature review

There are several types of literature reviews but for the most part, you will come across five versions. These are:

1. Narrative review: A narrative review provides a comprehensive overview of a topic, usually without a strict methodology for selection.

2. Systematic review: Systematic reviews are a strategic synthesis of a topic. This type of review follows a strict plan to identify, evaluate, and critique all relevant research on a topic to minimize bias.

3. Meta-analysis: It is a type of systematic review that uses research data from multiple articles to draw quantitative conclusions about a specific phenomenon.

4. Scoping review: As the name suggests, the purpose of a scoping review is to study a field, highlight the gaps in it, and underline the need for the following research paper.

5. Critical review: A critical literature review assesses and critiques the strengths and weaknesses of existing literature, challenging established ideas and theories.

Benefits of using literature review AI tools

Using literature review AI tools can be a complete game changer in your research. They can make the literature review process smarter and hassle-free. Here are some practical benefits:

Saves time

AI tools for literature review can skim through tons of research papers and find the most relevant ones for your topic in no time, saving you hours of manual searching.

Comprehensive insights

No matter how complex the topic is or how long the research papers are, AI tools can find key insights like methodology, datasets, limitations, etc, by simply scanning the abstracts or PDF documents.

Eliminate bias

AI doesn't have favorites. Based on the data it’s fed, it evaluates research papers objectively and reduces as much bias in your literature review as possible.

Faster research questions

AI tools present loads of research papers in the same place. Some AI tools let you create visual maps and connections, thus helping you identify gaps in existing literature and arriving at your research question faster.

Consistency

AI tools ensure your review is consistently structured and formatted . They can also check for proper grammar and citation style, which is crucial for scholarly writing.

Multilingual support

There are heaps of non-native English-speaking researchers who can struggle with understanding scientific jargon in English. AI tools with multilingual support can help such academicians conduct their literature review in their own language.

How to write a literature review with AI

Now that we understand the benefits of a literature review using artificial intelligence, let's explore how you can automate the process. Literature reviews with AI-powered tools can save you countless hours and allow a more comprehensive and systematic approach. Here's one process you can follow:

Choose the right AI tool

Several AI search engines, such as Google Scholar, SciSpace, and Semantic Scholar, help you find the most relevant papers semantically, that is, even without the exact keywords. These tools understand the context of your search query and deliver the results.

Find relevant research papers

Once you input your research question or keywords into a search engine like Google Scholar, Semantic Scholar, or SciSpace, it scours databases containing millions of papers to find relevant articles. After that, you can narrow your search results by time period, journal, number of citations, and other parameters for more accuracy.

Analyze the search results

Now that you have your list of relevant academic papers, the next step is reviewing these results. Many AI-powered tools for literature review provide summaries along with the paper. Some sophisticated tools also help you gather key points from multiple papers at once and let you ask questions about that topic. This way, you can build an understanding of the topic and, by extension, of your field.

Organize your collection

Whether you’re writing a literature review or your paper, you will need to keep track of your references. Using AI tools, you can efficiently organize your findings, store them in reference managers, and generate citations automatically, saving you the hassle of manually formatting references.

Write the literature review

Now that you’ve done your groundwork, you can start writing your literature review. Although you should be doing this yourself, you can use tools like paraphrasers, grammar checkers, and co-writers to help you refine your academic writing and get your point across with more clarity.

Best AI Tools for Literature Review

Since generative AI and ChatGPT came into the picture, there are heaps of AI tools for literature review available out there. Some of the most comprehensive ones are:

SciSpace is a valuable tool to have in your arsenal. It has a repository of 270M+ papers and makes it easy to find research articles. You can also extract key information to compare and contrast multiple papers at the same time. Then, go on to converse with individual papers using Copilot, your AI research assistant.


Research Rabbit

Research Rabbit is a research discovery tool that helps you find new, connected papers using a visual graph. You can essentially create maps around metadata, which helps you not only explore similar papers but also connections between them.

Iris AI is a specialized tool that understands the context of your research question, lets you apply smart filters, and finds relevant papers. Further, you can also extract summaries and other data from papers.

If you don’t already know about ChatGPT, you must be living under a rock. ChatGPT is a chatbot that creates text based on a prompt using natural language processing (NLP). You can use it to write the first draft of your literature review, refine your writing, format it properly, write a research presentation, and much more.

Things to keep in mind when using literature review AI tools

While AI-powered tools can significantly streamline the literature review process, there are a few things you should keep in mind while employing them:

Quality control

Always review the results generated by AI tools. AI is powerful but not infallible. Ensure that you do further analysis by yourself and determine that the selected research articles are indeed relevant to your research.

Ethical considerations

Be aware of ethical concerns, such as plagiarism and AI-generated writing. The use of AI is still frowned upon in many settings, so make sure you thoroughly check the originality of your work, which is vital for maintaining academic integrity.

Stay updated

The world of AI is ever-evolving. Stay updated on the latest advancements in AI tools for literature review to make the most of your research.

In conclusion

Artificial intelligence is a game-changer for researchers, especially when it comes to literature reviews. It not only saves time but also enhances the quality and comprehensiveness of your work. With the right AI tool and a clear research question in hand, you can build an excellent literature review.



Usage of automation tools in systematic reviews

Affiliations.

  • 1 Department of Epidemiology, Biostatistics, and Bioinformatics, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.
  • 2 Medical Library, Amsterdam Public Health, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.
  • 3 Cochrane Netherlands, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
  • PMID: 30561081
  • DOI: 10.1002/jrsm.1335

Systematic reviews are a cornerstone of today's evidence-informed decision making. With the rapid expansion of questions to be addressed and scientific information produced, there is a growing workload on reviewers, making the current practice unsustainable without the aid of automation tools. While many automation tools have been developed and are available, uptake seems to be lagging. For this reason, we set out to investigate the current level of uptake and what the potential barriers and facilitators are for the adoption of automation tools in systematic reviews. We deployed surveys among systematic reviewers that gathered information on tool uptake, demographics, systematic review characteristics, and barriers and facilitators for uptake. Systematic reviewers from multiple domains were targeted during recruitment; however, responders were predominantly from the biomedical sciences. We found that automation tools are currently not widely used among the participants. When tools are used, participants mostly learn about them from their environment, for example, through colleagues, peers, or organization. Tools are often chosen on the basis of user experience, either by own experience or from colleagues or peers. Lastly, licensing, steep learning curve, lack of support, and mismatch to workflow are often reported by participants as relevant barriers. While conclusions can only be drawn for the biomedical field, our work provides evidence and confirms the conclusions and recommendations of previous work, which was based on expert opinions. Furthermore, our study highlights the importance that organizations and best practices in a field can have for the uptake of automation tools for systematic reviews.

© 2018 John Wiley & Sons, Ltd.




DistillerSR: Literature Review Software

Smarter reviews: trusted evidence.

Securely automate every stage of your literature review to produce evidence-based research faster, more accurately, and more transparently at scale.

Software Built for Every Stage of a Literature Review

DistillerSR automates the management of literature collection, screening, and assessment using AI and intelligent workflows. From a systematic literature review to a rapid review to a living review, DistillerSR makes any project simpler to manage and configure to produce transparent, audit-ready, and compliant results.


Broader, Automated Literature Searches

Search more efficiently with DistillerSR’s integrations with data providers, such as PubMed, automatic review updates, and AI-powered duplicate detection and removal.


PubMed Integration

Automatic Review Updates

Automatically import newly published references, always keeping literature reviews up-to-date with DistillerSR LitConnect .

Duplicate Detection

Detect and remove duplicate citations preventing skew and bias caused by studies included more than once.

Faster, More Effective Reference Screening

Reduce your screening burden by 60% with DistillerSR. Start working on later stages of your review sooner by finding relevant references faster and addressing conflicts more easily.


AI-Powered Screening

Conflict Resolution

Automatically identifies conflicts and disagreements between literature reviewers for easy resolution.

AI Quality Check

Increase the thoroughness of your literature review by having AI double-check your exclusion decisions and validate your categorization of records with the help of DistillerSR AI Classifiers software module.

Cost-Effective Access  to Full-Text Documents

Ensure your literature review is always up-to-date with DistillerSR’s direct connections to full-text data sources, all the while lowering overall subscription costs.



Open Access Integrations

Automatically search for and upload full-text documents from PMC , and link directly to source material through DOI.org .

Copyright Compliant Bulk Search

Retrieve full-text articles for the lowest possible cost through Article Galaxy .

Ad-Hoc Document Retrieval

Leverage existing RightFind and Article Galaxy subscriptions, the open access Unpaywall plugin, and internal libraries to access copyright compliant documents.

Simple Yet Powerful Data-Extraction

Simplify data extraction through templates and configurable forms. Extract data easily with in-form validations and calculations, and easily capture repeating, complex data sets.


Cross-Review, Data Reuse

Prevent duplication of effort across your organization and reduce data extraction times with DistillerSR CuratorCR by easily reusing data across literature reviews.

Capturing Complex Output

Easily capture complex data, such as a variable number of time points across multiple studies in an easy-to-understand and ready-to-analyze way.

Smart Forms

Cut down on literature review data cleaning, data conversions, and effective measure calculations with input validation and built-in form calculations.

Automatic and Configurable Reporting


Customizable Reporting Engine

Build reports and schedule automated email updates to stakeholders. Integrate your data with third-party reporting applications and databases with DistillerSR API .

Auto-Generated Reports

Comprehensive Audit Trail

Automatically keeps track of every entry and decision providing transparency and reproducibility in your literature review.

Easy-to-use Literature Review Project Management

Facilitate project management throughout the literature review process with real-time user and project metric monitoring, reusable configurations, and granular user permissions.



Real-Time User and Project Metrics

Monitor teams and literature review progress in real-time, improving management and quality oversight into projects.

Repeatable, Configurable Processes

Secure Literature Reviews

Single sign-on (SSO) and fully configurable user roles and permissions simplify the literature reviewer experience while also ensuring data integrity and security .

I can’t think of a way to do reviews faster than with DistillerSR. Being able to monitor progress and collaborate with team members, no matter where they are located makes my life a lot easier.

DistillerSR Case Studies

  • Stryker
  • Maple Health Group
  • University of Florida

DistillerSR Frequently Asked Questions

What types of reviews can be done with DistillerSR? Systematic reviews, living reviews, rapid reviews, or clinical evaluation report (CER) literature reviews?

Literature reviews can be a very simple or highly complex process, and literature reviews can use a variety of methods for finding, assessing, and presenting evidence. We describe DistillerSR as a literature review software because it supports all types of reviews , from systematic reviews to rapid reviews, and from living reviews to CER literature reviews.

DistillerSR software is used by over 300 customers in many different industries to support their evidence generation initiatives, from guideline development to HEOR analysis to CERs to post-market surveillance (PMS) and pharmacovigilance.

What are some of DistillerSR’s capabilities that support conducting systematic reviews?

Systematic reviews are the gold standard of literature reviews that aim to identify and screen all evidence relating to a specific research question. DistillerSR facilitates systematic reviews through a configurable, transparent, reproducible process that makes it easy to view the provenance of every cell of data.

DistillerSR was originally designed to support systematic reviews. The software handles dual reviewer screening, conflict resolution, capturing exclusion reasons while you work, risk of bias assessments, duplicate detection, multiple database searches, and reporting templates such as PRISMA . DistillerSR can readily scale for systematic reviews of all sizes, supporting more than 700,000 references per project through a robust enterprise-grade technical architecture . Using software like DistillerSR makes conducting systematic reviews easier to manage and configure to produce transparent evidence-based research faster and more accurately.

How does DistillerSR support clinical evaluation reports (CERs) and performance evaluation reports (PERs) program management?

The new European Union Medical Device Regulation (EU-MDR) and In-Vitro Device Regulation (EU-IVDR) require medical device manufacturers to increase the frequency, traceability, and overall documentation for CERs in the MDR program or PERs in the IVDR counterpart. Literature review software is an ideal tool to help you comply with these regulations.

DistillerSR automates literature reviews to enable a more transparent, repeatable, and auditable process , enabling manufacturers to create and implement a standard framework for literature reviews. This framework for conducting literature reviews can then be incorporated into all CER and PER program management plans consistently across every product, division, and research group.

How can DistillerSR help rapid reviews?

DistillerSR AI is ideal to speed up the rapid review process without compromising on quality. The AI-powered screening enables you to find references faster by continuously reordering relevant references, resulting in accelerated screening. The AI can also double-check your exclusion decisions to ensure relevant references are not left out of the rapid review.

DistillerSR title screening functionality enables you to quickly perform title screening on large numbers of references.

Does DistillerSR support living reviews?

The short answer is yes. DistillerSR has multiple capabilities that automate living systematic reviews , such as automatically importing newly published references into your projects and notifying reviewers that there’s screening to do. You can also put reports on an automated schedule so you’re never caught off guard when important new data is collected.   These capabilities help ensure the latest research is included in your living systematic review and that your review is up-to-date. 

How can DistillerSR help ensure the accuracy of Literature and Systematic reviews?

The quality of systematic reviews is foundational to evidence-based research. However, quality may be compromised because systematic reviews – by their very nature – are often tedious and repetitive, and prone to human error. Tracking all review activity in systematic review software, like DistillerSR, and making it easy to trace the provenance of every cell of data, delivers total transparency and auditability into the systematic review process. DistillerSR enables reviewers to work on the same project simultaneously without the risk of duplicating work or overwriting each other’s results. Configurable workflow filters ensure that the right references are automatically assigned to the right reviewers, and DistillerSR’s cross-project dashboard allows reviewers to monitor to-do lists for all projects from one place.

Why should I add DistillerSR to my Literature and Systematic Review Toolbox and retire my current spreadsheet solution?

It’s estimated that 90% of spreadsheets contain formula errors and approximately 50% have material defects. These errors, coupled with the time and resources necessary to fix them, adversely impact the management of the systematic review process. DistillerSR software was specifically designed to address the challenges faced by systematic review authors, namely the ever-increasing volume of research to screen and extract, review bottlenecks, and regulatory requirements for auditability and transparency, as well as a tool for managing a remote global workforce. Efficiency, consistency, better collaboration, and quality control are just a few of the benefits you’ll get when you choose DistillerSR’s systematic review process over a manual spreadsheet tool for your reviews.

What is the role of AI in your systematic review process?

DistillerSR AI enables the automation of the logistic-heavy tasks involved in conducting a systematic literature review, such as finding references faster using AI to continuously reorder references based on relevance. Continuous AI Reprioritization uses machine learning to learn from the references you are including and excluding and automatically reorder the ones you have left to screen, putting the most pertinent references in front of you first. This means that you find included references much more quickly during the screening process. DistillerSR also uses classifiers , which use NLP to classify and process information in the systematic review.  DistillerSR can also increase the thoroughness of your systematic review by having AI double-check your exclusion decisions.

What about the security and scalability of systematic literature reviews done on DistillerSR?

DistillerSR builds security, scalability, and availability into everything we do, so you can focus on producing evidence-based research faster, more accurately, and more securely with our  systematic review software. We undergo an annual independent third-party audit and certify our products using the American Institute of Certified Public Accountants SOC 2 framework. In terms of scalability, systematic review projects in DistillerSR can easily handle a large number of references; some of our customers have over 700,000 references in their projects.

Do you offer any commitments on the frequency of new product and capability launches?

We pride ourselves on listening to and working with our customers to regularly introduce new capabilities that improve DistillerSR and the systematic review process. We plan to offer two major releases a year, in addition to two minor feature-enhancement releases. We notify customers in advance about upcoming releases, host webinars, develop tools and training to introduce the new capabilities, and provide extensive release notes for our reviewers.

I have a unique literature review protocol. Is your software configurable with my literature review data and process?

Configurability is one of the key foundations of DistillerSR software. In fact, with over 300 customers in many different industries, we have yet to see a literature review protocol that our software couldn’t handle. DistillerSR is a professional B2B SaaS company with an exceptional customer success team that will work with you to understand your unique requirements and systematic review process to get you started quickly. Our global support team is available 24/7 to help you.

Still unsure if DistillerSR will meet your systematic literature review requirements?

Adopting new software is about more than just money. It is also about commitment and trusting that the new platform will match your systematic review and scalability needs. We have resources to help you in your analysis and decision: check out the systematic review software checklist or the literature review software checklist.

Learn More About DistillerSR

Automation of Systematic Literature Reviews: A Systematic Literature Review

Raymon van Dinter and Bedir Tekinerdogan (Wageningen University & Research), Information and Software Technology 136:106589


Rayyan

COLLABORATE ON YOUR REVIEWS WITH ANYONE, ANYWHERE, ANYTIME

Rayyan for students

Save precious time and maximize your productivity with a Rayyan membership. Receive training, priority support, and access features to complete your systematic reviews efficiently.

Rayyan for Librarians

Rayyan Teams+ makes your job easier. It includes VIP Support, AI-powered in-app help, and powerful tools to create, share and organize systematic reviews, review teams, searches, and full-texts.

Rayyan for Researchers

Rayyan makes collaborative systematic reviews faster, easier, and more convenient. Training, VIP support, and access to new features maximize your productivity. Get started now!

Over 1 billion reference articles reviewed by research teams, and counting...

Intelligent, scalable and intuitive.

Rayyan understands language, learns from your decisions and helps you work quickly through even your largest systematic literature reviews.

Solutions for Organizations and Businesses

Rayyan Enterprise and Rayyan Teams+ make it faster, easier and more convenient for you to manage your research process across your organization.

  • Accelerate your research across your team or organization and save valuable researcher time.
  • Build and preserve institutional assets, including literature searches, systematic reviews, and full-text articles.
  • Onboard team members quickly with access to group trainings for beginners and experts.
  • Receive priority support to stay productive when questions arise.

RAYYAN SYSTEMATIC LITERATURE REVIEW OVERVIEW

LEARN ABOUT RAYYAN’S PICO HIGHLIGHTS AND FILTERS

Join now to learn why Rayyan is already trusted by more than 500,000 researchers

Individual plans and team plans.

For early career researchers just getting started with research.

Free forever

  • 3 Active Reviews
  • Invite Unlimited Reviewers
  • Import Directly from Mendeley
  • Industry Leading De-Duplication
  • 5-Star Relevance Ranking
  • Advanced Filtration Facets
  • Mobile App Access
  • 100 Decisions on Mobile App
  • Standard Support
  • Revoke Reviewer
  • Online Training
  • PICO Highlights & Filters
  • PRISMA (Beta)
  • Auto-Resolver 
  • Multiple Teams & Management Roles
  • Monitor & Manage Users, Searches, Reviews, Full Texts
  • Onboarding and Regular Training

Professional

For researchers who want more tools for research acceleration.

per month, billed annually

  • Unlimited Active Reviews
  • Unlimited Decisions on Mobile App
  • Priority Support
  • Auto-Resolver

For currently enrolled students with a valid student ID.

per month, billed quarterly

For a team that wants professional licenses for all members.

per month, per user, billed annually

  • Single Team
  • High Priority Support

For teams that want support and advanced tools for members.

  • Multiple Teams
  • Management Roles

For organizations that want access for all of their members.

Annual Subscription

Contact Sales

  • Organizational Ownership
  • For an organization or a company
  • Access to all the premium features such as PICO Filters, Auto-Resolver, PRISMA and Mobile App
  • Store and Reuse Searches and Full Texts
  • A management console to view, organize and manage users, teams, review projects, searches and full texts
  • Highest tier of support – Support via email, chat and AI-powered in-app help
  • GDPR Compliant
  • Single Sign-On
  • API Integration
  • Training for Experts
  • Training Sessions Students Each Semester
  • More options for secure access control


Great usability and functionality. Rayyan has saved me countless hours. I even received timely feedback from staff when I did not understand the capabilities of the system, and was pleasantly surprised with the time they dedicated to my problem. Thanks again!

This is a great piece of software. It has made the independent viewing process so much quicker. The whole thing is very intuitive.

Rayyan makes ordering articles and extracting data very easy. A great tool for undertaking literature and systematic reviews!

Excellent interface to do title and abstract screening. Also helps to keep track of the reasons for exclusion from the review, and in a blinded manner too.

Rayyan is a fantastic tool to save time and improve systematic reviews!!! It has changed my life as a researcher!!! thanks

Easy to use, friendly, has everything you need for cooperative work on the systematic review.

Rayyan makes life easy in every way when conducting a systematic review and it is easy to use.

Automation in business research: systematic literature review

  • Original Article
  • Published: 01 August 2023
  • Volume 21, pages 675–698 (2023)


  • Samer Elhajjar 1 ,
  • Laurent Yacoub 2 &
  • Hala Yaacoub 3  

1279 Accesses

Automation has profoundly transformed the operational landscape of companies across various industries. As organizations strive to adapt to this rapidly evolving technology, it becomes crucial for practitioners worldwide to identify the most suitable automation tools and solutions for their unique business needs. A systematic literature review serves as a valuable tool to gain a deeper understanding of the historical context of automation and to explore previous findings in this field. This study aims to provide an extensive overview of the literature on the history of automation, spanning the years from 1966 to 2021. In this research, a combination of bibliometric, conceptual, and theoretical network analysis methodologies is employed, with the aid of VOSviewer software, to analyze and visualize the patterns within the existing body of automation literature. By utilizing bibliometric analysis, this study will map the key scholarly contributions and identify the main research themes and concepts. The findings of this systematic literature review will provide insights into the historical progression of automation research and its interdisciplinary nature, highlighting the significant milestones, emerging trends, and knowledge gaps in the field. Building upon these findings, the study will propose a research agenda to advance the scholarly debate on automation.
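
The abstract above mentions keyword co-occurrence and bibliometric mapping with VOSviewer. Purely as an illustration of the kind of structure such tools visualize (and not the authors' actual pipeline), the sketch below counts how often pairs of author keywords appear together across a toy set of publications; those weighted pairs are the edges of a co-occurrence network.

```python
"""Illustrative keyword co-occurrence count (toy data, not the article's pipeline).

Each publication contributes one co-occurrence per pair of its keywords;
the resulting pair counts are the weighted edges a mapping tool would plot.
"""
from collections import Counter
from itertools import combinations

records = [
    ["automation", "robotic process automation", "business process"],
    ["automation", "artificial intelligence", "marketing"],
    ["artificial intelligence", "marketing", "business process"],
]

cooccurrence = Counter()
for keywords in records:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1  # edge weight = number of shared publications

for (a, b), weight in cooccurrence.most_common():
    print(f"{a} -- {b}: {weight}")
```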


Similar content being viewed by others

Trust and Automation - A Systematic Literature Review

Analysis of the scientific knowledge structure on automation in the wine industry: a bibliometric and systematic review

Frontiers of business intelligence and analytics 3.0: a taxonomy-based literature review and research agenda


Author information

Authors and affiliations

National University of Singapore Business School, Singapore, Singapore

Samer Elhajjar

Holy Spirit University of Kaslik, Kaslik, Lebanon

Laurent Yacoub

University of Balamand, Balamand, Lebanon

Hala Yaacoub

Corresponding author

Correspondence to Samer Elhajjar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Elhajjar, S., Yacoub, L. & Yaacoub, H. Automation in business research: systematic literature review. Inf Syst E-Bus Manage 21, 675–698 (2023). https://doi.org/10.1007/s10257-023-00645-z

Download citation

Received: 03 August 2022

Revised: 15 June 2023

Accepted: 28 June 2023

Published: 01 August 2023

Issue Date: September 2023

DOI: https://doi.org/10.1007/s10257-023-00645-z


  • Business research
  • Systematic literature review

IMAGES

  1. Automation tools can support different stages of systematic review
  2. Use AI to Start Your Literature Review in a second || Paper Digest Literature Review Tool Tutorial
  3. Ace your research with these 5 literature review tools
  4. How to use AI tools for literature review
  5. 50 Best AI Tools for Writing Literature Review: Ultimate Guide 2024
  6. Mastering Systematic Literature Reviews with AI Tools

COMMENTS

  1. Tools to support the automation of systematic reviews: a scoping review

    Automatic systematic review tools fall into several categories: visualization tools; tools that use active learning, that is, a combination of a natural language processing (NLP) technique, a machine learning classifier, and human labour (a minimal sketch of such a loop appears after this list); and automated tools that employ NLP and a classifier but rely on already-labelled documents, with no human interaction during the learning process (Scott et al ...

  2. Tools to support the automation of systematic reviews: a scoping review

    Objective: The objectives of this scoping review are to identify the reliability and validity of the available tools, their limitations and any recommendations to further improve the use of these tools. Study design: A scoping review methodology was followed to map the literature published on the challenges and solutions of conducting evidence synthesis using the JBI scoping review methodology.

  3. Automation of systematic literature reviews: A systematic literature review

    In 2011, Thomas et al. [17] published a report that lists the application of text mining techniques to automate the systematic literature review process. In total, we have found 5 studies that reported text mining techniques and tools to automate a part of the systematic review process [12], [17], [18], [19], [20]. Tsafnat et al. [21] describe each step in the systematic review process ...

  4. Data extraction methods for systematic review (semi)automation: Update

    Schmidt et al. published a narrative review of tools with a focus on living systematic review automation. They discuss tools that automate or support the constant literature retrieval that is the hallmark of LSRs, while well-integrated (semi)automation of data extraction and automatic dissemination or visualisation of results between ...

  5. An open source machine learning framework for efficient and ...

    It is a challenging task for any research field to screen the literature and determine what needs to be included in a systematic review in a transparent way. A new open source machine learning ...

  6. Toward systematic review automation: a practical guide to using machine

    Tools for screening are accessible via usable software platforms (Abstrackr, RobotAnalyst, and EPPI reviewer) and could safely be used now as a second screener or to prioritize abstracts for manual review. Data extraction tools are designed to assist the manual process, e.g. drawing the user's attention to relevant text or making suggestions ...

  7. Tools to support the automation of systematic reviews: a scoping review

    This review considered any automation tool that was used in the process of automating systematic reviews. Tools were included if they were either freely available or subscription based. In addition, only tools that were readily available and published to the public, with clear guidelines regarding their use, were included.

  8. Artificial intelligence to automate the systematic review of scientific

    Marshall C, Brereton P (2013) Tools to support systematic literature reviews in software engineering: a mapping study. In: International symposium on empirical software engineering and measurement. p. 296-299. van Dinter R, Tekinerdogan B, Catal C (2021) Automation of systematic literature reviews: a systematic literature review.

  9. Tools to support the automation of systematic reviews: a scoping review

    AI-based automation tools exhibited promising but varying levels of accuracy and efficiency during the screening process of medical literature for conducting SRs in the cancer field. Until further progress is made and thorough evaluations are conducted, AI tools should be utilized as supplementary aids rather than complete substitutes for human ...

  10. Automation of systematic reviews of biomedical literature: a scoping

    Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. ... Automation of systematic reviews of biomedical literature: a scoping review of studies ...

  11. 5 software tools to support your systematic review processes

    DistillerSR is an online software maintained by the Canadian company, Evidence Partners which specialises in literature review automation. DistillerSR provides a collaborative platform for every stage of literature review management. The framework is flexible and can accommodate literature reviews of different sizes.

  12. An Automated Literature Review Tool (LiteRev) for Streamlining and

    An Automated Literature Review Tool (LiteRev) for Streamlining and Accelerating Research Using Natural Language Processing and Machine Learning: Descriptive Performance Evaluation Study. ... The systematic review tool maintains a searchable database of tools that can be used to assist in many aspects of LR ...

  13. Automate your literature review with AI

    Best AI Tools for Literature Review. Since generative AI and ChatGPT came into the picture, there are heaps of AI tools for literature review available out there. Some of the most comprehensive ones are: SciSpace. SciSpace is a valuable tool to have in your arsenal. It has a repository of 270M+ papers and makes it easy to find research articles.

  14. Automating Systematic Literature Review

    Systematic literature reviews (SLRs) have become the foundation of evidence-based software engineering (EBSE). Conducting an SLR is largely a manual process. In the past decade, researchers have made major advances in automating the SLR process, aiming to reduce the workload and effort for conducting high-quality SLRs in software engineering (SE).

  15. Systematic Review and Literature Review Software by DistillerSR

    Get Started. The DistillerSR platform automates the conduct and management of literature reviews so you can deliver better research faster, more accurately and cost-effectively. DistillerSR's highly configurable, AI-enabled workflow streamlines the entire literature review lifecycle, allowing you to make more informed evidence-based health ...

  16. Usage of automation tools in systematic reviews

    We found that automation tools are currently not widely used among the participants. When tools are used, participants mostly learn about them from their environment, for example through colleagues, peers, or their organization. Tools are often chosen on the basis of user experience, either their own or that of colleagues and peers.

  17. Systematic Review Software

    DistillerSR automates the management of literature collection, screening, and assessment using AI and intelligent workflows. From a systematic literature review to a rapid review to a living review, DistillerSR makes any project simpler to manage and configure to produce transparent, audit-ready, and compliant results. Search.

  18. Automation of Systematic Literature Reviews: A Systematic Literature Review

    In a recent systematic literature review on automation in SRs, van Dinter et al. [55] identified 41 relevant studies, primarily in the fields of medicine and software engineering. Their findings ...

  19. Automation of systematic literature reviews: A systematic literature review

    This paper performs a systematic literature review (SLR) on the automation of SLR studies to collect and summarize the current state-of-the-art that is needed to define a framework for further research activities. Table 1 lists the steps in the systematic review process, as proposed by [4]. Synonyms that were used in the literature were noted ...

  20. Rayyan

    Rayyan Enterprise and Rayyan Teams+ make it faster, easier and more convenient for you to manage your research process across your organization. Accelerate your research across your team or organization and save valuable researcher time. Build and preserve institutional assets, including literature searches, systematic reviews, and full-text ...

  21. Healthcare Automation: A Systematic Literature Review

    The literature review aims to condense findings surrounding pivotal aspects of research in this emerging area and to identify the various factors that form the theoretical basis for further exploration of healthcare automation. Analytical tools including Harzing's Publish or Perish, Web of Science, VOSviewer, Vicinitas, and MAXQDA ...

  22. Automation in business research: systematic literature review

    Automation has profoundly transformed the operational landscape of companies across various industries. As organizations strive to adapt to this rapidly evolving technology, it becomes crucial for practitioners worldwide to identify the most suitable automation tools and solutions for their unique business needs. A systematic literature review serves as a valuable tool to gain a deeper ...

  23. Full article: State of the art and future directions of digital twin

    According to our understanding, this is the first literature review work that specifically focuses on DT-enabled smart assembly systems in the existing literature, providing a clear picture of the challenges and future opportunities in this research area. ... and structural integrity in DT-enabled assembly automation. Examples of tools include ...

  24. Process automation using RPA

    This work, through a literature review, aims to clarify the concept of RPA, the benefits of its adoption, the main characteristics that processes must have to be eligible, and the main barriers to successful RPA adoption. In short, this preliminary literature review aims to contribute to the organization's clarification ...
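
Comment 1 above describes screening tools that combine a machine learning classifier with human labelling decisions (active learning). Below is a minimal, hedged sketch of such a loop, not a reproduction of any specific tool: it seeds a classifier with a few decisions, repeatedly asks a (stubbed) human reviewer to label the record predicted most likely to be relevant, and retrains after each answer. It assumes scikit-learn; the ask_human stub, the toy record pool, and the certainty-based query rule are assumptions made for illustration.

```python
"""Hedged sketch of an active-learning screening loop (classifier + human labour).

Not any particular tool's implementation: it only shows the retrain, query,
label cycle described in the comment above, using certainty-based sampling.
"""
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def ask_human(title):
    # Stand-in for the reviewer's include (1) / exclude (0) decision.
    return int("trial" in title)


pool = [
    "randomized trial of drug A",
    "editorial on research funding",
    "trial of drug B in children",
    "survey of clinician attitudes",
    "placebo-controlled trial of drug C",
]
labels = {0: 1, 1: 0}  # seed decisions: record index -> include/exclude

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(pool)

while len(labels) < len(pool):
    known = sorted(labels)
    model = LogisticRegression(max_iter=1000).fit(
        features[known], [labels[i] for i in known])
    remaining = [i for i in range(len(pool)) if i not in labels]
    scores = model.predict_proba(features[remaining])[:, 1]
    # Certainty-based sampling: label the record most likely to be relevant next.
    next_idx = remaining[int(scores.argmax())]
    labels[next_idx] = ask_human(pool[next_idx])

print("screening order:", list(labels))
```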