
Systematic Reviews: Step 7: Extract Data from Included Studies

Created by health science librarians.



About Step 7: Extract Data from Included Studies


In Step 7, you will skim the full text of included articles to collect information about the studies in a table format (extract data), summarizing the studies so they are easier to compare. You will:

  • Make sure you have collected the full text of any included articles.
  • Choose the pieces of information you want to collect from each study.
  • Choose a method for collecting the data.
  • Create the data extraction table.
  • Test the data collection table (optional). 
  • Collect (extract) the data. 
  • Review the data collected for any errors. 

For accuracy, two or more people should extract data from each study. This process can be done by hand or by using a computer program. 

Click an item below to see how it applies to Step 7: Extract Data from Included Studies.

Reporting your review with PRISMA

If you reach the data extraction step and choose to exclude articles for any reason, update the number of included and excluded studies in your PRISMA flow diagram.

Managing your review with Covidence

Covidence allows you to assemble a custom data extraction template, have two reviewers conduct extraction, then send their extractions for consensus.

How a librarian can help with Step 7

A librarian can advise you on data extraction for your systematic review, including: 

  • What the data extraction stage of the review entails
  • Finding examples in the literature of similar reviews and their completed data tables
  • How to choose what data to extract from your included articles 
  • How to create a randomized sample of citations for a pilot test (a short example follows this list)
  • Best practices for reporting your included studies and their important data in your review
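For the pilot-sample item above, a random sample of included citations can be drawn with a few lines of code. This is a minimal sketch, assuming your citations sit in a CSV file; the file names and the sample size of 10 are placeholders, not part of the guide's workflow.

```python
import csv
import random

# Load the included citations; "included_citations.csv" is a placeholder file name.
with open("included_citations.csv", newline="", encoding="utf-8") as f:
    citations = list(csv.DictReader(f))

random.seed(42)  # fix the seed so the pilot sample can be reproduced
pilot = random.sample(citations, k=min(10, len(citations)))

# Write the pilot sample to its own file for the extractors to practise on.
with open("pilot_sample.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=citations[0].keys())
    writer.writeheader()
    writer.writerows(pilot)
```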

In this step of the systematic review, you will develop your evidence tables, which give detailed information for each study (perhaps using a PICO framework as a guide), and summary tables, which give a high-level overview of the findings of your review. You can create evidence and summary tables to describe study characteristics, results, or both. These tables will help you determine which studies, if any, are eligible for quantitative synthesis.

Data extraction requires a lot of planning. We will review some of the tools you can use for data extraction, the types of information you will want to extract, and the options available in the systematic review software used here at UNC, Covidence.

How many people should extract data?

The Cochrane Handbook and other studies strongly suggest having at least two reviewers independently extract data to reduce the number of errors.

  • Chapter 5: Collecting Data (Cochrane Handbook)
  • A Practical Guide to Data Extraction for Intervention Systematic Reviews (Covidence)

Click on a type of data extraction tool below to see some more information about using that type of tool and what UNC has to offer.

Systematic Review Software (Covidence)

Most systematic review software tools have data extraction functionality that can save you time and effort. Here at UNC, we use a systematic review software program called Covidence. You can see a more complete list of options in the Systematic Review Toolbox.

Covidence allows you to create and publish a data extraction template with text fields, single-choice items, section headings, and section subheadings; perform dual or single reviewer data extraction; review extractions for consensus; and export data extraction and quality assessment to a CSV with each item in a column and each study in a row.

  • Covidence@UNC Guide
  • Covidence for Data Extraction (Covidence)
  • A Practical Guide to Data Extraction for Intervention Systematic Reviews (Covidence)

Spreadsheet or Database Software (Excel, Google Sheets)

You can also use spreadsheet or database software to create custom extraction forms. Spreadsheet software (such as Microsoft Excel) has features, such as drop-down menus and range checks, that can speed up the process and help prevent data entry errors. Relational databases (such as Microsoft Access) can help you extract information from different categories like citation details, demographics, participant selection, intervention, outcomes, etc.

  • Microsoft Products (UNC Information Technology Services)
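If you build an Excel extraction form, the drop-down menus and range checks mentioned above can also be added programmatically. The sketch below is a minimal example using the openpyxl library; the field names, allowed values, and numeric limits are illustrative assumptions, not part of the guide.

```python
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws.append(["Study ID", "Study design", "Sample size"])  # example extraction fields

# Drop-down menu: restrict "Study design" (column B) to a fixed list of values.
design_dv = DataValidation(
    type="list",
    formula1='"RCT,Cohort,Case-control,Cross-sectional"',
    allow_blank=True,
)
ws.add_data_validation(design_dv)
design_dv.add("B2:B500")

# Range check: "Sample size" (column C) must be a whole number between 1 and 100000.
size_dv = DataValidation(type="whole", operator="between", formula1="1", formula2="100000")
ws.add_data_validation(size_dv)
size_dv.add("C2:C500")

wb.save("extraction_form.xlsx")  # placeholder file name
```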

Cochrane RevMan

RevMan offers collection forms for descriptive information on population, interventions, and outcomes, and quality assessments, as well as for data for analysis and forest plots. The form elements may not be changed, and data must be entered manually. RevMan is a free software download.

  • Cochrane RevMan 5.0 Download
  • RevMan for Non-Cochrane Reviews (Cochrane Training)

Survey or Form Software (Qualtrics, Poll Everywhere)

Survey or form tools can help you create custom forms with many different question types, such as multiple choice, drop downs, ranking, and more. Content from these tools can often be exported to spreadsheet or database software as well. Here at UNC we have access to the survey/form software Qualtrics & Poll Everywhere.

  • Qualtrics (UNC Information Technology Services)
  • Poll Everywhere (UNC Information Technology Services)

Electronic Documents or Paper & Pencil (Word, Google Docs)

In the past, people often used paper and pencil to record the data they extracted from articles. Handwritten extraction is less popular now due to widespread electronic tools. You can record extracted data in electronic tables or forms created in Microsoft Word or other word processing programs, but this process may take longer than many of our previously listed methods. If chosen, the electronic document or paper-and-pencil extraction methods should only be used for small reviews, as larger sets of articles may become unwieldy. These methods may also be more prone to errors in data entry than some of the more automated methods.

There are benefits and limitations to each method of data extraction.  You will want to consider:

  • The cost of the software / tool
  • Shareability / versioning
  • Existing versus custom data extraction forms
  • The data entry process
  • Interrater reliability

For example, in Covidence you may spend more time building your data extraction form, but save time later in the extraction process as Covidence can automatically highlight discrepancies for review and resolution between different extractors. Excel may require less time investment to create an extraction form, but it may take longer for you to match and compare data between extractors. More in-depth comparison of the benefits and limitations of each extraction tool can be found in the table below.
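If you do extract into spreadsheets, part of that matching can be automated. The sketch below is a minimal example using the pandas library; it assumes each extractor saved a CSV with one row per study, identical columns, and a shared "Study ID" column (the file and column names are placeholders).

```python
import pandas as pd

# One row per study, identical columns in both files; "Study ID" identifies the study.
a = pd.read_csv("extractor_a.csv").set_index("Study ID").sort_index()
b = pd.read_csv("extractor_b.csv").set_index("Study ID").sort_index()

# compare() keeps only the cells where the two extractors disagree
# (labelled "self" for extractor A and "other" for extractor B).
discrepancies = a.compare(b)
print(discrepancies)

# Save the disagreements so the team can resolve them at a consensus meeting.
discrepancies.to_csv("discrepancies_to_resolve.csv")
```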

Benefits and Limitations of Data Extraction Tools

Tools compared:

  • Systematic review software (Covidence), available to UNC affiliates through UNC Libraries' subscription
  • Spreadsheets (Excel, Google Sheets)
  • Cochrane RevMan
  • Survey or form software (Poll Everywhere, Qualtrics, etc.)
  • Electronic documents (Word, Google Docs)

Sample information to include in an extraction table

It may help to consult other similar systematic reviews to identify what data to collect or to think about your question in a framework such as PICO.

Helpful data for an intervention question may include:

  • Information about the article (author(s), year of publication, title, DOI)
  • Information about the study (study type, participant recruitment / selection / allocation, level of evidence, study quality)
  • Patient demographics (age, sex, ethnicity, diseases / conditions, other characteristics related to the intervention / outcome)
  • Intervention (quantity, dosage, route of administration, format, duration, time frame, setting)
  • Outcomes (quantitative and / or qualitative)

If you plan to synthesize data, you will want to collect additional information such as sample sizes, effect sizes, dependent variables, reliability measures, pre-test data, post-test data, follow-up data, and statistical tests used.
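As an illustration of why those extra fields matter, a standardized mean difference can be computed directly from extracted group means, standard deviations, and sample sizes. This is a minimal sketch of the standard Cohen's d / Hedges' g calculation (the numbers are invented), not output from any particular review tool.

```python
import math

def hedges_g(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference (Hedges' g) from extracted summary data."""
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_tx - 1) * sd_tx**2 + (n_ctrl - 1) * sd_ctrl**2) / (n_tx + n_ctrl - 2)
    )
    d = (mean_tx - mean_ctrl) / pooled_sd            # Cohen's d
    correction = 1 - 3 / (4 * (n_tx + n_ctrl) - 9)   # small-sample correction
    return d * correction

# Hypothetical extracted values: intervention group vs. control group
print(round(hedges_g(12.1, 4.0, 30, 14.8, 4.4, 28), 2))  # prints -0.63
```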

Extraction templates and approaches should be determined by the needs of the specific review.   For example, if you are extracting qualitative data, you will want to extract data such as theoretical framework, data collection method, or role of the researcher and their potential bias.

  • Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions (Cochrane Collaboration Qualitative Methods Group)
  • Look for an existing extraction form or tool to help guide you.  Use existing systematic reviews on your topic to identify what information to collect if you are not sure what to do.
  • Train the review team on the extraction categories and what type of data would be expected.  A manual or guide may help your team establish standards.
  • Pilot the extraction / coding form to ensure data extractors are recording similar data (an example agreement check follows this list). Revise the extraction form if needed.
  • Discuss any discrepancies in coding throughout the process.
  • Document any changes to the process or the form.  Keep track of the decisions the team makes and the reasoning behind them.
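One way to check during the pilot that extractors are recording similar data is to compute an agreement statistic, such as Cohen's kappa, on a categorical field. The sketch below is illustrative only (the coded values are made up); many teams simply review discrepancies item by item instead.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two extractors coding the same items into categories."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pilot: two extractors coding study design for ten studies
r1 = ["RCT", "RCT", "Cohort", "RCT", "Case-control", "Cohort", "RCT", "RCT", "Cohort", "RCT"]
r2 = ["RCT", "Cohort", "Cohort", "RCT", "Case-control", "Cohort", "RCT", "RCT", "RCT", "RCT"]
print(round(cohens_kappa(r1, r2), 2))  # prints 0.63
```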

Summarising good practice guidelines for data extraction for systematic reviews and meta-analysis

Kathryn S Taylor (http://orcid.org/0000-0001-6589-5456) 1, Kamal R Mahtani (http://orcid.org/0000-0002-7791-8552) 1, Jeffrey K Aronson (http://orcid.org/0000-0003-1139-655X) 2

1 Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
2 Centre for Evidence Based Medicine, University of Oxford, Oxford, UK

Correspondence to Dr Kathryn S Taylor, Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford OX2 6GG, UK; kathryn.taylor{at}phc.ox.ac.uk

https://doi.org/10.1136/bmjebm-2020-111651


Data extraction is the process of a systematic review that occurs between identifying eligible studies and analysing the data, whether the analysis is a qualitative synthesis or a quantitative synthesis involving the pooling of data in a meta-analysis. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and, for quantitative synthesis, to collect the necessary data to carry out meta-analysis. In systematic reviews, information about the included studies will also be required to conduct risk of bias assessments, but these data are not the focus of this article.

Following good practice when extracting data will help make the process efficient and reduce the risk of errors and bias. Failure to follow good practice risks basing the analysis on poor quality data, and therefore providing poor quality inputs, which will result in poor quality outputs, with unreliable conclusions and invalid study findings. In computer science, this is known as ‘garbage in, garbage out’ or ‘rubbish in, rubbish out’. Furthermore, providing insufficient information about the included studies for readers to be able to assess the generalisability of the findings from a systematic review will undermine the value of the pooled analysis. Such failures will cause your systematic review and meta-analysis to be less useful than it ought to be.

Some guidelines for data extraction are formal, including those described in the Cochrane Handbook for Systematic Reviews of Interventions,1 the Cochrane Handbook for Diagnostic Test Accuracy Reviews,2 3 the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines for systematic reviews and their protocols,4–7 and other sources.8 9 These formal guidelines are complemented by informal advice in the form of examples and videos on how to avoid possible pitfalls and guidance on …

Twitter @dataextips

Contributors KST and KRM conceived the idea of the series of which this is one part. KST wrote the first draft of the manuscript. All authors revised the manuscript and agreed the final version.

Funding This research was supported by the National Institute for Health Research Applied Research Collaboration Oxford and Thames Valley at Oxford Health NHS Foundation Trust.

Disclaimer The views expressed in this publication are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Competing interests KRM and JKA were associate editors of BMJ Evidence Medicine at the time of submission.

Provenance and peer review Commissioned; internally peer reviewed.



Chapter 5: Collecting data

Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Key Points:

  • Systematic reviews have studies, rather than reports, as the unit of interest, and so multiple reports of the same study need to be identified and linked together before or after data extraction.
  • Because of the increasing availability of data sources (e.g. trials registers, regulatory documents, clinical study reports), review authors should decide on which sources may contain the most useful information for the review, and have a plan to resolve discrepancies if information is inconsistent across sources.
  • Review authors are encouraged to develop outlines of tables and figures that will appear in the review to facilitate the design of data collection forms. The key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner.
  • Effort should be made to identify data needed for meta-analyses, which often need to be calculated or converted from data reported in diverse formats.
  • Data should be collected and archived in a form that allows future access and data sharing.

Cite this chapter as: Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

5.1 Introduction

Systematic reviews aim to identify all studies that are relevant to their research questions and to synthesize data about the design, risk of bias, and results of those studies. Consequently, the findings of a systematic review depend critically on decisions relating to which data from these studies are presented and analysed. Data collected for systematic reviews should be accurate, complete, and accessible for future updates of the review and for data sharing. Methods used for these decisions must be transparent; they should be chosen to minimize biases and human error. Here we describe approaches that should be used in systematic reviews for collecting data, including extraction of data directly from journal articles and other reports of studies.

5.2 Sources of data

Studies are reported in a range of sources which are detailed later. As discussed in Section 5.2.1 , it is important to link together multiple reports of the same study. The relative strengths and weaknesses of each type of source are discussed in Section 5.2.2 . For guidance on searching for and selecting reports of studies, refer to Chapter 4 .

Journal articles are the source of the majority of data included in systematic reviews. Note that a study can be reported in multiple journal articles, each focusing on some aspect of the study (e.g. design, main results, and other results).

Conference abstracts are commonly available. However, the information presented in conference abstracts is highly variable in reliability, accuracy, and level of detail (Li et al 2017).

Errata and letters can be important sources of information about studies, including critical weaknesses and retractions, and review authors should examine these if they are identified (see MECIR Box 5.2.a ).

Trials registers (e.g. ClinicalTrials.gov) catalogue trials that have been planned or started, and have become an important data source for identifying trials, for comparing published outcomes and results with those planned, and for obtaining efficacy and safety data that are not available elsewhere (Ross et al 2009, Jones et al 2015, Baudard et al 2017).

Clinical study reports (CSRs) contain unabridged and comprehensive descriptions of the clinical problem, design, conduct and results of clinical trials, following a structure and content guidance prescribed by the International Conference on Harmonisation (ICH 1995). To obtain marketing approval of drugs and biologics for a specific indication, pharmaceutical companies submit CSRs and other required materials to regulatory authorities. Because CSRs also incorporate tables and figures, with appendices containing the protocol, statistical analysis plan, sample case report forms, and patient data listings (including narratives of all serious adverse events), they can be thousands of pages in length. CSRs often contain more data about trial methods and results than any other single data source (Mayo-Wilson et al 2018). CSRs are often difficult to access, and are usually not publicly available. Review authors could request CSRs from the European Medicines Agency (Davis and Miller 2017). The US Food and Drug Administration had historically avoided releasing CSRs but launched a pilot programme in 2018 whereby selected portions of CSRs for new drug applications were posted on the agency’s website. Many CSRs are obtained through unsealed litigation documents, repositories (e.g. clinicalstudydatarequest.com), and other open data and data-sharing channels (e.g. The Yale University Open Data Access Project) (Doshi et al 2013, Wieland et al 2014, Mayo-Wilson et al 2018).

Regulatory reviews such as those available from the US Food and Drug Administration or European Medicines Agency provide useful information about trials of drugs, biologics, and medical devices submitted by manufacturers for marketing approval (Turner 2013). These documents are summaries of CSRs and related documents, prepared by agency staff as part of the process of approving the products for marketing, after reanalysing the original trial data. Regulatory reviews often are available only for the first approved use of an intervention and not for later applications (although review authors may request those documents, which are usually brief). Using regulatory reviews from the US Food and Drug Administration as an example, drug approval packages are available on the agency’s website for drugs approved since 1997 (Turner 2013); for drugs approved before 1997, information must be requested through a freedom of information request. The drug approval packages contain various documents: approval letter(s), medical review(s), chemistry review(s), clinical pharmacology review(s), and statistical reviews(s).

Individual participant data (IPD) are usually sought directly from the researchers responsible for the study, or may be identified from open data repositories (e.g. www.clinicalstudydatarequest.com ). These data typically include variables that represent the characteristics of each participant, intervention (or exposure) group, prognostic factors, and measurements of outcomes (Stewart et al 2015). Access to IPD has the advantage of allowing review authors to reanalyse the data flexibly, in accordance with the preferred analysis methods outlined in the protocol, and can reduce the variation in analysis methods across studies included in the review. IPD reviews are addressed in detail in Chapter 26 .

MECIR Box 5.2.a Relevant expectations for conduct of intervention reviews

Examining errata

Some studies may have been found to be fraudulent or may for other reasons have been retracted since publication. Errata can reveal important limitations, or even fatal flaws, in included studies. All of these may potentially lead to the exclusion of a study from a review or meta-analysis. Care should be taken to ensure that this information is retrieved in all database searches by downloading the appropriate fields together with the citation data.

5.2.1 Studies (not reports) as the unit of interest

In a systematic review, studies rather than reports of studies are the principal unit of interest. Since a study may have been reported in several sources, a comprehensive search for studies for the review may identify many reports from a potentially relevant study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2018). Conversely, a report may describe more than one study.

Multiple reports of the same study should be linked together (see MECIR Box 5.2.b ). Some authors prefer to link reports before they collect data, and collect data from across the reports onto a single form. Other authors prefer to collect data from each report and then link together the collected data across reports. Either strategy may be appropriate, depending on the nature of the reports at hand. It may not be clear that two reports relate to the same study until data collection has commenced. Although sometimes there is a single report for each study, it should never be assumed that this is the case.

MECIR Box 5.2.b Relevant expectations for conduct of intervention reviews

Collating multiple reports

It is wrong to consider multiple reports of the same study as if they are multiple studies. Secondary reports of a study should not be discarded, however, since they may contain valuable information about the design and conduct. Review authors must choose and justify which report to use as a source for study results.

It can be difficult to link multiple reports from the same study, and review authors may need to do some ‘detective work’. Multiple sources about the same trial may not reference each other, may not share common authors (Gøtzsche 1989, Tramèr et al 1997), and may report discrepant information about the study design, characteristics, outcomes, and results (von Elm et al 2004, Mayo-Wilson et al 2017a).

Some of the most useful criteria for linking reports are:

  • trial registration numbers;
  • authors’ names;
  • sponsor for the study and sponsor identifiers (e.g. grant or contract numbers);
  • location and setting (particularly if institutions, such as hospitals, are named);
  • specific details of the interventions (e.g. dose, frequency);
  • numbers of participants and baseline data; and
  • date and duration of the study (which also can clarify whether different sample sizes are due to different periods of recruitment), length of follow-up, or subgroups selected to address secondary goals.

Review authors should use as many trial characteristics as possible to link multiple reports. When uncertainties remain after considering these and other factors, it may be necessary to correspond with the study authors or sponsors for confirmation.
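Where registration numbers are reported, some of this linking can be done mechanically before the manual detective work. The sketch below is a minimal example using pandas; the CSV file and the "registration_id" and "report_id" columns are assumptions about how a screening export might be organized, not part of the Handbook's guidance.

```python
import pandas as pd

# One row per report; "registration_id" holds a trial registration number (e.g. an NCT ID).
reports = pd.read_csv("screened_reports.csv")

# Group report identifiers by registration number.
linked = (
    reports.dropna(subset=["registration_id"])
    .groupby("registration_id")["report_id"]
    .apply(list)
)

# Registration numbers attached to more than one report are candidate multi-report studies;
# reports with no registration number still need to be linked by hand.
multi_report_studies = linked[linked.apply(len) > 1]
print(multi_report_studies)
```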

5.2.2 Determining which sources might be most useful

A comprehensive search to identify all eligible studies from all possible sources is resource-intensive but necessary for a high-quality systematic review (see Chapter 4 ). Because some data sources are more useful than others (Mayo-Wilson et al 2018), review authors should consider which data sources may be available and which may contain the most useful information for the review. These considerations should be described in the protocol. Table 5.2.a summarizes the strengths and limitations of different data sources (Mayo-Wilson et al 2018). Gaining access to CSRs and IPD often takes a long time. Review authors should begin searching repositories and contact trial investigators and sponsors as early as possible to negotiate data usage agreements (Mayo-Wilson et al 2015, Mayo-Wilson et al 2018).

Table 5.2.a Strengths and limitations of different data sources for systematic reviews

Journal articles

  • Strengths: found easily; data extracted quickly; include useful information about methods and results.
  • Limitations: available for some, but not all, studies (with a risk of reporting biases); contain limited study characteristics and methods; can omit outcomes, especially harms.

Conference abstracts

  • Strengths: identify unpublished studies.
  • Limitations: include little information about study design; include limited and unclear information for meta-analysis; may result in double-counting studies in meta-analysis if not correctly linked to other reports of the same study.

Trials registers

  • Strengths: identify otherwise unpublished trials; may contain information about design, risk of bias, and results not included in other public sources; link multiple sources about the same trial using a unique registration number.
  • Limitations: limited to more recent studies that comply with registration requirements; often contain limited information about trial design and quantitative results; may report only harms (adverse events) occurring above a threshold (e.g. 5%); may be inaccurate or incomplete for trials whose methods changed during the conduct of the study, or whose results were not kept up to date.

Regulatory reviews

  • Strengths: identify studies not reported in other public sources; describe details of methods and results not found in other sources.
  • Limitations: available only for studies submitted to regulators; available for approved indications, but not ‘off-label’ uses; not always in a standard format; not often available for old products.

Clinical study reports

  • Strengths: contain detailed information about study characteristics, methods, and results; can be particularly useful for identifying detailed information about harms; describe aggregate results, which are easy to analyse and sufficient for most reviews.
  • Limitations: do not exist, or are difficult to obtain, for most studies; require more time to obtain and analyse than public sources.

Individual participant data

  • Strengths: allow review authors to use contemporary statistical methods and to standardize analyses across studies; permit additional analyses that the review authors desire (e.g. subgroup analyses).
  • Limitations: require considerable expertise and time to obtain and analyse; may lead to the same results that can be found in the aggregate report; may not be necessary if one has a CSR.

5.2.3 Correspondence with investigators

Review authors often find that they are unable to obtain all the information they seek from available reports about the details of the study design, the full range of outcomes measured and the numerical results. In such circumstances, authors are strongly encouraged to contact the original investigators (see MECIR Box 5.2.c ). Contact details of study authors, when not available from the study reports, often can be obtained from more recent publications, from university or institutional staff listings, from membership directories of professional societies, or by a general search of the web. If the contact author named in the study report cannot be contacted or does not respond, it is worthwhile attempting to contact other authors.

Review authors should consider the nature of the information they require and make their request accordingly. For descriptive information about the conduct of the trial, it may be most appropriate to ask open-ended questions (e.g. how was the allocation process conducted, or how were missing data handled?). If specific numerical data are required, it may be more helpful to request them specifically, possibly providing a short data collection form (either uncompleted or partially completed). If IPD are required, they should be specifically requested (see also Chapter 26 ). In some cases, study investigators may find it more convenient to provide IPD rather than conduct additional analyses to obtain the specific statistics requested.

MECIR Box 5.2.c Relevant expectations for conduct of intervention reviews

Obtaining unpublished data

Contacting study authors to obtain or confirm data makes the review more complete, potentially enhances precision and reduces the impact of reporting biases. Missing information includes details to inform risk of bias assessments, details of interventions and outcomes, and study results (including breakdowns of results by important subgroups).

5.3 What data to collect

5.3.1 What are data?

For the purposes of this chapter, we define ‘data’ to be any information about (or derived from) a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators. Review authors should plan in advance what data will be required for their systematic review, and develop a strategy for obtaining them (see MECIR Box 5.3.a ). The involvement of consumers and other stakeholders can be helpful in ensuring that the categories of data collected are sufficiently aligned with the needs of review users ( Chapter 1, Section 1.3 ). The data to be sought should be described in the protocol, with consideration wherever possible of the issues raised in the rest of this chapter.

The data collected for a review should adequately describe the included studies, support the construction of tables and figures, facilitate the risk of bias assessment, and enable syntheses and meta-analyses. Review authors should familiarize themselves with reporting guidelines for systematic reviews (see online Chapter III and the PRISMA statement (Liberati et al 2009)) to ensure that relevant elements and sections are incorporated. The following sections review the types of information that should be sought, and these are summarized in Table 5.3.a (Li et al 2015).

MECIR Box 5.3.a Relevant expectations for conduct of intervention reviews

Describing studies

Basic characteristics of each study will need to be presented as part of the review, including details of participants, interventions and comparators, outcomes and study design.

Table 5.3.a Checklist of items to consider in data collection

Information about the data extraction

  • Name of data extractors, date of data extraction, and identification features of each report from which data are being extracted

Eligibility

  • Confirm eligibility of the study for the review
  • Reason for exclusion

Study methods

  • Study design
  • Recruitment and sampling procedures used (including at the level of individual participants and clusters/sites if relevant)
  • Enrolment start and end dates; length of participant follow-up
  • Details of random sequence generation, allocation sequence concealment, and masking for randomized trials, and methods used to prevent and control for confounding, selection biases, and information biases for non-randomized studies*
  • Methods used to prevent and address missing data*
  • Statistical analysis: unit of analysis (e.g. individual participant, clinic, village, body part); statistical methods used if computed effect estimates are extracted from reports, including any covariates included in the statistical model
  • Likelihood of reporting and other biases*
  • Source(s) of funding or other material support for the study
  • Authors’ financial relationships and other potential conflicts of interest

Participants and setting

  • Setting
  • Region(s) and country/countries from which study participants were recruited
  • Study eligibility criteria, including diagnostic criteria
  • Characteristics of participants at the beginning (or baseline) of the study (e.g. age, sex, comorbidity, socio-economic status)

Interventions

  • Description of the intervention(s) and comparison intervention(s), ideally with sufficient detail for replication

Outcomes

  • For each pre-specified outcome domain (e.g. anxiety) in the systematic review, the outcome elements described in Section 5.3.5

Results

  • For each group, and for each outcome at each time point: number of participants randomly assigned and included in the analysis; and number of participants who withdrew, were lost to follow-up or were excluded (with reasons for each)
  • Summary data for each group (e.g. 2×2 table for dichotomous data; means and standard deviations for continuous data)
  • Between-group estimates that quantify the effect of the intervention on the outcome, and their precision (e.g. risk ratio, odds ratio, mean difference)
  • If subgroup analysis is planned, the same information would need to be extracted for each participant subgroup

Other information

  • Key conclusions of the study authors
  • Reference to other relevant studies
  • Correspondence required
  • Miscellaneous comments from the study authors or by the review authors

*Full description required for assessments of risk of bias (see Chapter 8 , Chapter 23 and Chapter 25 ).
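To make the ‘Summary data’ and ‘Between-group estimates’ items in the checklist concrete, the sketch below computes a risk ratio and its 95% confidence interval from an extracted 2×2 table, using the standard log risk ratio formula. The counts are invented; this only illustrates the arithmetic and is not a substitute for meta-analysis software.

```python
import math

def risk_ratio(events_tx, n_tx, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio and 95% CI from a 2x2 table (events and totals in each group)."""
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    # Standard error of log(RR)
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical extracted counts: 12/100 events in the intervention group, 24/98 in control
rr, lo, hi = risk_ratio(12, 100, 24, 98)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```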

5.3.2 Study methods and potential sources of bias

Different research methods can influence study outcomes by introducing different biases into results. Important study design characteristics should be collected to allow the selection of appropriate methods for assessment and analysis, and to enable description of the design of each included study in a table of ‘Characteristics of included studies’, including whether the study is randomized, whether the study has a cluster or crossover design, and the duration of the study. If the review includes non-randomized studies, appropriate features of the studies should be described (see Chapter 24 ).

Detailed information should be collected to facilitate assessment of the risk of bias in each included study. Risk-of-bias assessment should be conducted using the tool most appropriate for the design of each study, and the information required to complete the assessment will depend on the tool. Randomized studies should be assessed using the tool described in Chapter 8 . The tool covers bias arising from the randomization process, due to deviations from intended interventions, due to missing outcome data, in measurement of the outcome, and in selection of the reported result. For each item in the tool, a description of what happened in the study is required, which may include verbatim quotes from study reports. Information for assessment of bias due to missing outcome data and selection of the reported result may be most conveniently collected alongside information on outcomes and results. Chapter 7 (Section 7.3.1) discusses some issues in the collection of information for assessments of risk of bias. For non-randomized studies, the most appropriate tool is described in Chapter 25 . A separate tool also covers bias due to missing results in meta-analysis (see Chapter 13 ).

A particularly important piece of information is the funding source of the study and potential conflicts of interest of the study authors.

Some review authors will wish to collect additional information on study characteristics that bear on the quality of the study’s conduct but that may not lead directly to risk of bias, such as whether ethical approval was obtained and whether a sample size calculation was performed a priori.

5.3.3 Participants and setting

Details of participants are collected to enable an understanding of the comparability of, and differences between, the participants within and between included studies, and to allow assessment of how directly or completely the participants in the included studies reflect the original review question.

Typically, aspects that should be collected are those that could (or are believed to) affect presence or magnitude of an intervention effect and those that could help review users assess applicability to populations beyond the review. For example, if the review authors suspect important differences in intervention effect between different socio-economic groups, this information should be collected. If intervention effects are thought constant over such groups, and if such information would not be useful to help apply results, it should not be collected. Participant characteristics that are often useful for assessing applicability include age and sex. Summary information about these should always be collected unless they are not obvious from the context. These characteristics are likely to be presented in different formats (e.g. ages as means or medians, with standard deviations or ranges; sex as percentages or counts for the whole study or for each intervention group separately). Review authors should seek consistent quantities where possible, and decide whether it is more relevant to summarize characteristics for the study as a whole or by intervention group. It may not be possible to select the most consistent statistics until data collection is complete across all or most included studies. Other characteristics that are sometimes important include ethnicity, socio-demographic details (e.g. education level) and the presence of comorbid conditions. Clinical characteristics relevant to the review question (e.g. glucose level for reviews on diabetes) also are important for understanding the severity or stage of the disease.

Diagnostic criteria that were used to define the condition of interest can be a particularly important source of diversity across studies and should be collected. For example, in a review of drug therapy for congestive heart failure, it is important to know how the definition and severity of heart failure was determined in each study (e.g. systolic or diastolic dysfunction, severe systolic dysfunction with ejection fractions below 20%). Similarly, in a review of antihypertensive therapy, it is important to describe baseline levels of blood pressure of participants.

If the settings of studies may influence intervention effects or applicability, then information on these should be collected. Typical settings of healthcare intervention studies include acute care hospitals, emergency facilities, general practice, and extended care facilities such as nursing homes, offices, schools, and communities. Sometimes studies are conducted in different geographical regions with important differences that could affect delivery of an intervention and its outcomes, such as cultural characteristics, economic context, or rural versus city settings. Timing of the study may be associated with important technology differences or trends over time. If such information is important for the interpretation of the review, it should be collected.

Important characteristics of the participants in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’.

5.3.4 Interventions

Details of all experimental and comparator interventions of relevance to the review should be collected. Again, details are required for aspects that could affect the presence or magnitude of an effect or that could help review users assess applicability to their own circumstances. Where feasible, information should be sought (and presented in the review) that is sufficient for replication of the interventions under study. This includes any co-interventions administered as part of the study, and applies similarly to comparators such as ‘usual care’. Review authors may need to request missing information from study authors.

The Template for Intervention Description and Replication (TIDieR) provides a comprehensive framework for full description of interventions and has been proposed for use in systematic reviews as well as reports of primary studies (Hoffmann et al 2014). The checklist includes descriptions of:

  • the rationale for the intervention and how it is expected to work;
  • any documentation that instructs the recipient on the intervention;
  • what the providers do to deliver the intervention (procedures and processes);
  • who provides the intervention (including their skill level), how (e.g. face to face, web-based) and in what setting (e.g. home, school, or hospital);
  • the timing and intensity;
  • whether any variation is permitted or expected, and whether modifications were actually made; and
  • any strategies used to ensure or assess fidelity or adherence to the intervention, and the extent to which the intervention was delivered as planned.

For clinical trials of pharmacological interventions, key information to collect will often include routes of delivery (e.g. oral or intravenous delivery), doses (e.g. amount or intensity of each treatment, frequency of delivery), timing (e.g. within 24 hours of diagnosis), and length of treatment. For other interventions, such as those that evaluate psychotherapy, behavioural and educational approaches, or healthcare delivery strategies, the amount of information required to characterize the intervention will typically be greater, including information about multiple elements of the intervention, who delivered it, and the format and timing of delivery. Chapter 17 provides further information on how to manage intervention complexity, and how the intervention Complexity Assessment Tool (iCAT) can facilitate data collection (Lewin et al 2017).

Important characteristics of the interventions in each included study should be summarized for the reader in the table of ‘Characteristics of included studies’. Additional tables or diagrams such as logic models ( Chapter 2, Section 2.5.1 ) can assist descriptions of multi-component interventions so that review users can better assess review applicability to their context.

5.3.4.1 Integrity of interventions

The degree to which specified procedures or components of the intervention are implemented as planned can have important consequences for the findings from a study. We describe this as intervention integrity ; related terms include adherence, compliance and fidelity (Carroll et al 2007). The verification of intervention integrity may be particularly important in reviews of non-pharmacological trials such as behavioural interventions and complex interventions, which are often implemented in conditions that present numerous obstacles to idealized delivery.

It is generally expected that reports of randomized trials provide detailed accounts of intervention implementation (Zwarenstein et al 2008, Moher et al 2010). In assessing whether interventions were implemented as planned, review authors should bear in mind that some interventions are standardized (with no deviations permitted in the intervention protocol), whereas others explicitly allow a degree of tailoring (Zwarenstein et al 2008). In addition, the growing field of implementation science has led to an increased awareness of the impact of setting and context on delivery of interventions (Damschroder et al 2009). (See Chapter 17, Section 17.1.2.1 for further information and discussion about how an intervention may be tailored to local conditions in order to preserve its integrity.)

Information about integrity can help determine whether unpromising results are due to a poorly conceptualized intervention or to an incomplete delivery of the prescribed components. It can also reveal important information about the feasibility of implementing a given intervention in real life settings. If it is difficult to achieve full implementation in practice, the intervention will have low feasibility (Dusenbury et al 2003).

Whether a lack of intervention integrity leads to a risk of bias in the estimate of its effect depends on whether review authors and users are interested in the effect of assignment to intervention or the effect of adhering to intervention, as discussed in more detail in Chapter 8, Section 8.2.2 . Assessment of deviations from intended interventions is important for assessing risk of bias in the latter, but not the former (see Chapter 8, Section 8.4 ), but both may be of interest to decision makers in different ways.

An example of a Cochrane Review evaluating intervention integrity is provided by a review of smoking cessation in pregnancy (Chamberlain et al 2017). The authors found that process evaluation of the intervention occurred in only some trials and that the implementation was less than ideal in others, including some of the largest trials. The review highlighted how the transfer of an intervention from one setting to another may reduce its effectiveness when elements are changed, or aspects of the materials are culturally inappropriate.

5.3.4.2 Process evaluations

Process evaluations seek to evaluate the process (and mechanisms) between the intervention’s intended implementation and the actual effect on the outcome (Moore et al 2015). Process evaluation studies are characterized by a flexible approach to data collection and the use of numerous methods to generate a range of different types of data, encompassing both quantitative and qualitative methods. Guidance for including process evaluations in systematic reviews is provided in Chapter 21 . When it is considered important, review authors should aim to collect information on whether the trial accounted for, or measured, key process factors and whether the trials that thoroughly addressed integrity showed a greater impact. Process evaluations can be a useful source of factors that potentially influence the effectiveness of an intervention.

5.3.5 Outcomes

An outcome is an event or a measurement value observed or recorded for a particular person or intervention unit in a study during or following an intervention, and that is used to assess the efficacy and safety of the studied intervention (Meinert 2012). Review authors should indicate in advance whether they plan to collect information about all outcomes measured in a study or only those outcomes of (pre-specified) interest in the review. Research has shown that trials addressing the same condition and intervention seldom agree on which outcomes are the most important, and consequently report on numerous different outcomes (Dwan et al 2014, Ismail et al 2014, Denniston et al 2015, Saldanha et al 2017a). The selection of outcomes across systematic reviews of the same condition is also inconsistent (Page et al 2014, Saldanha et al 2014, Saldanha et al 2016, Liu et al 2017). Outcomes used in trials and in systematic reviews of the same condition have limited overlap (Saldanha et al 2017a, Saldanha et al 2017b).

We recommend that only the outcomes defined in the protocol be described in detail. However, a complete list of the names of all outcomes measured may allow a more detailed assessment of the risk of bias due to missing outcome data (see Chapter 13 ).

Review authors should collect all five elements of an outcome (Zarin et al 2011, Saldanha et al 2014):

1. outcome domain or title (e.g. anxiety);

2. measurement tool or instrument (including definition of clinical outcomes or endpoints); for a scale, name of the scale (e.g. the Hamilton Anxiety Rating Scale), upper and lower limits, and whether a high or low score is favourable, definitions of any thresholds if appropriate;

3. specific metric used to characterize each participant’s results (e.g. post-intervention anxiety, or change in anxiety from baseline to a post-intervention time point, or post-intervention presence of anxiety (yes/no));

4. method of aggregation (e.g. mean and standard deviation of anxiety scores in each group, or proportion of people with anxiety);

5. timing of outcome measurements (e.g. assessments at end of eight-week intervention period, events occurring during eight-week intervention period).
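When designing an extraction form, it can help to store these five elements as separate fields rather than as one free-text description. Below is a minimal sketch using a Python dataclass; the class and field names are illustrative assumptions, and the values simply echo the anxiety example above.

```python
from dataclasses import dataclass

@dataclass
class OutcomeSpec:
    domain: str       # 1. outcome domain or title
    instrument: str   # 2. measurement tool or instrument
    metric: str       # 3. metric characterizing each participant's result
    aggregation: str  # 4. method of aggregation across participants
    timing: str       # 5. timing of the outcome measurement

anxiety = OutcomeSpec(
    domain="Anxiety",
    instrument="Hamilton Anxiety Rating Scale (lower scores are favourable)",
    metric="Change in anxiety from baseline to a post-intervention time point",
    aggregation="Mean and standard deviation of anxiety scores in each group",
    timing="Assessment at the end of the eight-week intervention period",
)
print(anxiety)
```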

Further considerations for economics outcomes are discussed in Chapter 20 , and for patient-reported outcomes in Chapter 18 .

5.3.5.1 Adverse effects

Collection of information about the harmful effects of an intervention can pose particular difficulties, discussed in detail in Chapter 19 . These outcomes may be described using multiple terms, including ‘adverse event’, ‘adverse effect’, ‘adverse drug reaction’, ‘side effect’ and ‘complication’. Many of these terminologies are used interchangeably in the literature, although some are technically different. Harms might additionally be interpreted to include undesirable changes in other outcomes measured during a study, such as a decrease in quality of life where an improvement may have been anticipated.

In clinical trials, adverse events can be collected either systematically or non-systematically. Systematic collection refers to collecting adverse events in the same manner for each participant using defined methods such as a questionnaire or a laboratory test. For systematically collected outcomes representing harm, data can be collected by review authors in the same way as efficacy outcomes (see Section 5.3.5 ).

Non-systematic collection refers to collection of information on adverse events using methods such as open-ended questions (e.g. ‘Have you noticed any symptoms since your last visit?’), or reported by participants spontaneously. In either case, adverse events may be selectively reported based on their severity, and whether the participant suspected that the effect may have been caused by the intervention, which could lead to bias in the available data. Unfortunately, most adverse events are collected non-systematically rather than systematically, creating a challenge for review authors. The following pieces of information are useful and worth collecting (Nicole Fusco, personal communication):

  • any coding system or standard medical terminology used (e.g. COSTART, MedDRA), including version number;
  • name of the adverse events (e.g. dizziness);
  • reported intensity of the adverse event (e.g. mild, moderate, severe);
  • whether the trial investigators categorized the adverse event as ‘serious’;
  • whether the trial investigators identified the adverse event as being related to the intervention;
  • time point (most commonly measured as a count over the duration of the study);
  • any reported methods for how adverse events were selected for inclusion in the publication (e.g. ‘We reported all adverse events that occurred in at least 5% of participants’); and
  • associated results.

Different collection methods lead to very different accounting of adverse events (Safer 2002, Bent et al 2006, Ioannidis et al 2006, Carvajal et al 2011, Allen et al 2013). Non-systematic collection methods tend to underestimate how frequently an adverse event occurs. It is particularly problematic when the adverse event of interest to the review is collected systematically in some studies but non-systematically in other studies. Different collection methods introduce an important source of heterogeneity. In addition, when non-systematic adverse events are reported based on quantitative selection criteria (e.g. only adverse events that occurred in at least 5% of participants were included in the publication), use of reported data alone may bias the results of meta-analyses. Review authors should be cautious of (or refrain from) synthesizing adverse events that are collected differently.

Regardless of the collection methods, precise definitions of adverse effect outcomes and their intensity should be recorded, since they may vary between studies. For example, in a review of aspirin and gastrointestinal haemorrhage, some trials simply reported gastrointestinal bleeds, while others reported specific categories of bleeding, such as haematemesis, melaena, and proctorrhagia (Derry and Loke 2000). The definition and reporting of severity of the haemorrhages (e.g. major, severe, requiring hospital admission) also varied considerably among the trials (Zanchetti and Hansson 1999). Moreover, a particular adverse effect may be described or measured in different ways among the studies. For example, the terms ‘tiredness’, ‘fatigue’ or ‘lethargy’ may all be used in reporting of adverse effects. Study authors also may use different thresholds for ‘abnormal’ results (e.g. hypokalaemia diagnosed at a serum potassium concentration of 3.0 mmol/L or 3.5 mmol/L).

No mention of adverse events in trial reports does not necessarily mean that no adverse events occurred. It is usually safest to assume that they were not reported. Quality of life measures are sometimes used as a measure of the participants’ experience during the study, but these are usually general measures that do not look specifically at particular adverse effects of the intervention. While quality of life measures are important and can be used to gauge overall participant well-being, they should not be regarded as substitutes for a detailed evaluation of safety and tolerability.

5.3.6 Results

Results data arise from the measurement or ascertainment of outcomes for individual participants in an intervention study. Results data may be available for each individual in a study (i.e. individual participant data; see Chapter 26 ), or summarized at arm level, or summarized at study level into an intervention effect by comparing two intervention arms. Results data should be collected only for the intervention groups and outcomes specified to be of interest in the protocol (see MECIR Box 5.3.b ). Results for other outcomes should not be collected unless the protocol is modified to add them. Any modification should be reported in the review. However, review authors should be alert to the possibility of important, unexpected findings, particularly serious adverse effects.

MECIR Box 5.3.b Relevant expectations for conduct of intervention reviews

Choosing intervention groups in multi-arm studies

There is no point including irrelevant interventions in the review. Authors should, however, make it clear in the table of ‘Characteristics of included studies’ that these interventions were present in the study.

Reports of studies often include several results for the same outcome. For example, different measurement scales might be used, results may be presented separately for different subgroups, and outcomes may have been measured at different follow-up time points. Variation in the results can be very large, depending on which data are selected (Gøtzsche et al 2007, Mayo-Wilson et al 2017a). Review protocols should be as specific as possible about which outcome domains, measurement tools, time points, and summary statistics (e.g. final values versus change from baseline) are to be collected (Mayo-Wilson et al 2017b). A framework should be pre-specified in the protocol to facilitate making choices between multiple eligible measures or results. For example, a hierarchy of preferred measures might be created, or plans articulated to select the result with the median effect size, or to average across all eligible results for a particular outcome domain (see also Chapter 9, Section 9.3.3 ). Any additional decisions or changes to this framework made once the data are collected should be reported in the review as changes to the protocol.

Section 5.6 describes the numbers that will be required to perform meta-analysis, if appropriate. The unit of analysis (e.g. participant, cluster, body part, treatment period) should be recorded for each result when it is not obvious (see Chapter 6, Section 6.2 ). The type of outcome data determines the nature of the numbers that will be sought for each outcome. For example, for a dichotomous (‘yes’ or ‘no’) outcome, the number of participants and the number who experienced the outcome will be sought for each group. It is important to collect the sample size relevant to each result, although this is not always obvious. A flow diagram as recommended in the CONSORT Statement (Moher et al 2001) can help to determine the flow of participants through a study. If one is not available in a published report, review authors can consider drawing one (available from www.consort-statement.org ).

The numbers required for meta-analysis are not always available. Often, other statistics can be collected and converted into the required format. For example, for a continuous outcome, it is usually most convenient to seek the number of participants, the mean and the standard deviation for each intervention group. These are often not available directly, especially the standard deviation. Alternative statistics enable calculation or estimation of the missing standard deviation (such as a standard error, a confidence interval, a test statistic (e.g. from a t-test or F-test) or a P value). These should be extracted if they provide potentially useful information (see MECIR Box 5.3.c ). Details of recalculation are provided in Section 5.6 . Further considerations for dealing with missing data are discussed in Chapter 10, Section 10.12 .
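For example, under the usual large-sample assumptions, a missing standard deviation can be recovered from a reported standard error or from a 95% confidence interval around a group mean. The sketch below shows these two standard conversions; the numbers are invented, and for small samples the t-distribution divisor should be used instead of 3.92.

```python
import math

def sd_from_se(se, n):
    """Standard deviation of a group from the standard error of its mean."""
    return se * math.sqrt(n)

def sd_from_ci(lower, upper, n):
    """Standard deviation from a 95% CI around a group mean (large-sample z of 1.96)."""
    return math.sqrt(n) * (upper - lower) / 3.92

# Hypothetical reported values for one intervention group
print(round(sd_from_se(0.8, 25), 2))         # SE 0.8 with n = 25          -> SD 4.0
print(round(sd_from_ci(10.2, 13.4, 25), 2))  # 95% CI (10.2, 13.4), n = 25 -> SD 4.08
```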

MECIR Box 5.3.c Relevant expectations for conduct of intervention reviews

Making maximal use of data ( )

Data entry into RevMan is easiest when 2×2 tables are reported for dichotomous outcomes, and when means and standard deviations are presented for continuous outcomes. Sometimes these statistics are not reported, but some manipulations of the reported data can be performed to obtain them. For instance, 2×2 tables can often be derived from sample sizes and percentages, while standard deviations can often be computed using confidence intervals or P values. Furthermore, the inverse-variance data entry format can be used even if the detailed data required for dichotomous or continuous outcomes are not available, for instance if only odds ratios and their confidence intervals are presented. The RevMan calculator facilitates many of these manipulations.
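As a minimal sketch of one manipulation described above, the hypothetical example below rebuilds a 2×2 table for a dichotomous outcome from reported group sizes and event percentages; rounding should always be checked against any event counts reported elsewhere in the paper.

```python
def two_by_two(n_treat: int, pct_treat: float, n_ctrl: int, pct_ctrl: float):
    """Return (events_treat, n_treat, events_ctrl, n_ctrl) from group sizes and event percentages."""
    events_treat = round(n_treat * pct_treat / 100)
    events_ctrl = round(n_ctrl * pct_ctrl / 100)
    return events_treat, n_treat, events_ctrl, n_ctrl

# Hypothetical report: events in 25.0% of 120 treated and 16.9% of 118 control participants
print(two_by_two(120, 25.0, 118, 16.9))  # (30, 120, 20, 118)
```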

Checking accuracy of numeric data in the review

This is a reasonably straightforward way for authors to check a number of potential problems, including typographical errors in studies’ reports, accuracy of data collection and manipulation, and data entry into RevMan. For example, the direction of a standardized mean difference may accidentally be wrong in the review. A basic check is to ensure the same qualitative findings (e.g. direction of effect and statistical significance) between the data as presented in the review and the data as available from the original study. Results in forest plots should agree with data in the original report (point estimate and confidence interval) if the same effect measure and statistical model are used.

5.3.7 Other information to collect

We recommend that review authors collect the key conclusions of the included study as reported by its authors. It is not necessary to report these conclusions in the review, but they should be used to verify the results of analyses undertaken by the review authors, particularly in relation to the direction of effect. Further comments by the study authors, for example any explanations they provide for unexpected findings, may be noted. References to other studies that are cited in the study report may be useful, although review authors should be aware of the possibility of citation bias (see Chapter 7, Section 7.2.3.2 ). Documentation of any correspondence with the study authors is important for review transparency.

5.4 Data collection tools

5.4.1 Rationale for data collection forms

Data collection for systematic reviews should be performed using structured data collection forms (see MECIR Box 5.4.a ). These can be paper forms, electronic forms (e.g. Google Forms), or commercially or custom-built data systems (e.g. Covidence, EPPI-Reviewer, Systematic Review Data Repository (SRDR)) that allow online form building, data entry by several users, data sharing, and efficient data management (Li et al 2015). All different means of data collection require data collection forms.

MECIR Box 5.4.a Relevant expectations for conduct of intervention reviews

Using data collection forms

Review authors often have different backgrounds and level of systematic review experience. Using a data collection form ensures some consistency in the process of data extraction, and is necessary for comparing data extracted in duplicate. The completed data collection forms should be available to the CRG on request. Piloting the form within the review team is highly desirable. At minimum, the data collection form (or a very close variant of it) must have been assessed for usability.

The data collection form is a bridge between what is reported by the original investigators (e.g. in journal articles, abstracts, personal correspondence) and what is ultimately reported by the review authors. The data collection form serves several important functions (Meade and Richardson 1997). First, the form is linked directly to the review question and criteria for assessing eligibility of studies, and provides a clear summary of these that can be used to identify and structure the data to be extracted from study reports. Second, the data collection form is the historical record of the provenance of the data used in the review, as well as the multitude of decisions (and changes to decisions) that occur throughout the review process. Third, the form is the source of data for inclusion in an analysis.

Given the important functions of data collection forms, ample time and thought should be invested in their design. Because each review is different, data collection forms will vary across reviews. However, there are many similarities in the types of information that are important. Thus, forms can be adapted from one review to the next. Although we use the term ‘data collection form’ in the singular, in practice it may be a series of forms used for different purposes: for example, a separate form could be used to assess the eligibility of studies for inclusion in the review to assist in the quick identification of studies to be excluded from or included in the review.

5.4.2 Considerations in selecting data collection tools

The choice of data collection tool is largely dependent on review authors’ preferences, the size of the review, and resources available to the author team. Potential advantages and considerations of selecting one data collection tool over another are outlined in Table 5.4.a (Li et al 2015). A significant advantage that data systems have is in data management ( Chapter 1, Section 1.6 ) and re-use. They make review updates more efficient, and also facilitate methodological research across reviews. Numerous ‘meta-epidemiological’ studies have been carried out using Cochrane Review data, resulting in methodological advances which would not have been possible if thousands of studies had not all been described using the same data structures in the same system.

Some data collection tools facilitate automatic import of extracted data into RevMan (Cochrane’s authoring tool); examples include CSV (Excel) files and Covidence. Details are available at https://documentation.cochrane.org/revman-kb/populate-study-data-260702462.html

Table 5.4.a Considerations in selecting data collection tools

Paper forms

Examples
  • Forms developed using word processing software

Suitable review type and team sizes
  • Small-scale reviews (<10 included studies)
  • Small team with 2 to 3 data extractors in the same physical location

Resource needs
  • Low

Advantages
  • Do not rely on access to computer and network or internet connectivity
  • Can record notes and explanations easily
  • Require minimal software skills

Disadvantages
  • Inefficient and potentially unreliable because data must be entered into software for analysis and reporting
  • Susceptible to errors
  • Data collected by multiple authors must be manually collated
  • Difficult to amend as the review progresses
  • If the papers are lost, all data will need to be re-created

Electronic forms

Examples
  • Microsoft Access
  • Google Forms

Suitable review type and team sizes
  • Small- to medium-scale reviews (10 to 20 studies)
  • Small to moderate-sized team with 4 to 6 data extractors

Resource needs
  • Low to medium

Advantages
  • Allow extracted data to be processed electronically for editing and analysis
  • Allow electronic data storage, sharing and collation
  • Easy to expand or edit forms as required
  • Can automate data comparison with additional programming
  • Can copy data to analysis software without manual re-entry, reducing errors

Disadvantages
  • Require familiarity with software packages to design and use forms
  • Susceptible to changes in software versions

Data systems

Examples
  • Covidence
  • EPPI-Reviewer
  • Systematic Review Data Repository (SRDR)
  • DistillerSR (Evidence Partners)
  • Doctor Evidence

Suitable review type and team sizes
  • Small-, medium-, and especially large-scale reviews (>20 studies), as well as reviews that need constant updating
  • All team sizes, especially large teams (i.e. >6 data extractors)

Resource needs
  • Low (open-access tools such as Covidence or SRDR, or tools for which authors have institutional licences)
  • High (commercial data systems with no access via an institutional licence)

Advantages
  • Specifically designed for data collection for systematic reviews
  • Allow online data storage, linking, and sharing
  • Easy to expand or edit forms as required
  • Can be integrated with title/abstract, full-text screening and other functions
  • Can link data items to locations in the report to facilitate checking
  • Can readily automate data comparison between independent data collection for the same study
  • Allow easy monitoring of progress and performance of the author team
  • Facilitate coordination among data collectors such as allocation of studies for collection and monitoring team progress
  • Allow simultaneous data entry by multiple authors
  • Can export data directly to analysis software
  • In some cases, improve public accessibility through open data sharing

Disadvantages
  • Upfront investment of resources to set up the form and train data extractors
  • Structured templates may not be as flexible as electronic forms
  • Cost of commercial data systems
  • Require familiarity with data systems
  • Susceptible to changes in software versions

5.4.3 Design of a data collection form

Regardless of whether data are collected using a paper or electronic form, or a data system, the key to successful data collection is to construct easy-to-use forms and collect sufficient and unambiguous data that faithfully represent the source in a structured and organized manner (Li et al 2015). In most cases, a document format should be developed for the form before building an electronic form or a data system. This can be distributed to others, including programmers and data analysts, and as a guide for creating an electronic form and any guidance or codebook to be used by data extractors. Review authors also should consider compatibility of any electronic form or data system with analytical software, as well as mechanisms for recording, assessing and correcting data entry errors.

Data described in multiple reports (or even within a single report) of a study may not be consistent. Review authors will need to describe how they work with multiple reports in the protocol, for example, by pre-specifying which report will be used when sources contain conflicting data that cannot be resolved by contacting the investigators. Likewise, when there is only one report identified for a study, review authors should specify the section within the report (e.g. abstract, methods, results, tables, and figures) for use in case of inconsistent information.

If review authors wish to automatically import their extracted data into RevMan, it is advised that their data collection forms match the data extraction templates available via the RevMan Knowledge Base. Details are available at https://documentation.cochrane.org/revman-kb/data-extraction-templates-260702375.html.

A good data collection form should minimize the need to go back to the source documents. When designing a data collection form, review authors should involve all members of the team, that is, content area experts, authors with experience in systematic review methods and data collection form design, statisticians, and persons who will perform data extraction. Here are suggested steps and some tips for designing a data collection form, based on the informal collation of experiences from numerous review authors (Li et al 2015).

Step 1. Develop outlines of tables and figures expected to appear in the systematic review, considering the comparisons to be made between different interventions within the review, and the various outcomes to be measured. This step will help review authors decide the right amount of data to collect (not too much or too little). Collecting too much information can lead to forms that are longer than original study reports, and can be very wasteful of time. Collection of too little information, or omission of key data, can lead to the need to return to study reports later in the review process.

Step 2. Assemble and group data elements to facilitate form development. Review authors should consult Table 5.3.a , in which the data elements are grouped to facilitate form development and data collection. Note that it may be more efficient to group data elements in the order in which they are usually found in study reports (e.g. starting with reference information, followed by eligibility criteria, intervention description, statistical methods, baseline characteristics and results).

Step 3. Identify the optimal way of framing the data items. Much has been written about how to frame data items for developing robust data collection forms in primary research studies. We summarize a few key points and highlight issues that are pertinent to systematic reviews.

  • Ask closed-ended questions (i.e. questions that define a list of permissible responses) as much as possible. Closed-ended questions do not require post hoc coding and provide better control over data quality than open-ended questions. When setting up a closed-ended question, one must anticipate and structure possible responses and include an ‘other, specify’ category because the anticipated list may not be exhaustive. Avoid asking data extractors to summarize data into uncoded text, no matter how short it is.
  • Avoid asking a question in a way that the response may be left blank. Include ‘not applicable’, ‘not reported’ and ‘cannot tell’ options as needed. The ‘cannot tell’ option tags uncertain items that may prompt review authors to contact study authors for clarification, especially on data items critical to reaching conclusions.
  • Remember that the form will focus on what is reported in the article rather than on what was done in the study. The study report may not fully reflect how the study was actually conducted. For example, a question ‘Did the article report that the participants were masked to the intervention?’ is more appropriate than ‘Were participants masked to the intervention?’
  • Where a judgement is required, record the raw data (i.e. quote directly from the source document) used to make the judgement. It is also important to record the source of information collected, including where it was found in a report or whether information was obtained from unpublished sources or personal communications. As much as possible, questions should be asked in a way that minimizes subjective interpretation and judgement to facilitate data comparison and adjudication.
  • Incorporate flexibility to allow for variation in how data are reported. It is strongly recommended that outcome data be collected in the format in which they were reported and transformed in a subsequent step if required. Review authors also should consider the software they will use for analysis and for publishing the review (e.g. RevMan).

Step 4. Develop and pilot-test data collection forms, ensuring that they provide data in the right format and structure for subsequent analysis. In addition to data items described in Step 2, data collection forms should record the title of the review as well as the person who is completing the form and the date of completion. Forms occasionally need revision; forms should therefore include the version number and version date to reduce the chances of using an outdated form by mistake. Because a study may be associated with multiple reports, it is important to record the study ID as well as the report ID. Definitions and instructions helpful for answering a question should appear next to the question to improve quality and consistency across data extractors (Stock 1994). Provide space for notes, regardless of whether paper or electronic forms are used.
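To make Steps 3 and 4 concrete, the sketch below shows one possible way, hypothetical and not a Cochrane template, of representing a closed-ended extraction item and form metadata in code: each item carries a fixed response set including ‘cannot tell’ and ‘not applicable’, space for a supporting quotation and its location, and the form records the review title, study and report IDs, extractor, version, and completion date.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class FormItem:
    question: str
    options: list[str]            # closed-ended response set
    response: str | None = None
    source_quote: str = ""        # raw text supporting a judgement
    source_location: str = ""     # e.g. "Results, Table 2, p. 5"

@dataclass
class ExtractionForm:
    review_title: str
    study_id: str                 # a study may be linked to several reports
    report_id: str
    extractor: str
    form_version: str             # version number/date guards against using outdated forms
    completed_on: date = field(default_factory=date.today)
    items: list[FormItem] = field(default_factory=list)
    notes: str = ""

# Hypothetical example of one completed header and one item
form = ExtractionForm(
    review_title="Hypothetical review of intervention X",
    study_id="STUDY-001",
    report_id="REPORT-001a",
    extractor="Reviewer A",
    form_version="1.2 (hypothetical)",
    items=[
        FormItem(
            question="Did the article report that participants were masked to the intervention?",
            options=["yes", "no", "cannot tell", "not applicable"],
        )
    ],
)
```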

All data collection forms and data systems should be thoroughly pilot-tested before launch (see MECIR Box 5.4.a ). Testing should involve several people extracting data from at least a few articles. The initial testing focuses on the clarity and completeness of questions. Users of the form may provide feedback that certain coding instructions are confusing or incomplete (e.g. a list of options may not cover all situations). The testing may identify data that are missing from the form, or likely to be superfluous. After initial testing, accuracy of the extracted data should be checked against the source document or verified data to identify problematic areas. It is wise to draft entries for the table of ‘Characteristics of included studies’ and complete a risk of bias assessment ( Chapter 8 ) using these pilot reports to ensure all necessary information is collected. A consensus between review authors may be required before the form is modified to avoid any misunderstandings or later disagreements. It may be necessary to repeat the pilot testing on a new set of reports if major changes are needed after the first pilot test.

Problems with the data collection form may surface after pilot testing has been completed, and the form may need to be revised after data extraction has started. When changes are made to the form or coding instructions, it may be necessary to return to reports that have already undergone data extraction. In some situations, it may be necessary to clarify only coding instructions without modifying the actual data collection form.

5.5 Extracting data from reports

5.5.1 Introduction

In most systematic reviews, the primary source of information about each study is published reports of studies, usually in the form of journal articles. Despite recent developments in machine learning models to automate data extraction in systematic reviews (see Section 5.5.9 ), data extraction is still largely a manual process. Electronic searches for text can provide a useful aid to locating information within a report. Examples include using search facilities in PDF viewers, internet browsers and word processing software. However, text searching should not be considered a replacement for reading the report, since information may be presented using variable terminology and presented in multiple formats.

5.5.2 Who should extract data?

Data extractors should have at least a basic understanding of the topic, and have knowledge of study design, data analysis and statistics. They should pay attention to detail while following instructions on the forms. Because errors that occur at the data extraction stage are rarely detected by peer reviewers, editors, or users of systematic reviews, it is recommended that more than one person extract data from every report to minimize errors and reduce introduction of potential biases by review authors (see MECIR Box 5.5.a ). As a minimum, information that involves subjective interpretation and information that is critical to the interpretation of results (e.g. outcome data) should be extracted independently by at least two people (see MECIR Box 5.5.a ). In common with implementation of the selection process ( Chapter 4, Section 4.6 ), it is preferable that data extractors are from complementary disciplines, for example a methodologist and a topic area specialist. It is important that everyone involved in data extraction has practice using the form and, if the form was designed by someone else, receives appropriate training.

Evidence in support of duplicate data extraction comes from several indirect sources. One study observed that independent data extraction by two authors resulted in fewer errors than data extraction by a single author followed by verification by a second (Buscemi et al 2006). A high prevalence of data extraction errors (errors in 20 out of 34 reviews) has been observed (Jones et al 2005). A further study of data extraction to compute standardized mean differences found that a minimum of seven out of 27 reviews had substantial errors (Gøtzsche et al 2007).

MECIR Box 5.5.a Relevant expectations for conduct of intervention reviews

Extracting study characteristics in duplicate

Duplicating the data extraction process reduces both the risk of making mistakes and the possibility that data selection is influenced by a single person’s biases. Dual data extraction may be less important for study characteristics than it is for outcome data, so it is not a mandatory standard for the former.

Extracting outcome data in duplicate

Duplicating the data extraction process reduces both the risk of making mistakes and the possibility that data selection is influenced by a single person’s biases. Dual data extraction is particularly important for outcome data, which feed directly into syntheses of the evidence and hence to conclusions of the review.

5.5.3 Training data extractors

Training of data extractors is intended to familiarize them with the review topic and methods, the data collection form or data system, and issues that may arise during data extraction. Results of the pilot testing of the form should prompt discussion among review authors and extractors of ambiguous questions or responses to establish consistency. Training should take place at the onset of the data extraction process and periodically over the course of the project (Li et al 2015). For example, when data related to a single item on the form are present in multiple locations within a report (e.g. abstract, main body of text, tables, and figures) or in several sources (e.g. publications, ClinicalTrials.gov, or CSRs), the development and documentation of instructions to follow an agreed algorithm are critical and should be reinforced during the training sessions.

Some have proposed that some information in a report, such as its authors, be blinded to the review author prior to data extraction and assessment of risk of bias (Jadad et al 1996). However, blinding of review authors to aspects of study reports generally is not recommended for Cochrane Reviews as there is little evidence that it alters the decisions made (Berlin 1997).

5.5.4 Extracting data from multiple reports of the same study

Studies frequently are reported in more than one publication or in more than one source (Tramèr et al 1997, von Elm et al 2004). A single source rarely provides complete information about a study; on the other hand, multiple sources may contain conflicting information about the same study (Mayo-Wilson et al 2017a, Mayo-Wilson et al 2017b, Mayo-Wilson et al 2018). Because the unit of interest in a systematic review is the study and not the report, information from multiple reports often needs to be collated and reconciled. It is not appropriate to discard any report of an included study without careful examination, since it may contain valuable information not included in the primary report. Review authors will need to decide between two strategies:

  • Extract data from each report separately, then combine information across multiple data collection forms.
  • Extract data from all reports directly into a single data collection form.

The choice of which strategy to use will depend on the nature of the reports and may vary across studies and across reports. For example, when a full journal article and multiple conference abstracts are available, it is likely that the majority of information will be obtained from the journal article; completing a new data collection form for each conference abstract may be a waste of time. Conversely, when there are two or more detailed journal articles, perhaps relating to different periods of follow-up, then it is likely to be easier to perform data extraction separately for these articles and collate information from the data collection forms afterwards. When data from all reports are extracted into a single data collection form, review authors should identify the ‘main’ data source for each study when sources include conflicting data and these differences cannot be resolved by contacting authors (Mayo-Wilson et al 2018). Flow diagrams such as those modified from the PRISMA statement can be particularly helpful when collating and documenting information from multiple reports (Mayo-Wilson et al 2018).

5.5.5 Reliability and reaching consensus

When more than one author extracts data from the same reports, there is potential for disagreement. After data have been extracted independently by two or more extractors, responses must be compared to assure agreement or to identify discrepancies. An explicit procedure or decision rule should be specified in the protocol for identifying and resolving disagreements. Most often, the source of the disagreement is an error by one of the extractors and is easily resolved. Thus, discussion among the authors is a sensible first step. More rarely, a disagreement may require arbitration by another person. Any disagreement that cannot be resolved should be addressed by contacting the study authors; if this is unsuccessful, the disagreement should be reported in the review.

The presence and resolution of disagreements should be carefully recorded. Maintaining a copy of the data ‘as extracted’ (in addition to the consensus data) allows assessment of reliability of coding. Examples of ways in which this can be achieved include the following:

  • Use one author’s (paper) data collection form and record changes after consensus in a different ink colour.
  • Enter consensus data onto an electronic form.
  • Record original data extracted and consensus data in separate forms (some online tools do this automatically).

Agreement of coded items before reaching consensus can be quantified, for example using kappa statistics (Orwin 1994), although this is not routinely done in Cochrane Reviews. If agreement is assessed, this should be done only for the most important data (e.g. key risk of bias assessments, or availability of key outcomes).
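If agreement is quantified, the calculation itself is straightforward. The sketch below computes Cohen's kappa for a single coded item from two hypothetical extractors' codes; the data and category labels are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Simple (unweighted) Cohen's kappa for two raters coding the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in set(rater_a) | set(rater_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical risk-of-bias codes assigned independently by two extractors
extractor_1 = ["low", "low", "high", "low", "high", "low", "low", "high"]
extractor_2 = ["low", "high", "high", "low", "high", "low", "low", "low"]
print(round(cohens_kappa(extractor_1, extractor_2), 2))  # 0.47
```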

Throughout the review process informal consideration should be given to the reliability of data extraction. For example, if after reaching consensus on the first few studies, the authors note a frequent disagreement for specific data, then coding instructions may need modification. Furthermore, an author’s coding strategy may change over time, as the coding rules are forgotten, indicating a need for retraining and, possibly, some recoding.

5.5.6 Extracting data from clinical study reports

Clinical study reports (CSRs) obtained for a systematic review are likely to be in PDF format. Although CSRs can be thousands of pages in length and very time-consuming to review, they typically follow the content and format required by the International Conference on Harmonisation (ICH 1995). Information in CSRs is usually presented in a structured and logical way. For example, numerical data pertaining to important demographic, efficacy, and safety variables are placed within the main text in tables and figures. Because of the clarity and completeness of information provided in CSRs, data extraction from CSRs may be clearer and conducted more confidently than from journal articles or other short reports.

To extract data from CSRs efficiently, review authors should familiarize themselves with the structure of the CSRs. In practice, review authors may want to browse or create ‘bookmarks’ within a PDF document that record section headers and subheaders and search key words related to the data extraction (e.g. randomization). In addition, it may be useful to utilize optical character recognition software to convert tables of data in the PDF to an analysable format when additional analyses are required, saving time and minimizing transcription errors.

CSRs may contain many outcomes and present many results for a single outcome (due to different analyses) (Mayo-Wilson et al 2017b). We recommend review authors extract results only for outcomes of interest to the review (Section 5.3.6 ). With regard to different methods of analysis, review authors should have a plan and pre-specify preferred metrics in their protocol for extracting results pertaining to different populations (e.g. ‘all randomized’, ‘all participants taking at least one dose of medication’), methods for handling missing data (e.g. ‘complete case analysis’, ‘multiple imputation’), and adjustment (e.g. unadjusted, adjusted for baseline covariates). It may be important to record the range of analysis options available, even if not all are extracted in detail. In some cases it may be preferable to use metrics that are comparable across multiple included studies, which may not be clear until data collection for all studies is complete.

CSRs are particularly useful for identifying outcomes assessed but not presented to the public. For efficacy outcomes and systematically collected adverse events, review authors can compare what is described in the CSRs with what is reported in published reports to assess the risk of bias due to missing outcome data ( Chapter 8, Section 8.5 ) and in selection of reported result ( Chapter 8, Section 8.7 ). Note that non-systematically collected adverse events are not amenable to such comparisons because these adverse events may not be known ahead of time and thus not pre-specified in the protocol.

5.5.7 Extracting data from regulatory reviews

Data most relevant to systematic reviews can be found in the medical and statistical review sections of a regulatory review. Both of these are substantially longer than journal articles (Turner 2013). A list of all trials on a drug usually can be found in the medical review. Because trials are referenced by a combination of numbers and letters, it may be difficult for the review authors to link the trial with other reports of the same trial (Section 5.2.1 ).

Many of the documents downloaded from the US Food and Drug Administration’s website for older drugs are scanned copies and are not searchable because of redaction of confidential information (Turner 2013). Optical character recognition software can convert most of the text. Reviews for newer drugs have been redacted electronically; documents remain searchable as a result.

Compared to CSRs, regulatory reviews contain less information about trial design, execution, and results. They provide limited information for assessing the risk of bias. In terms of extracting outcomes and results, review authors should follow the guidance provided for CSRs (Section 5.5.6 ).

5.5.8 Extracting data from figures with software

Sometimes numerical data needed for systematic reviews are only presented in figures. Review authors may request the data from the study investigators, or alternatively, extract the data from the figures either manually (e.g. with a ruler) or by using software. Numerous tools are available, many of which are free. Those available at the time of writing include tools called Plot Digitizer, WebPlotDigitizer, Engauge, Dexter, ycasd, GetData Graph Digitizer. The software works by taking an image of a figure and then digitizing the data points off the figure using the axes and scales set by the users. The numbers exported can be used for systematic reviews, although additional calculations may be needed to obtain the summary statistics, such as calculation of means and standard deviations from individual-level data points (or conversion of time-to-event data presented on Kaplan-Meier plots to hazard ratios; see Chapter 6, Section 6.8.2 ).
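Once individual data points have been digitized, the remaining calculations are simple. The sketch below, using invented numbers rather than data from any real study, computes the group size, mean, and sample standard deviation that a meta-analysis of a continuous outcome would need.

```python
import statistics

# Hypothetical values read off a figure with a digitizing tool
digitized_points = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.4, 13.1]

n = len(digitized_points)
mean = statistics.mean(digitized_points)
sd = statistics.stdev(digitized_points)   # sample SD (n - 1 denominator)

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
```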

It has been demonstrated that software is more convenient and accurate than visual estimation or use of a ruler (Gross et al 2014, Jelicic Kadic et al 2016). Review authors should consider using software for extracting numerical data from figures when the data are not available elsewhere.

5.5.9 Automating data extraction in systematic reviews

Because data extraction is time-consuming and error-prone, automating or semi-automating this step may make the extraction process more efficient and accurate. The state of science relevant to automating data extraction is summarized here (Jonnalagadda et al 2015).

  • At least 26 studies have tested various natural language processing and machine learning approaches for facilitating data extraction for systematic reviews.

  • Each tool focuses on only a limited number of data elements (ranging from one to seven). Most of the existing tools focus on PICO information (e.g. number of participants, their age, sex, country, recruiting centres, intervention groups, outcomes, and time points). A few are able to extract study design and results (e.g. objectives, study duration, participant flow), and two extract risk of bias information (Marshall et al 2016, Millard et al 2016). To date, well over half of the data elements needed for systematic reviews have not been explored for automated extraction.

  • Most tools highlight the sentence(s) that may contain the data elements as opposed to directly recording these data elements into a data collection form or a data system.
  • There is no gold standard or common dataset to evaluate the performance of these tools, limiting our ability to interpret the significance of the reported accuracy measures.

At the time of writing, we cannot recommend a specific tool for automating data extraction for routine systematic review production. There is a need for review authors to work with experts in informatics to refine these tools and evaluate them rigorously. Such investigations should address how the tool will fit into existing workflows. For example, the automated or semi-automated data extraction approaches may first act as checks for manual data extraction before they can replace it.

5.5.10 Suspicions of scientific misconduct

Systematic review authors can uncover suspected misconduct in the published literature. Misconduct includes fabrication or falsification of data or results, plagiarism, and research that does not adhere to ethical norms. Review authors need to be aware of scientific misconduct because the inclusion of fraudulent material could undermine the reliability of a review’s findings. Plagiarism of results data in the form of duplicated publication (either by the same or by different authors) may, if undetected, lead to study participants being double counted in a synthesis.

It is preferable to identify potential problems before, rather than after, publication of the systematic review, so that readers are not misled. However, empirical evidence indicates that the extent to which systematic review authors explore misconduct varies widely (Elia et al 2016). Text-matching software and systems such as CrossCheck may be helpful for detecting plagiarism, but they can detect only matching text, so data tables or figures need to be inspected by hand or using other systems (e.g. to detect image manipulation). Lists of data such as in a meta-analysis can be a useful means of detecting duplicated studies. Furthermore, examination of baseline data can lead to suspicions of misconduct for an individual randomized trial (Carlisle et al 2015). For example, Al-Marzouki and colleagues concluded that a trial report was fabricated or falsified on the basis of highly unlikely baseline differences between two randomized groups (Al-Marzouki et al 2005).

Cochrane Review authors are advised to consult with Cochrane editors if cases of suspected misconduct are identified. Searching for comments, letters or retractions may uncover additional information. Sensitivity analyses can be used to determine whether the studies arousing suspicion are influential in the conclusions of the review. Guidance for editors for addressing suspected misconduct will be available from Cochrane’s Editorial Publishing and Policy Resource (see community.cochrane.org ). Further information is available from the Committee on Publication Ethics (COPE; publicationethics.org ), including a series of flowcharts on how to proceed if various types of misconduct are suspected. Cases should be followed up, typically including an approach to the editors of the journals in which suspect reports were published. It may be useful to write first to the primary investigators to request clarification of apparent inconsistencies or unusual observations.

Because investigations may take time, and institutions may not always be responsive (Wager 2011), articles suspected of being fraudulent should be classified as ‘awaiting assessment’. If a misconduct investigation indicates that the publication is unreliable, or if a publication is retracted, it should not be included in the systematic review, and the reason should be noted in the ‘excluded studies’ section.

5.5.11 Key points in planning and reporting data extraction

In summary, the methods section of both the protocol and the review should detail:

  • the data categories that are to be extracted;
  • how extracted data from each report will be verified (e.g. extraction by two review authors, independently);
  • whether data extraction is undertaken by content area experts, methodologists, or both;
  • pilot testing, training and existence of coding instructions for the data collection form;
  • how data are extracted from multiple reports from the same study; and
  • how disagreements are handled when more than one author extracts data from each report.

5.6 Extracting study results and converting to the desired format

In most cases, it is desirable to collect summary data separately for each intervention group of interest and to enter these into software in which effect estimates can be calculated, such as RevMan. Sometimes the required data may be obtained only indirectly, and the relevant results may not be obvious. Chapter 6 provides many useful tips and techniques to deal with common situations. When summary data cannot be obtained from each intervention group, or when it is important to use results of adjusted analyses (for example, to account for correlations in crossover or cluster-randomized trials), effect estimates may be available directly.

5.7 Managing and sharing data

When data have been collected for each individual study, it is helpful to organize them into a comprehensive electronic format, such as a database or spreadsheet, before entering data into a meta-analysis or other synthesis. When data are collated electronically, all or a subset of them can easily be exported for cleaning, consistency checks and analysis.

Tabulation of collected information about studies can facilitate classification of studies into appropriate comparisons and subgroups. It also allows identification of comparable outcome measures and statistics across studies. It will often be necessary to perform calculations to obtain the required statistics for presentation or synthesis. It is important through this process to retain clear information on the provenance of the data, with a clear distinction between data from a source document and data obtained through calculations. Statistical conversions, for example from standard errors to standard deviations, ideally should be undertaken with a computer rather than using a hand calculator to maintain a permanent record of the original and calculated numbers as well as the actual calculations used.
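One lightweight way to keep this provenance is to store the reported value, the calculated value, and the formula used side by side, as in the hypothetical sketch below (file name, column names, and numbers are illustrative only).

```python
import csv
import math

# Hypothetical extracted values; the reported statistic is kept alongside the conversion
rows = [
    {"study": "Study A", "n": 40, "reported_stat": "SE", "reported_value": 1.12},
    {"study": "Study B", "n": 25, "reported_stat": "SE", "reported_value": 2.30},
]

for row in rows:
    row["sd_calculated"] = round(row["reported_value"] * math.sqrt(row["n"]), 3)
    row["conversion"] = "SD = SE * sqrt(n)"

# Write originals and calculated values together so the provenance is preserved
with open("converted_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```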

Ideally, data only need to be extracted once and should be stored in a secure and stable location for future updates of the review, regardless of whether the original review authors or a different group of authors update the review (Ip et al 2012). Standardizing and sharing data collection tools as well as data management systems among review authors working in similar topic areas can streamline systematic review production. Review authors have the opportunity to work with trialists, journal editors, funders, regulators, and other stakeholders to make study data (e.g. CSRs, IPD, and any other form of study data) publicly available, increasing the transparency of research. When legal and ethical to do so, we encourage review authors to share the data used in their systematic reviews to reduce waste and to allow verification and reanalysis because data will not have to be extracted again for future use (Mayo-Wilson et al 2018).

5.8 Chapter information

Editors: Tianjing Li, Julian PT Higgins, Jonathan J Deeks

Acknowledgements: This chapter builds on earlier versions of the Handbook . For details of previous authors and editors of the Handbook , see Preface. Andrew Herxheimer, Nicki Jackson, Yoon Loke, Deirdre Price and Helen Thomas contributed text. Stephanie Taylor and Sonja Hood contributed suggestions for designing data collection forms. We are grateful to Judith Anzures, Mike Clarke, Miranda Cumpston and Peter Gøtzsche for helpful comments.

Funding: JPTH is a member of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JJD received support from the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

5.9 References

Al-Marzouki S, Evans S, Marshall T, Roberts I. Are these data real? Statistical methods for the detection of data fabrication in clinical trials. BMJ 2005; 331 : 267-270.

Allen EN, Mushi AK, Massawe IS, Vestergaard LS, Lemnge M, Staedke SG, Mehta U, Barnes KI, Chandler CI. How experiences become data: the process of eliciting adverse event, medical history and concomitant medication reports in antimalarial and antiretroviral interaction trials. BMC Medical Research Methodology 2013; 13 : 140.

Baudard M, Yavchitz A, Ravaud P, Perrodeau E, Boutron I. Impact of searching clinical trial registries in systematic reviews of pharmaceutical treatments: methodological systematic review and reanalysis of meta-analyses. BMJ 2017; 356 : j448.

Bent S, Padula A, Avins AL. Better ways to question patients about adverse medical events: a randomized, controlled trial. Annals of Internal Medicine 2006; 144 : 257-261.

Berlin JA. Does blinding of readers affect the results of meta-analyses? University of Pennsylvania Meta-analysis Blinding Study Group. Lancet 1997; 350 : 185-186.

Buscemi N, Hartling L, Vandermeer B, Tjosvold L, Klassen TP. Single data extraction generated more errors than double data extraction in systematic reviews. Journal of Clinical Epidemiology 2006; 59 : 697-703.

Carlisle JB, Dexter F, Pandit JJ, Shafer SL, Yentis SM. Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials. Anaesthesia 2015; 70 : 848-858.

Carroll C, Patterson M, Wood S, Booth A, Rick J, Balain S. A conceptual framework for implementation fidelity. Implementation Science 2007; 2 : 40.

Carvajal A, Ortega PG, Sainz M, Velasco V, Salado I, Arias LHM, Eiros JM, Rubio AP, Castrodeza J. Adverse events associated with pandemic influenza vaccines: Comparison of the results of a follow-up study with those coming from spontaneous reporting. Vaccine 2011; 29 : 519-522.

Chamberlain C, O'Mara-Eves A, Porter J, Coleman T, Perlen SM, Thomas J, McKenzie JE. Psychosocial interventions for supporting women to stop smoking in pregnancy. Cochrane Database of Systematic Reviews 2017; 2 : CD001055.

Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implementation Science 2009; 4 : 50.

Davis AL, Miller JD. The European Medicines Agency and publication of clinical study reports: a challenge for the US FDA. JAMA 2017; 317 : 905-906.

Denniston AK, Holland GN, Kidess A, Nussenblatt RB, Okada AA, Rosenbaum JT, Dick AD. Heterogeneity of primary outcome measures used in clinical trials of treatments for intermediate, posterior, and panuveitis. Orphanet Journal of Rare Diseases 2015; 10 : 97.

Derry S, Loke YK. Risk of gastrointestinal haemorrhage with long term use of aspirin: meta-analysis. BMJ 2000; 321 : 1183-1187.

Doshi P, Dickersin K, Healy D, Vedula SS, Jefferson T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013; 346 : f2865.

Dusenbury L, Brannigan R, Falco M, Hansen WB. A review of research on fidelity of implementation: implications for drug abuse prevention in school settings. Health Education Research 2003; 18 : 237-256.

Dwan K, Altman DG, Clarke M, Gamble C, Higgins JPT, Sterne JAC, Williamson PR, Kirkham JJ. Evidence for the selective reporting of analyses and discrepancies in clinical trials: a systematic review of cohort studies of clinical trials. PLoS Medicine 2014; 11 : e1001666.

Elia N, von Elm E, Chatagner A, Popping DM, Tramèr MR. How do authors of systematic reviews deal with research malpractice and misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors. BMJ Open 2016; 6 : e010442.

Gøtzsche PC. Multiple publication of reports of drug trials. European Journal of Clinical Pharmacology 1989; 36 : 429-432.

Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298 : 430-437.

Gross A, Schirm S, Scholz M. Ycasd - a tool for capturing and scaling data from graphical representations. BMC Bioinformatics 2014; 15 : 219.

Hoffmann TC, Glasziou PP, Boutron I, Milne R, Perera R, Moher D, Altman DG, Barbour V, Macdonald H, Johnston M, Lamb SE, Dixon-Woods M, McCulloch P, Wyatt JC, Chan AW, Michie S. Better reporting of interventions: template for intervention description and replication (TIDieR) checklist and guide. BMJ 2014; 348 : g1687.

ICH. ICH Harmonised tripartite guideline: Structure and content of clinical study reports E3. ICH; 1995. www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E3/E3_Guideline.pdf .

Ioannidis JPA, Mulrow CD, Goodman SN. Adverse events: The more you search, the more you find. Annals of Internal Medicine 2006; 144 : 298-300.

Ip S, Hadar N, Keefe S, Parkin C, Iovin R, Balk EM, Lau J. A web-based archive of systematic review data. Systematic Reviews 2012; 1 : 15.

Ismail R, Azuara-Blanco A, Ramsay CR. Variation of clinical outcomes used in glaucoma randomised controlled trials: a systematic review. British Journal of Ophthalmology 2014; 98 : 464-468.

Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJM, Gavaghan DJ, McQuay H. Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials 1996; 17 : 1-12.

Jelicic Kadic A, Vucic K, Dosenovic S, Sapunar D, Puljak L. Extracting data from figures with software was faster, with higher interrater reliability than manual extraction. Journal of Clinical Epidemiology 2016; 74 : 119-123.

Jones AP, Remmington T, Williamson PR, Ashby D, Smyth RL. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. Journal of Clinical Epidemiology 2005; 58 : 741-742.

Jones CW, Keil LG, Holland WC, Caughey MC, Platts-Mills TF. Comparison of registered and published outcomes in randomized controlled trials: a systematic review. BMC Medicine 2015; 13 : 282.

Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Systematic Reviews 2015; 4 : 78.

Lewin S, Hendry M, Chandler J, Oxman AD, Michie S, Shepperd S, Reeves BC, Tugwell P, Hannes K, Rehfuess EA, Welch V, McKenzie JE, Burford B, Petkovic J, Anderson LM, Harris J, Noyes J. Assessing the complexity of interventions within systematic reviews: development, content and use of a new tool (iCAT_SR). BMC Medical Research Methodology 2017; 17 : 76.

Li G, Abbade LPF, Nwosu I, Jin Y, Leenus A, Maaz M, Wang M, Bhatt M, Zielinski L, Sanger N, Bantoto B, Luo C, Shams I, Shahid H, Chang Y, Sun G, Mbuagbaw L, Samaan Z, Levine MAH, Adachi JD, Thabane L. A scoping review of comparisons between abstracts and full reports in primary biomedical research. BMC Medical Research Methodology 2017; 17 : 181.

Li TJ, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Annals of Internal Medicine 2015; 162 : 287-294.

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Medicine 2009; 6 : e1000100.

Liu ZM, Saldanha IJ, Margolis D, Dumville JC, Cullum NA. Outcomes in Cochrane systematic reviews related to wound care: an investigation into prespecification. Wound Repair and Regeneration 2017; 25 : 292-308.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association 2016; 23 : 193-201.

Mayo-Wilson E, Doshi P, Dickersin K. Are manufacturers sharing data as promised? BMJ 2015; 351 : h4169.

Mayo-Wilson E, Li TJ, Fusco N, Bertizzolo L, Canner JK, Cowley T, Doshi P, Ehmsen J, Gresham G, Guo N, Haythomthwaite JA, Heyward J, Hong H, Pham D, Payne JL, Rosman L, Stuart EA, Suarez-Cuervo C, Tolbert E, Twose C, Vedula S, Dickersin K. Cherry-picking by trialists and meta-analysts can drive conclusions about intervention efficacy. Journal of Clinical Epidemiology 2017a; 91 : 95-110.

Mayo-Wilson E, Fusco N, Li TJ, Hong H, Canner JK, Dickersin K, MUDS Investigators. Multiple outcomes and analyses in clinical trials create challenges for interpretation and research synthesis. Journal of Clinical Epidemiology 2017b; 86 : 39-50.

Mayo-Wilson E, Li T, Fusco N, Dickersin K. Practical guidance for using multiple data sources in systematic reviews and meta-analyses (with examples from the MUDS study). Research Synthesis Methods 2018; 9 : 2-12.

Meade MO, Richardson WS. Selecting and appraising studies for a systematic review. Annals of Internal Medicine 1997; 127 : 531-537.

Meinert CL. Clinical trials dictionary: Terminology and usage recommendations . Hoboken (NJ): Wiley; 2012.

Millard LAC, Flach PA, Higgins JPT. Machine learning to assist risk-of-bias assessments in systematic reviews. International Journal of Epidemiology 2016; 45 : 266-277.

Moher D, Schulz KF, Altman DG. The CONSORT Statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001; 357 : 1191-1194.

Moher D, Hopewell S, Schulz KF, Montori V, Gøtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340 : c869.

Moore GF, Audrey S, Barker M, Bond L, Bonell C, Hardeman W, Moore L, O'Cathain A, Tinati T, Wight D, Baird J. Process evaluation of complex interventions: Medical Research Council guidance. BMJ 2015; 350 : h1258.

Orwin RG. Evaluating coding decisions. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): Russell Sage Foundation; 1994. p. 139-162.

Page MJ, McKenzie JE, Kirkham J, Dwan K, Kramer S, Green S, Forbes A. Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions. Cochrane Database of Systematic Reviews 2014; 10 : MR000035.

Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.Gov: a cross-sectional analysis. PLoS Medicine 2009; 6 .

Safer DJ. Design and reporting modifications in industry-sponsored comparative psychopharmacology trials. Journal of Nervous and Mental Disease 2002; 190 : 583-592.

Saldanha IJ, Dickersin K, Wang X, Li TJ. Outcomes in Cochrane systematic reviews addressing four common eye conditions: an evaluation of completeness and comparability. PloS One 2014; 9 : e109400.

Saldanha IJ, Li T, Yang C, Ugarte-Gil C, Rutherford GW, Dickersin K. Social network analysis identified central outcomes for core outcome sets using systematic reviews of HIV/AIDS. Journal of Clinical Epidemiology 2016; 70 : 164-175.

Saldanha IJ, Lindsley K, Do DV, Chuck RS, Meyerle C, Jones LS, Coleman AL, Jampel HD, Dickersin K, Virgili G. Comparison of clinical trial and systematic review outcomes for the 4 most prevalent eye diseases. JAMA Ophthalmology 2017a; 135 : 933-940.

Saldanha IJ, Li TJ, Yang C, Owczarzak J, Williamson PR, Dickersin K. Clinical trials and systematic reviews addressing similar interventions for the same condition do not consider similar outcomes to be important: a case study in HIV/AIDS. Journal of Clinical Epidemiology 2017b; 84 : 85-94.

Stewart LA, Clarke M, Rovers M, Riley RD, Simmonds M, Stewart G, Tierney JF, PRISMA-IPD Development Group. Preferred reporting items for a systematic review and meta-analysis of individual participant data: the PRISMA-IPD statement. JAMA 2015; 313 : 1657-1665.

Stock WA. Systematic coding for research synthesis. In: Cooper H, Hedges LV, editors. The Handbook of Research Synthesis . New York (NY): Russell Sage Foundation; 1994. p. 125-138.

Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997; 315 : 635-640.

Turner EH. How to access and process FDA drug approval packages for use in research. BMJ 2013; 347 .

von Elm E, Poglia G, Walder B, Tramèr MR. Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 2004; 291 : 974-980.

Wager E. Coping with scientific misconduct. BMJ 2011; 343 : d6586.

Wieland LS, Rutkow L, Vedula SS, Kaufmann CN, Rosman LM, Twose C, Mahendraratnam N, Dickersin K. Who has used internal company documents for biomedical and public health research and where did they find them? PloS One 2014; 9 .

Zanchetti A, Hansson L. Risk of major gastrointestinal bleeding with aspirin (Authors' reply). Lancet 1999; 353 : 149-150.

Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database: update and key issues. New England Journal of Medicine 2011; 364 : 852-860.

Zwarenstein M, Treweek S, Gagnier JJ, Altman DG, Tunis S, Haynes B, Oxman AD, Moher D. Improving the reporting of pragmatic trials: an extension of the CONSORT statement. BMJ 2008; 337 : a2390.

For permission to re-use material from the Handbook (either academic or commercial), please see here for full details.

Systematic Reviews and Meta-Analyses: Data Extraction

  • Get Started
  • Exploratory Search
  • Where to Search
  • How to Search
  • Grey Literature
  • What about errata and retractions?
  • Eligibility Screening
  • Critical Appraisal

Data Extraction

  • Synthesis & Discussion
  • Assess Certainty
  • Share & Archive

Systematic methods for extracting data from all relevant studies are an important step leading to synthesis. This step often occurs simultaneously with the Critical Appraisal phase.

Data extraction, sometimes referred to as data collection or data abstraction, is the process of extracting and organizing the information from each included (relevant) study.

The synthesis approach(es) (e.g., meta-analysis, framework synthesis) that you intend to use will inform data extraction.

Process Details

Just like all other stages of a systematic review, 2 data extractors should extract data from each included reference. The exact procedure may vary according to your resource capacity. For example, if managing a large corpus, you may have a team of 10 extractors working in 5 pairs, with each pair extracting data from a chunk of the included material.

Note: experience in the field does not necessarily increase the accuracy of this process. See Horton et al. (2010), 'Systematic review data extraction: cross-sectional study showed that experience did not increase accuracy', and Jones et al. (2005), 'High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews', for more on this topic.

Note: Effect Size Measurements

Defining ahead of time which measurement(s) of effect will be relevant and useful is important, especially if you hope to pursue a meta-analysis. Though it is unlikely that all of your studies will report the same measure of effect (e.g., odds ratio, risk ratio), many of these measures can be transformed or converted to the one you need for your meta-analysis.

If converting effect sizes, be sure to provide enough detail about this process in your manuscript that another team could replicate it. It is best to collect the original outputs from articles before converting effect sizes. There are tools available for converting effect sizes, such as the Campbell Collaboration's tool for calculating or converting effect sizes and the effect size converter from MIT.
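For example, the sketch below (using standard formulas and invented numbers, not any particular tool's implementation) converts a reported odds ratio and 95% confidence interval to the log odds ratio and standard error needed for meta-analysis, and then approximates a standardized mean difference from the log odds ratio using the common logistic-distribution conversion.

```python
import math

def or_to_log_or_se(odds_ratio: float, ci_lower: float, ci_upper: float, z: float = 1.96):
    """Log odds ratio and its SE from an OR with a 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_or, se

def log_or_to_d(log_or: float) -> float:
    """Approximate Cohen's d from a log odds ratio (logistic-distribution assumption)."""
    return log_or * math.sqrt(3) / math.pi

# Hypothetical report: OR 1.80 (95% CI 1.20 to 2.70)
log_or, se = or_to_log_or_se(1.80, 1.20, 2.70)
print(round(log_or, 3), round(se, 3), round(log_or_to_d(log_or), 3))  # ~0.588 0.207 0.324
```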

Data Extraction Templates

Data extraction is often performed using a single form to extract data from all included (relevant) studies in a uniform manner. Because the data extraction stage is driven by the scope and goals of a systematic review, there is no gold standard or one-size-fits-all approach to developing a data extraction form.

However, there are templates and guidance available to help in the creation of your forms .

Because it is standard to include the data extraction form in the supplemental material of a systematic review and/or meta-analysis, you may also consider consulting the forms developed and/or used in similar, already published or in-progress reviews.

As is the case with the critical appraisal , the type of data you are able to extract will also depend on the study design . Therefore, it is likely that the exact data you extract from each individual article will vary somewhat. 

Data Extraction Form Templates

Cochrane  |  One form for randomized controlled trials (RCTs) only; one form for RCTs and non-RCTs

Joanna Briggs Institute (JBI) |  Several forms located in each relevant chapter:

  • Qualitative data (appendix 2.3)
  • Text and opinion data (appendix 4.3)
  • Prevalence studies (prevalence data; appendix 5.2)
  • Mixed method (convergent integrated approach; appendix 8.1)
  • Diagnostic test accuracy (appendix 9.3)
  • Measurement properties (appendix 12.1) with table of results template (appendix 12.2)

Present Data Extracted

Data extracted from each reference is presented as a summary table or summary of findings table  and described in the narrative .

Summary Tables

A summary table provides readers with a quick-glance summary of the study details that are important to the systematic review and/or meta-analysis. As with the other stages of a review, what you collect and report will depend on the scope of the review and the type of synthesis you plan to conduct.


It may be appropriate to include more than one summary table . For example, one table may present basic information about the study such as author names, year of publication, year(s) the study was conducted, study design, funding agency, etc.; Another table may present details more specific to the qualitative synthesis; A third table may present information specifically relevant to the meta-analysis, with effect sizes, confidence intervals, etc. Additionally, it is best practice to have one summary table  for  each outcome.

Methodological Guidance

  • Health Sciences
  • Animal, Food Sciences
  • Social Sciences
  • Environmental Sciences

Cochrane Handbook  -  Part 2: Core Methods

Chapter 5 : Collecting data

  • 5.2 Sources of data
  • 5.3 What data to collect
  • 5.4 Data collection tools
  • 5.5 Extracting data from reports
  • 5.6 Extracting study results and converting to the desired format
  • 5.7 Managing and sharing data

Chapter 6 : Choosing effect measures and computing estimates of effect

  • 6.1 Types of data and effect measures
  • 6.2 Study designs and identifying the unit of analysis
  • 6.3 Extracting estimates of effect directly
  • 6.4 Dichotomous outcome data
  • 6.5 Continuous outcome data
  • 6.6 Ordinal outcome data and measurement scales
  • 6.7 Count and rate data
  • 6.8 Time-to-event data 
  • 6.9 Conditional outcomes only available for subsets of participants 

SYREAF Protocols 

Step 4: Data extraction.

Conducting systematic reviews of intervention questions II: Relevance screening, data extraction , assessing risk of bias, presenting the results and interpreting the findings.  Sargeant JM, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:39-51. doi: 10.1111/zph.12124. PMID: 24905995

Study designs and systematic reviews of interventions: building evidence across study designs.  Sargeant JM, Kelton DF, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:10-7. doi: 10.1111/zph.12127. PMID: 24905992

Randomized controlled trials and challenge trials: Design and criterion for validity.  Sargeant JM, Kelton DF, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:18-27. PMID: 24905993

Campbell -  MECCIR

C43. Using data collection forms  ( protocol & review / final manuscript )

C44. Describing studies ( review / final manuscript )

C45. Extracting study characteristics and outcome data in duplicate  ( protocol & review / final manuscript )

C46. Making maximal use of data  ( protocol & review / final manuscript )

C47. Examining errata  ( review / final manuscript )

C49. Choosing intervention groups in multi-arm studies  ( protocol & review / final manuscript )

C50. Checking accuracy of numeric data in the review ( review / final manuscript )

CEE  -  Guidelines and Standards for Evidence synthesis in Environmental Management

Section 6: Data coding and data extraction.

CEE Standards for conduct and reporting

6.3   Assessing agreement between data coders/extractors

6.4   Data coding

6.5 Data extraction

Reporting in Protocol and Final Manuscript


In the Protocol |  PRISMA-P

Data collection process (item 11c).

...forms should be developed a priori and included in the published or otherwise available review protocol as an appendix or as online supplementary materials

Include strategies for reducing error:

"...level of reviewer experience has not been shown to affect extraction error rates. As such, additional strategies planned to reduce errors, such as training of reviewers and piloting of extraction forms should be described."

Include how to handle  missing information:

"...in the absence of complete descriptions of treatments, outcomes, effect estimates, or other important information, reviewers may consider asking authors for this information. Whether reviewers plan to contact authors of included studies and how this will be done (such as a maximum of three email attempts) to obtain missing information should be documented in the protocol."

Data Items (Item 12)

List and define all variables for which data will be sought (such as PICO items, funding sources) and any pre-planned data assumptions and simplifications

Include any assumptions by extractors:

"...describe assumptions they intend to make if they encounter missing or unclear information and explain how they plan to deal with such data or lack thereof"

Outcomes and Prioritization (Item 13)

List and define all outcomes for which data will be sought, including prioritisation of main and additional outcomes, with rationale

In the Final Manuscript |  PRISMA

Data collection process (item 9; report in  methods ), essential items.

  • Report how many reviewers collected data from each report, whether multiple reviewers worked independently or not (for example, data collected by one reviewer and checked by another), and any processes used to resolve disagreements between data collectors.
  • Report any processes used to obtain or confirm relevant data from study investigators (such as how they were contacted, what data were sought, and success in obtaining the necessary information).
  • If any automation tools were used to collect data, report how the tool was used (such as machine learning models to extract sentences from articles relevant to the PICO characteristics), how the tool was trained, and what internal or external validation was done to understand the risk of incorrect extractions .
  • If articles required translation into another language to enable data collection, report how these articles were translated (for example, by asking a native speaker or by using software programs).
  • If any software was used to extract data from figures, specify the software used.
  • If any decision rules were used to select data from multiple reports corresponding to a study, and any steps were taken to resolve inconsistencies across reports, report the rules and steps used.

Data Items (Item 10; report in  methods )

  • List and define the outcome domains and time frame of measurement for which data were sought  (Item 10a)
  • Specify whether all results that were compatible with each outcome domain in each study were sought , and, if not, what process was used to select results within eligible domains  (Item 10a)
  • If any changes were made to the inclusion or definition of the outcome domains or to the importance given to them in the review, specify the changes, along with a rationale  (Item 10a)
  • If any changes were made to the processes used to select results within eligible outcome domains, specify the changes, along with a rationale  (Item 10a)
  • List and define all other variables for which data were sought . It may be sufficient to report a brief summary of information collected if the data collection and dictionary forms are made available (for example, as additional files or deposited in a publicly available repository)  (Item 10b)
  • Describe any assumptions made about any missing or unclear information from the studies. For example, in a study that includes “children and adolescents,” for which the investigators did not specify the age range, authors might assume that the oldest participants would be 18 years, based on what was observed in similar studies included in the review, and should report that assumption  (Item 10b)
  • If a tool was used to inform which data items to collect (such as the Tool for Addressing Conflicts of Interest in Trials (TACIT) or a tool for recording intervention details), cite the tool used  (Item 10b)

Additional Items

Consider specifying which outcome domains were considered the most important for interpreting the review’s conclusions (such as “critical” versus “important” outcomes) and provide rationale for the labelling (such as “a recent core outcome set identified the outcomes labelled ‘critical’ as being the most important to patients”)  (Item 10a)

Effect Measures (Item 12; report in  methods )

  • Specify for each outcome or type of outcome (such as binary, continuous) the effect measure(s) (such as risk ratio, mean difference) used in the synthesis or presentation of results.
  • State any thresholds or ranges used to interpret the size of effect (such as minimally important difference; ranges for no/trivial, small, moderate, and large effects) and the rationale for these thresholds.
  • If synthesised results were re-expressed to a different effect measure , report the methods used to re-express results (such as meta-analysing risk ratios and computing an absolute risk reduction based on an assumed comparator risk)

Study Characteristics (Item 17; report in  results )

  • Cite each included study
  • Present the key characteristics of each study in a table or figure (considering a format that will facilitate comparison of characteristics across the studies)

If the review examines the effects of interventions, consider presenting an additional table that summarises the intervention details for each study

Results of Individual Studies (Item 19; report in  results )

  • For all outcomes , irrespective of whether statistical synthesis was undertaken, present for each study summary statistics for each group (where appropriate). For dichotomous outcomes, report the number of participants with and without the events for each group; or the number with the event and the total for each group (such as 12/45). For continuous outcomes, report the mean, standard deviation, and sample size of each group.
  • For all outcomes , irrespective of whether statistical synthesis was undertaken, present for each study an effect estimate and its precision (such as standard error or 95% confidence/credible interval). For example, for time-to-event outcomes, present a hazard ratio and its confidence interval.
  • If study-level data are presented visually or reported in the text (or both), also present a tabular display of the results .
  • If results were obtained from multiple sources (such as journal article, study register entry, clinical study report, correspondence with authors), report the source of the data. This need not be overly burdensome. For example, a statement indicating that, unless otherwise specified, all data came from the primary reference for each included study would suffice. Alternatively, this could be achieved by, for example, presenting the origin of each data point in footnotes, in a column of the data table, or as a hyperlink to relevant text highlighted in reports (such as using the SRDR Data Abstraction Assistant).
  • If applicable, indicate which results were not reported directly and had to be computed or estimated from other information (see item #13b)
  • Last Updated: Jun 13, 2024 12:34 PM
  • URL: https://guides.lib.vt.edu/SRMA


Scoping & Systematic Reviews

  • Step 1: Complete Pre-Review Tasks
  • Step 2: Develop a Protocol
  • Step 3: Conduct a Literature Search
  • Step 4: Manage Citations
  • Step 5: Screen Citations
  • Step 6: Assess Quality of Included Studies (Optional for Scoping Reviews)
  • Step 7: Data Extraction & Charting

About Step 7: Data Extraction & Charting

About data extraction (charting), select a tool, data extraction templates/examples.

  • Step 8: Write the Review
  • Systematic & Scoping Review Service
  • Contact a Librarian in Your Field This link opens in a new window


  • Librarian Role
                                           

In Step 7, you will skim the full text of included articles to collect information about the studies in a table format (extract data), to summarize the studies and make them easier to compare. You will: 

  • Make sure you have collected the full text of any included articles.
  • Choose the pieces of information you want to collect from each study.
  • Choose a method for collecting the data.
  • Create the data extraction table.
  • Test the data collection table (optional). 
  • Collect (extract) the data. 
  • Review the data collected for any errors. 

For accuracy, two or more people should extract data from each study. This process can be done by hand or by using a computer program. 

If you reach the data extraction step and choose to exclude articles for any reason, update the number of included and excluded studies in your PRISMA flow diagram.

A  librarian can  advise you on data extraction and charting for your review, including:

  • What the data extraction stage of the review entails
  • Finding examples in the literature of similar reviews and their completed data tables
  • How to choose what data to extract from your included articles 
  • How to create a randomized sample of citations for a pilot test
  • Export specific data elements from the included studies like title, authors, publication date, citation, & DOI to a Google Sheet for you to use in your data extraction.
  • Best practices for reporting your included studies and their important data in your review

In this step of the systematic or scoping review, you will develop your evidence tables, which give detailed information for each study (perhaps using a PICO or PCC framework as a guide), and summary tables, which give a high-level overview of the findings of your review. You can create evidence and summary tables to describe study characteristics, results, or both. These tables will help you determine which studies, if any, are eligible for quantitative synthesis.

Data extraction (charting) requires a lot of planning.  We will review some of the tools you can use for data extraction (charting), and the types of information you will want to extract

How many people should extract data?

The Cochrane Handbook and other studies strongly suggest at least two reviewers and extractors to reduce the number of errors. The librarian usually does not help with the data extraction itself but may assist in preparing for it, for example by creating spreadsheets.

There are benefits and limitations to each method of data extraction.  You will want to consider:

  • The cost of the software / tool
  • Shareability / versioning
  • Existing versus custom data extraction forms
  • The data entry process
  • Interrater reliability

For example, in Covidence you may spend more time building your data extraction form, but save time later in the extraction process as Covidence can automatically highlight discrepancies for review and resolution between different extractors. Excel may require less time investment to create an extraction form, but it may take longer for you to match and compare data between extractors. More in-depth comparison of the benefits and limitations of each extraction tool can be found in the table below.

  • Review software
  • Spreadsheets (Excel, Google Sheets)
  • Cochrane RevMan
  • Survey or form software (Poll Everywhere, Qualtrics, etc.)
  • Electronic documents (Word, Google Docs)
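
Whichever tool you choose, the two extractors' entries eventually have to be matched and reconciled. As a rough illustration only (the file names and columns are hypothetical, not a prescribed workflow), the following Python/pandas sketch flags disagreements between two extraction spreadsheets, mimicking the discrepancy highlighting that review software does automatically:

import pandas as pd

# Each reviewer saves their completed extraction form as a CSV with a shared
# "study_id" column and identical field names (hypothetical file names).
extractor_a = pd.read_csv("extraction_reviewer_A.csv").set_index("study_id")
extractor_b = pd.read_csv("extraction_reviewer_B.csv").set_index("study_id")

# Align the two sheets on the same studies and fields, then list every
# cell where the extractions disagree so they can be resolved by consensus.
extractor_b = extractor_b.reindex(index=extractor_a.index, columns=extractor_a.columns)
discrepancies = extractor_a.compare(extractor_b)  # requires pandas >= 1.1

print(f"{len(discrepancies)} studies have at least one disagreement to resolve")
discrepancies.to_csv("extraction_discrepancies_to_resolve.csv")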
  • Scoping Reviews
  • Systematic Reviews

A recent article by Pollock et al. (2023) recommends which items should be collected in the data extraction phase. They recommend creating two tables: the first includes basic information about the article (which they call the Guidance sheet), and the second includes more detailed information.

  • Pollock, D., Peters, M. D. J., Khalil, H., McInerney, P., Alexander, L., Tricco, A. C., Evans, C., de Moraes, É. B., Godfrey, C. M., Pieper, D., Saran, A., Stern, C., & Munn, Z. (2023). Recommendations for the extraction, analysis, and presentation of results in scoping reviews .  JBI evidence synthesis ,  21 (3), 520–532. https://doi.org/10.11124/JBIES-22-00123
  • Sample Tables in Google Sheets. Download it to your computer.

Other templates:

JBI's recommended scoping review data extraction instrument for study details, characteristics and results extraction.

An example data extraction table from the PRISMA-ScR.

Your protocol should include a plan for how you will present your results.

Your PCC inclusion criteria will assist you in choosing how the data should be mapped most appropriately, but you can refine this toward the end of the review, when you have a better picture of the sort of data available in your included studies.

The results of a scoping review may be presented in your final paper in a variety of ways, including:

  • tables and charts, featuring distribution of studies by year or period of publication, countries of origin, area of intervention (clinical, policy, educational, etc.) and research methods; and/or
  • in a descriptive format that aligns with the review objective/s and scope.

The latest guidance (Pollock et al. 2023) encourages 'creative approaches...to convey results to the reader in an understandable way' such as word clouds, honeycombs, heat maps, tree graphs, iconography, waffle charts and interactive resources.

Note : If you present your data in a table/chart, also include a narrative summary to explain how the results relate to your review objectives and questions. 

Pollock et al. 2023, JBI Evidence Synthesis, 21(3): 520–532.

JBI advises (11.3.8.1 Search results) that results can be classified under main conceptual categories, such as:

  • intervention type
  • population (and sample size, if it is the case)
  • duration of intervention
  • methodology adopted
  • key findings (evidence established)
  • gaps in the research

'For each category reported, a clear explanation should be provided.'

JBI Manual for Evidence Synthesis, Chapter 11: Scoping Reviews: 11.3.8.1 Search results

JBI Manual for Evidence Synthesis, Chapter 11: Scoping Reviews: 11.3.8.1 Review findings

Joanna Briggs Institute also has a template for data collection and extraction for systematic reviews in section 12.2.9.

  • Aromataris E, Munn Z, (editors) (2020). JBI Manual for Evidence Synthesis . JBI. Available from:  https://synthesismanual.jbi.global .  https://doi.org/10.46658/JBIMES-20-01

The Cochrane Handbook for Systematic Reviews of Interventions includes a chapter on data collection and extraction in section 5.3.

  • Li T, Higgins JPT, Deeks JJ (editors) (2023). Chapter 5: Collecting data. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors).  Cochrane Handbook for Systematic Reviews of Interventions  version 6.4 (updated August 2023). Cochrane. Available from  https://training.cochrane.org/handbook/current/chapter-05#section-5-3

Sample information to include in an extraction table

It may help to consult other similar systematic reviews to identify what data to collect or to think about your question in a  framework such as PICO .

Helpful data for an intervention question may include:

  • Information about the article (author(s), year of publication, title, DOI)
  • Information about the study (study type, participant recruitment / selection / allocation, level of evidence, study quality)
  • Patient demographics (age, sex, ethnicity, diseases / conditions, other characteristics related to the intervention / outcome)
  • Intervention (quantity, dosage, route of administration, format, duration, time frame, setting)
  • Outcomes (quantitative and / or qualitative)

If you plan to synthesize data, you will want to collect additional information such as sample sizes, effect sizes, dependent variables, reliability measures, pre-test data, post-test data, follow-up data, and statistical tests used.
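
As a purely illustrative sketch (the field names and values below are invented, not a standard template), one row of such an extraction table might be recorded like this in Python:

# One hypothetical record from an extraction table for an intervention review;
# all values are made up for illustration.
extraction_record = {
    # Bibliographic details
    "study_id": "Smith2021",
    "authors": "Smith et al.",
    "year": 2021,
    # Study and participant details
    "design": "RCT",
    "country": "UK",
    "n_intervention": 45,
    "n_control": 44,
    # Outcome data needed for quantitative synthesis
    "outcome": "30-day readmission",
    "events_intervention": 12,
    "events_control": 20,
    "effect_measure": "risk ratio",
    "effect_estimate": 0.59,
    "ci_lower": 0.33,
    "ci_upper": 1.05,
    "statistical_test": "chi-squared",
    "follow_up_weeks": 4,
    "notes": "Data taken from Table 2 of the primary report.",
}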

Extraction templates and approaches should be determined by the needs of the specific review.    For example, if you are extracting qualitative data, you will want to extract data such as theoretical framework, data collection method, or role of the researcher and their potential bias.

Supplementary Guidance for Inclusion of Qualitative Research in Cochrane Systematic Reviews of Interventions (Cochrane Collaboration Qualitative Methods Group)

  • Last Updated: May 7, 2024 10:02 AM
  • URL: https://apu.libguides.com/reviews


Systematic Review Toolbox

Data extraction.

  • Guidelines & Rubrics
  • Databases & Indexes
  • Reference Management
  • Quality Assessment
  • Data Analysis
  • Manuscript Development
  • Software Comparison
  • Systematic Searching This link opens in a new window
  • Authorship Determination This link opens in a new window
  • Critical Appraisal Tools This link opens in a new window

Requesting Research Consultation

The Health Sciences Library provides consultation services for University of Hawaiʻi-affiliated students, staff, and faculty. The John A. Burns School of Medicine Health Sciences Library does not have the staffing to conduct reviews for, or assist, researchers unaffiliated with the University of Hawaiʻi. Please use the publicly available guides and support pages that address research databases and tools.

Before Requesting Assistance

Before requesting systematic review assistance from the librarians, please review the relevant guides and the various pages of the Systematic Review Toolbox . Most inquiries received have been answered there previously. Support for research software issues is limited to help with basic installation and setup. Please contact the software developer directly if further assistance is needed.

Data extraction is the process of extracting the relevant pieces of information from the studies you have assessed for eligibility in your review and organizing the information in a way that will help you synthesize the studies and draw conclusions.

Extracting data from reviewed studies should be done in accordance with pre-established guidelines, such as those from PRISMA. From each included study, the following data may need to be extracted, depending on the review's purpose: title, author, year, journal, research question and specific aims, conceptual framework, hypothesis, research methods or study type, and concluding points. Special attention should be paid to the methodology, in order to organize studies by study type category in the review's results section. If a meta-analysis is also being completed, extract raw and refined data from each result in the study.

Established frameworks for extracting data have been created. Common templates are offered by Cochrane  and supplementary resources have been collected by the George Washington University Libraries . Other forms are built into systematic review manuscript development software (e.g., Covidence, RevMan), although many scholars prefer to simply use Excel to collect data.

Covidence

RevMan

JBI SUMARI

Excel for Systematic Reviews

  • Data Collection Form A template developed by the Cochrane Collaboration for data extraction of both RCTs and non-RCTs in a systematic review
  • Data Extraction Template A comprehensive template for systematic reviews developed by the Cochrane Haematological Malignancies Group
  • A Framework for Developing a Coding Scheme for Meta-Analysis
  • Last Updated: Sep 20, 2023 9:14 AM
  • URL: https://hslib.jabsom.hawaii.edu/systematicreview

Health Sciences Library, John A. Burns School of Medicine, University of Hawai‘i at Mānoa, 651 Ilalo Street, MEB 101, Honolulu, HI 96813 - Phone: 808-692-0810, Fax: 808-692-1244




  • Volume 24, Issue 2
  • Five tips for developing useful literature summary tables for writing review articles

  • http://orcid.org/0000-0003-0157-5319 Ahtisham Younas 1 , 2 ,
  • http://orcid.org/0000-0002-7839-8130 Parveen Ali 3 , 4
  • 1 Memorial University of Newfoundland , St John's , Newfoundland , Canada
  • 2 Swat College of Nursing , Pakistan
  • 3 School of Nursing and Midwifery , University of Sheffield , Sheffield , South Yorkshire , UK
  • 4 Sheffield University Interpersonal Violence Research Group , Sheffield University , Sheffield , UK
  • Correspondence to Ahtisham Younas, Memorial University of Newfoundland, St John's, NL A1C 5C4, Canada; ay6133{at}mun.ca

https://doi.org/10.1136/ebnurs-2021-103417


Introduction

Literature reviews offer a critical synthesis of empirical and theoretical literature to assess the strength of evidence, develop guidelines for practice and policymaking, and identify areas for future research. 1 It is often essential and usually the first task in any research endeavour, particularly in masters or doctoral level education. For effective data extraction and rigorous synthesis in reviews, the use of literature summary tables is of utmost importance. A literature summary table provides a synopsis of an included article. It succinctly presents its purpose, methods, findings and other relevant information pertinent to the review. The aim of developing these literature summary tables is to provide the reader with the information at one glance. Since there are multiple types of reviews (eg, systematic, integrative, scoping, critical and mixed methods) with distinct purposes and techniques, 2 there could be various approaches for developing literature summary tables, making it a complex task, especially for novice researchers or reviewers. Here, we offer five tips for authors of the review articles, relevant to all types of reviews, for creating useful and relevant literature summary tables. We also provide examples from our published reviews to illustrate how useful literature summary tables can be developed and what sort of information should be provided.

Tip 1: provide detailed information about frameworks and methods


Figure 1: Tabular literature summaries from a scoping review. Source: Rasheed et al. 3

The provision of information about conceptual and theoretical frameworks and methods is useful for several reasons. First, in quantitative (reviews synthesising the results of quantitative studies) and mixed reviews (reviews synthesising the results of both qualitative and quantitative studies to address a mixed review question), it allows the readers to assess the congruence of the core findings and methods with the adapted framework and tested assumptions. In qualitative reviews (reviews synthesising results of qualitative studies), this information is beneficial for readers to recognise the underlying philosophical and paradigmatic stance of the authors of the included articles. For example, imagine the authors of an article, included in a review, used phenomenological inquiry for their research. In that case, the review authors and the readers of the review need to know what kind of (transcendental or hermeneutic) philosophical stance guided the inquiry. Review authors should, therefore, include the philosophical stance in their literature summary for the particular article. Second, information about frameworks and methods enables review authors and readers to judge the quality of the research, which allows for discerning the strengths and limitations of the article. For example, if authors of an included article intended to develop a new scale and test its psychometric properties. To achieve this aim, they used a convenience sample of 150 participants and performed exploratory (EFA) and confirmatory factor analysis (CFA) on the same sample. Such an approach would indicate a flawed methodology because EFA and CFA should not be conducted on the same sample. The review authors must include this information in their summary table. Omitting this information from a summary could lead to the inclusion of a flawed article in the review, thereby jeopardising the review’s rigour.

Tip 2: include strengths and limitations for each article

Critical appraisal of individual articles included in a review is crucial for increasing the rigour of the review. Despite using various templates for critical appraisal, authors often do not provide detailed information about each reviewed article’s strengths and limitations. Merely noting the quality score based on standardised critical appraisal templates is not adequate because the readers should be able to identify the reasons for assigning a weak or moderate rating. Many recent critical appraisal checklists (eg, Mixed Methods Appraisal Tool) discourage review authors from assigning a quality score and recommend noting the main strengths and limitations of included studies. It is also vital that methodological and conceptual limitations and strengths of the articles included in the review are provided because not all review articles include empirical research papers. Rather some review synthesises the theoretical aspects of articles. Providing information about conceptual limitations is also important for readers to judge the quality of foundations of the research. For example, if you included a mixed-methods study in the review, reporting the methodological and conceptual limitations about ‘integration’ is critical for evaluating the study’s strength. Suppose the authors only collected qualitative and quantitative data and did not state the intent and timing of integration. In that case, the strength of the study is weak. Integration only occurred at the levels of data collection. However, integration may not have occurred at the analysis, interpretation and reporting levels.

Tip 3: write conceptual contribution of each reviewed article

While reading and evaluating review papers, we have observed that many review authors only provide core results of the article included in a review and do not explain the conceptual contribution offered by the included article. We refer to conceptual contribution as a description of how the article’s key results contribute towards the development of potential codes, themes or subthemes, or emerging patterns that are reported as the review findings. For example, the authors of a review article noted that one of the research articles included in their review demonstrated the usefulness of case studies and reflective logs as strategies for fostering compassion in nursing students. The conceptual contribution of this research article could be that experiential learning is one way to teach compassion to nursing students, as supported by case studies and reflective logs. This conceptual contribution of the article should be mentioned in the literature summary table. Delineating each reviewed article’s conceptual contribution is particularly beneficial in qualitative reviews, mixed-methods reviews, and critical reviews that often focus on developing models and describing or explaining various phenomena. Figure 2 offers an example of a literature summary table. 4

Figure 2: Tabular literature summaries from a critical review. Source: Younas and Maddigan. 4

Tip 4: compose potential themes from each article during summary writing

While developing literature summary tables, many authors use themes or subthemes reported in the given articles as the key results of their own review. Such an approach prevents the review authors from understanding the article’s conceptual contribution, developing rigorous synthesis and drawing reasonable interpretations of results from an individual article. Ultimately, it affects the generation of novel review findings. For example, one of the articles about women’s healthcare-seeking behaviours in developing countries reported a theme ‘social-cultural determinants of health as precursors of delays’. Instead of using this theme as one of the review findings, the reviewers should read and interpret beyond the given description in an article, compare and contrast themes, findings from one article with findings and themes from another article to find similarities and differences and to understand and explain bigger picture for their readers. Therefore, while developing literature summary tables, think twice before using the predeveloped themes. Including your themes in the summary tables (see figure 1 ) demonstrates to the readers that a robust method of data extraction and synthesis has been followed.

Tip 5: create your personalised template for literature summaries

Often templates are available for data extraction and development of literature summary tables. The available templates may be in the form of a table, chart or a structured framework that extracts some essential information about every article. The commonly used information may include authors, purpose, methods, key results and quality scores. While extracting all relevant information is important, such templates should be tailored to meet the needs of the individuals’ review. For example, for a review about the effectiveness of healthcare interventions, a literature summary table must include information about the intervention, its type, content timing, duration, setting, effectiveness, negative consequences, and receivers and implementers’ experiences of its usage. Similarly, literature summary tables for articles included in a meta-synthesis must include information about the participants’ characteristics, research context and conceptual contribution of each reviewed article so as to help the reader make an informed decision about the usefulness or lack of usefulness of the individual article in the review and the whole review.
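
For example, a tailored template for a hypothetical intervention review might start from a column list like the sketch below (written in Python purely for convenience; the column names are illustrative, not a prescribed standard):

import csv

# Hypothetical, tailored columns for an intervention-review summary table.
summary_table_columns = [
    "authors_year",
    "purpose",
    "framework",                  # conceptual/theoretical framework (Tip 1)
    "design_and_methods",
    "intervention_type",
    "intervention_content_timing_duration",
    "setting",
    "effectiveness_key_results",
    "negative_consequences",
    "strengths",                  # Tip 2
    "limitations",                # Tip 2
    "conceptual_contribution",    # Tip 3
    "potential_themes",           # Tip 4
]

# Write an empty summary table ready for one row per included article.
with open("literature_summary_table.csv", "w", newline="") as f:
    csv.writer(f).writerow(summary_table_columns)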

In conclusion, narrative or systematic reviews are almost always conducted as a part of any educational project (thesis or dissertation) or academic or clinical research. Literature reviews are the foundation of research on a given topic. Robust and high-quality reviews play an instrumental role in guiding research, practice and policymaking. However, the quality of reviews is also contingent on rigorous data extraction and synthesis, which require developing literature summaries. We have outlined five tips that could enhance the quality of the data extraction and synthesis process by developing useful literature summaries.

  • Aromataris E ,
  • Rasheed SP ,

Twitter @Ahtisham04, @parveenazamali

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Not commissioned; externally peer reviewed.


A Guide to Evidence Synthesis: 10. Data Extraction

  • Meet Our Team
  • Our Published Reviews and Protocols
  • What is Evidence Synthesis?
  • Types of Evidence Synthesis
  • Evidence Synthesis Across Disciplines
  • Finding and Appraising Existing Systematic Reviews
  • 0. Develop a Protocol
  • 1. Draft your Research Question
  • 2. Select Databases
  • 3. Select Grey Literature Sources
  • 4. Write a Search Strategy
  • 5. Register a Protocol
  • 6. Translate Search Strategies
  • 7. Citation Management
  • 8. Article Screening
  • 9. Risk of Bias Assessment
  • 10. Data Extraction
  • 11. Synthesize, Map, or Describe the Results
  • Evidence Synthesis Institute for Librarians
  • Open Access Evidence Synthesis Resources

Data Extraction

Whether you plan to perform a meta-analysis or not, you will need to establish a regimented approach to extracting data. Researchers often use a form or table to capture the data they will then summarize or analyze. The amount and types of data you collect, as well as the number of collaborators who will be extracting it, will dictate which extraction tools are best for your project. Programs like Excel or Google Spreadsheets may be the best option for smaller or more straightforward projects, while systematic review software platforms can provide more robust support for larger or more complicated data.

It is recommended that you pilot your data extraction tool (especially if you will code your data) to determine if fields should be added or clarified, or if the review team needs guidance in collecting and coding data.
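
One way to summarise a pilot test is to check how often the two extractors agreed on each field. Below is a minimal Python sketch (the file and field names are hypothetical) that reports raw agreement and a hand-computed Cohen's kappa for one categorical field:

import pandas as pd

# Pilot extractions from two reviewers, saved as CSVs keyed by study_id
# (hypothetical file names).
pilot_a = pd.read_csv("pilot_extractor_A.csv").set_index("study_id")
pilot_b = pd.read_csv("pilot_extractor_B.csv").set_index("study_id")

field = "study_design"  # any categorical field on the extraction form
paired = pd.concat([pilot_a[field], pilot_b[field]], axis=1, keys=["A", "B"]).dropna()

# Raw (observed) agreement.
agreement = (paired["A"] == paired["B"]).mean()
print(f"Raw agreement on '{field}': {agreement:.0%} across {len(paired)} pilot studies")

# Cohen's kappa: chance-corrected agreement, computed without extra dependencies.
p_o = agreement
p_e = sum(
    (paired["A"].value_counts(normalize=True).get(cat, 0)
     * paired["B"].value_counts(normalize=True).get(cat, 0))
    for cat in set(paired["A"]) | set(paired["B"])
)
kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
print(f"Cohen's kappa: {kappa:.2f}")

Low agreement on a field usually means the field definition needs to be clarified or the reviewers need additional training before full extraction begins.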

Data Extraction Tools

Excel is the most basic tool for the management of the screening and data extraction stages of the systematic review process. Customized workbooks and spreadsheets can be designed for the review process. A more advanced approach to using Excel for this purpose is the PIECES approach, designed by Margaret Foster at Texas A&M. The PIECES workbook is downloadable at this drive link .

Covidence is a software platform for managing independent title/abstract screening, full text screening, data extraction and risk of bias assessment in a systematic review project. Read more about how Covidence can help you customize extraction tables and export your extracted data.  

RevMan  is free software used to manage Cochrane reviews. For an overview on RevMan, including how it may be used to extract and analyze data, watch the RevMan Web Quickstart Guide or check out the RevMan Knowledge Base .

SRDR  (Systematic Review Data Repository) is a Web-based tool for the extraction and management of data for systematic review or meta-analysis. It is also an open and searchable archive of systematic reviews and their data. Access the help page  for more information.

DistillerSR

DistillerSR is a systematic review management software program, similar to Covidence. It guides reviewers in creating project-specific forms, extracting, and analyzing data. 

JBI SUMARI (the Joanna Briggs Institute System for the Unified Management, Assessment and Review of Information) is a systematic review software platform geared toward fields such as health, social sciences, and humanities. Among the other steps of a review project, it facilitates data extraction and data synthesis. View their short introductions to data extraction and analysis for more information.

The Systematic Review Toolbox (under construction)

The SR Toolbox  is a community-driven, searchable, web-based catalogue of tools that support the systematic review process across multiple domains. Use the advanced search option to restrict to tools specific to data extraction. 

Additional Information

These resources offer additional information and examples of data extraction forms:​

  • Brown, S. A., Upchurch, S. L., & Acton, G. J. (2003). A framework for developing a coding scheme for meta-analysis.  Western Journal of Nursing Research ,  25 (2), 205–222. https://doi.org/10.1177/0193945902250038
  • Elamin, M. B., Flynn, D. N., Bassler, D., Briel, M., Alonso-Coello, P., Karanicolas, P. J., … Montori, V. M. (2009). Choice of data extraction tools for systematic reviews depends on resources and review complexity.  Journal of Clinical Epidemiology ,  62 (5), 506–510. https://doi.org/10.1016/j.jclinepi.2008.10.016
  • Li T, Higgins JPT, Deeks JJ (editors). Chapter 5: Collecting data . In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.
  • Research guide from the George Washington University Himmelfarb Health Sciences Library: https://guides.himmelfarb.gwu.edu/c.php?g=27797&p=170447
  • Last Updated: Jun 14, 2024 1:30 PM
  • URL: https://guides.library.cornell.edu/evidence-synthesis

Systematic Reviews: Data Extraction/Coding/Study characteristics/Results

  • Types of literature review, methods, & resources
  • Protocol and registration
  • Search strategy
  • Medical Literature Databases to search
  • Study selection and appraisal
  • Data Extraction/Coding/Study characteristics/Results
  • Reporting the quality/risk of bias
  • Manage citations using RefWorks This link opens in a new window
  • GW Box file storage for PDF's This link opens in a new window

Data Extraction: PRISMA Item 10

The next step is for the researchers to read the full text of each article identified for inclusion in the review and  extract the pertinent data using a standardized data extraction/coding form.  The data extraction form should be as long or as short as necessary and can be coded for computer analysis if desired.

If you are writing a narrative review to summarize information reported in a small number of studies, then you probably don't need to go to the trouble of coding the data variables for computer analysis; instead, summarize the information from the data extraction forms for the included studies.

If you are conducting an analytical review with a meta-analysis to compare data outcomes from several clinical trials, you may wish to computerize the data collection and analysis processes. Reviewers can use fillable forms to collect and code data reported in the studies included in the review; the data can then be uploaded to analytical software such as Excel or SPSS for statistical analysis. GW School of Medicine, School of Public Health, and School of Nursing faculty, staff, and students can use the statistical analysis software in the Himmelfarb Library, and watch online training videos from LinkedIn Learning at the Talent@GW website to learn how to perform statistical analysis with Excel and SPSS.

Software to help you create coded data extraction forms from templates includes Covidence, DistillerSR (needs subscription), EPPI Reviewer (subscription, free trial), and AHRQ's SRDR tool (free), which is web-based and has a training environment, tutorials, and example templates of systematic review data extraction forms. If you prefer to design your own coded data extraction form from scratch, Elamin et al (2009) offer advice on how to decide what electronic tools to use to extract data for analytical reviews. The process of designing a coded data extraction form and codebook is described in Brown, Upchurch & Acton (2003) and Brown et al (2013). You should assign a unique identifying number to each variable field so the fields can be programmed into fillable form fields in whatever software you decide to use for data extraction/collection. You can use AHRQ's Systematic Review Data Repository (SRDR tool), or online survey forms such as Qualtrics, REDCap, or Survey Monkey, or design and create your own coded fillable forms using Adobe Acrobat Pro or Microsoft Access. You might like to include on the data extraction form a field for grading the quality of the study; see the Screening for quality page for examples of some of the quality scales you might choose to apply.
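
As a rough illustration of such a codebook (the variable IDs, labels, and codes below are invented for this sketch, not the scheme published by Brown, Upchurch & Acton), each field receives a unique identifier and, where coded, a value dictionary:

# Hypothetical codebook: each extraction-form field gets a unique variable ID.
codebook = {
    "V01": {"label": "First author, year", "type": "text"},
    "V02": {"label": "Study design", "type": "categorical",
            "codes": {1: "RCT", 2: "cohort", 3: "case-control", 9: "other/unclear"}},
    "V03": {"label": "Sample size (total)", "type": "integer"},
    "V04": {"label": "Intervention duration (weeks)", "type": "integer"},
    "V05": {"label": "Primary outcome reported", "type": "categorical",
            "codes": {1: "yes", 0: "no"}},
    "V06": {"label": "Quality score (0-10)", "type": "integer"},
}

# One completed form becomes one coded record keyed by variable ID,
# ready for import into Excel, SPSS, or a meta-analysis package.
example_record = {"V01": "Smith 2021", "V02": 1, "V03": 89, "V04": 4, "V05": 1, "V06": 7}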

Three examples of a data extraction form are below:  

  • Data Extraction Form Example (suitable for small-scale literature review of a few dozen studies) This example was used to gather data for a poster reporting a literature review of studies of interventions to increase Emergency Department throughput. The poster can be downloaded from http://hsrc.himmelfarb.gwu.edu/libfacpres/62/
  • Data Extraction Form for the Cochrane Review Group (uncoded & used to extract fine-detail/many variables) This is one example of a form, illustrating the thoroughness of the Cochrane research methodology. You could devise a simpler one page data extraction form for a more simple literature review.
  • Coded data extraction form (fillable form fields that can be computerized for data analysis) See Table 1 of Brown, Upchurch & Acton (2013)

Study characteristics: PRISMA Item 18

The data extraction forms can be used to produce a summary table of study characteristics that were considered important for inclusion. 

In the final report in the results section the characteristics of the studies that were included in the review should be reported for PRISMA Item 18 as:

  • Summary PICOS (Patient/Population, Intervention, Comparison if any, Outcomes, Study Design Type) and other pertinent characteristics of the reviewed studies should be reported both in the text of the Results section and in the form of a table. Here is an example of a table that summarizes the characteristics of studies in a review; note that this table could be improved by adding a column for the quality score you assigned to each study, or a column with a value representing the time period in which the study was carried out, if this might be useful for the reader to know. The summary table could either be an appendix or in the text itself if the table is small enough, e.g. similar to Table 1 of Shah et al (2007).

A bibliography of the included studies should always be created, particularly if you are intending to publish your review. Read the advice for authors page on the journal website, or ask the journal editor to advise you on what citation format the journal requires you to use. Himmelfarb Library recommends using  RefWorks  to manage your references.

Results: PRISMA Item 20

In the final report the results from individual studies should be reported for PRISMA Item 20 as follows:

For all outcomes considered (benefits or harms) from each included study write in the results section:

  • (a) simple summary data for each intervention group
  • (b) effect estimates and confidence intervals

In a review where you are reporting a binary outcome e.g. intervention vs placebo or control, and you are able to combine/pool results from several experimental studies done using the same methods on like populations in like settings, then in the results section you should report the relative strength of treatment effects from each study in your review and the combined effect outcome from your meta-analysis.  For a meta-analysis of Randomized trials you should represent the meta-analysis visually on a “forest plot”  (see fig. 2).  Here is another example of a meta-analysis  forest plot , and on page 2 a description of how to interpret it.  
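
For readers unfamiliar with how the pooled estimate on a forest plot is produced, here is a minimal Python sketch of fixed-effect inverse-variance pooling of risk ratios (the trial counts are invented for illustration; in practice you would normally use RevMan or a statistics package rather than hand-rolled code):

import math

# (events_treatment, n_treatment, events_control, n_control) per trial; invented counts.
trials = [(12, 45, 20, 44), (8, 60, 15, 58), (22, 110, 30, 105)]

weights_sum = 0.0
weighted_log_rr_sum = 0.0
for a, n1, c, n2 in trials:
    log_rr = math.log((a / n1) / (c / n2))
    var = 1/a - 1/n1 + 1/c - 1/n2        # variance of the log risk ratio
    w = 1 / var                          # inverse-variance weight
    weights_sum += w
    weighted_log_rr_sum += w * log_rr

pooled_log_rr = weighted_log_rr_sum / weights_sum
pooled_se = math.sqrt(1 / weights_sum)
low = math.exp(pooled_log_rr - 1.96 * pooled_se)
high = math.exp(pooled_log_rr + 1.96 * pooled_se)
print(f"Pooled RR = {math.exp(pooled_log_rr):.2f} (95% CI {low:.2f} to {high:.2f})")

The forest plot simply displays each trial's estimate and confidence interval alongside this pooled result, with the plotted square sizes proportional to the weights.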

If your review included heterogeneous study types (i.e. some combination of experimental trials and observational studies), you won't be able to do a meta-analysis; instead, your analysis could follow the Synthesis Without Meta-analysis (SWiM) guideline, and you could consider presenting your results in an alternative, visually arresting graphic using a template in Excel or SPSS or from a web-based application for infographics. GW faculty, staff, and students may watch online training videos from LinkedIn Learning at the Talent@GW website to learn how to work with charts and graphs and design infographics.



  • Last Updated: Jun 10, 2024 2:14 PM
  • URL: https://guides.himmelfarb.gwu.edu/systematic_review

  • Himmelfarb Health Sciences Library
  • 2300 Eye St., NW, Washington, DC 20037
  • Phone: (202) 994-2850
  • [email protected]
  • https://himmelfarb.gwu.edu

A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL)

  • Conference paper
  • First Online: 07 July 2021
  • Cite this conference paper


  • Joshua C. Nwokeji 10 &
  • Richard Matovu 10  

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 284))


Data analytics plays a vital role in contemporary organizations; through analytics, organizations are able to derive knowledge and intelligence from data to support strategic decisions. An important step in data analytics is data integration, during which historic data is gathered from various sources and integrated into a centralized repository called a data warehouse. Although there are various approaches for data integration, Extract, Transform and Load (ETL) has become one of the most efficient and popular approaches. Over the decades, ETL has been applied to a wide range of domains such as finance, health and telecom, to mention but a few. As the popularity and use of ETL grow, it becomes important to analyze and identify the trends in the research and practice of ETL. In this paper, we perform a systematic literature review to identify and analyze: (1) approaches used to implement existing ETL solutions; (2) quality attributes to be considered while adopting any ETL approach; (3) the depth of coverage in ETL research and practice with regards to application domains, frequency of publications and geographical locations of papers; (4) the prevailing challenges in developing ETL solutions. Furthermore, we discuss the implications of our findings for ETL researchers and practitioners.




Author information

Authors and affiliations.

Gannon University, Erie, PA, 16541, USA

Joshua C. Nwokeji & Richard Matovu


Corresponding author

Correspondence to Joshua C. Nwokeji .

Editor information

Editors and affiliations.

Faculty of Science and Engineering, Saga University, Saga, Japan


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Nwokeji, J.C., Matovu, R. (2021). A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL). In: Arai, K. (eds) Intelligent Computing. Lecture Notes in Networks and Systems, vol 284. Springer, Cham. https://doi.org/10.1007/978-3-030-80126-7_24

Download citation

DOI : https://doi.org/10.1007/978-3-030-80126-7_24

Published : 07 July 2021

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-80125-0

Online ISBN : 978-3-030-80126-7



Validity of data extraction in evidence synthesis practice of adverse events: reproducibility study

  • Chang Xu , professor 1 2 3 ,
  • Tianqi Yu , masters candidate 4 ,
  • Luis Furuya-Kanamori , senior research fellow 5 ,
  • Lifeng Lin , assistant professor 6 ,
  • Liliane Zorzela , clinical assistant professor 7 ,
  • Xiaoqin Zhou , methodologist 9 ,
  • Hanming Dai , doctoral candidate 9 ,
  • Yoon Loke , professor 10 ,
  • 1 Key Laboratory of Population Health Across-life Cycle, Ministry of Education of the People’s Republic of China, Anhui Medical University, Anhui, China
  • 2 Anhui Provincial Key Laboratory of Population Health and Aristogenics, Anhui Medical University, Anhui, China
  • 3 School of Public Health, Anhui Medical University, Anhui, China
  • 4 Chinese Evidence-based Medicine Centre, West China Hospital, Sichuan University, Chengdu, China
  • 5 UQ Centre for Clinical Research, Faculty of Medicine, University of Queensland, Brisbane, QLD, Australia
  • 6 Department of Statistics, Florida State University, Tallahassee, FL, USA
  • 7 Department of Pediatrics, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, AB, Canada
  • 8 Department of Clinical Research Management, West China Hospital, Sichuan University, Chengdu, China
  • 9 Mental Health Centre, West China Hospital of Sichuan University, Chengdu, China
  • 10 Norwich Medical School, University of East Anglia, Norwich, UK
  • 11 Departments of Psychiatry, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, AB, Canada
  • Correspondence to: S Vohra svohra@ualberta.ca
  • Accepted 10 April 2022

Objectives To investigate the validity of data extraction in systematic reviews of adverse events, the effect of data extraction errors on the results, and to develop a classification framework for data extraction errors to support further methodological research.

Design Reproducibility study.

Data sources PubMed was searched for eligible systematic reviews published between 1 January 2015 and 1 January 2020. Metadata from the randomised controlled trials were extracted from the systematic reviews by four authors. The original data sources (eg, full text and ClinicalTrials.gov) were then referred to by the same authors to reproduce the data used in these meta-analyses.

Eligibility criteria for selecting studies Systematic reviews were included when based on randomised controlled trials for healthcare interventions that reported safety as the exclusive outcome, with at least one pairwise meta-analysis that included five or more randomised controlled trials and with a 2×2 table of data for event counts and sample sizes in intervention and control arms available for each trial in the meta-analysis.

Main outcome measures The primary outcome was data extraction errors summarised at three levels: study level, meta-analysis level, and systematic review level. The potential effect of such errors on the results was further investigated.

Results 201 systematic reviews and 829 pairwise meta-analyses involving 10 386 randomised controlled trials were included. Data extraction could not be reproduced in 1762 (17.0%) of 10 386 trials. In 554 (66.8%) of 829 meta-analyses, at least one randomised controlled trial had data extraction errors; 171 (85.1%) of 201 systematic reviews had at least one meta-analysis with data extraction errors. The most common types of data extraction errors were numerical errors (49.2%, 867/1762) and ambiguous errors (29.9%, 526/1762), mainly caused by ambiguous definitions of the outcomes. These categories were followed by three others: zero assumption errors, misidentification, and mismatching errors. The impact of these errors was analysed for 288 meta-analyses. Data extraction errors led to 10 (3.5%) of 288 meta-analyses changing the direction of the effect and 19 (6.6%) of 288 meta-analyses changing the significance of the P value. Meta-analyses that had two or more different types of errors were more susceptible to these changes than those with only one type of error (for moderate changes, 11 (28.2%) of 39 v 26 (10.4%) of 249, P=0.002; for large changes, 5 (12.8%) of 39 v 8 (3.2%) of 249, P=0.01).

Conclusion Systematic reviews of adverse events potentially have serious issues in terms of the reproducibility of the data extraction, and these errors can mislead the conclusions. Implementation guidelines are urgently required to help authors of future systematic reviews improve the validity of data extraction.

Introduction

In an online survey of 1576 researchers by Nature , the collected opinions emphasised the need for better reproducibility in research: “More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments.” 1

Systematic reviews and meta-analyses have become the most important tools for assessing healthcare interventions. This research involves explicit and standardised procedures to identify, appraise, and synthesise all available evidence within a specific topic. 2 During the process of systematic reviews, each step matters, and any errors could affect the reliability of the final results. Among these steps, data extraction is arguably one of the most important and is prone to errors because raw data are transferred from original studies into the systematic review that serves as the basis for evidence synthesis.

To ensure the quality of data extraction, authoritative guidelines, such as the Cochrane Handbook, highlight the importance of independent extraction by two review authors. 2 Despite this quality assurance mechanism, data extraction error in systematic reviews occurs frequently in the literature. 3 Jones et al 4 reproduced 34 Cochrane reviews published in 2003 (issue 4) and found that 20 (59%) had data extraction errors. Gøtzsche et al 5 examined 27 meta-analyses of continuous outcomes and reported that 17 (63%) of these meta-analyses had an error for at least one of the two randomly selected trials. In their subsequent study, based on 10 systematic reviews of continuous outcomes, seven (70%) were found to contain erroneous data. 6

Empirical evidence suggests that the effect of data extraction error seems to be minor. 3 5 However, this conclusion is based on systematic reviews of continuous outcomes, which do not apply to binary outcomes of adverse events. Harms, especially serious harms, tend to be rare, and such data are by nature more susceptible to random or systematic errors than are common outcomes. 7 8 For example, consider a 1:1 designed trial with a sample size of 100, in which the event counts of death are two in the intervention group and one in the control group. If the review authors incorrectly extracted the number of events in the intervention group as one, the relative risk would drop from two to one, leading to a completely different conclusion. Owing to this feature, in systematic reviews of adverse events, the validity of data extraction can considerably affect the results and even drive the final conclusion. The erroneous conclusion would further influence clinical practice guidelines and mislead healthcare practice.
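To make the arithmetic of this example concrete, here is a minimal sketch in Python; the counts are the illustrative numbers from the example above, not data from any included trial:

```python
# Minimal sketch of the worked example above: a 1:1 trial with 50 patients per
# arm, where a data extraction error changes the event count in the
# intervention arm from 2 to 1. Numbers are illustrative only.

def relative_risk(events_intervention, n_intervention, events_control, n_control):
    """Relative risk = risk in intervention arm / risk in control arm."""
    risk_intervention = events_intervention / n_intervention
    risk_control = events_control / n_control
    return risk_intervention / risk_control

# Correct extraction: 2 deaths v 1 death in arms of 50 patients each
print(relative_risk(2, 50, 1, 50))  # 2.0

# Erroneous extraction: the intervention count recorded as 1 instead of 2
print(relative_risk(1, 50, 1, 50))  # 1.0 -- the apparent harm disappears
```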

We conducted a large-scale investigation of the reproducibility of data extraction in systematic reviews of adverse events. We propose an empirical classification of data extraction errors to help methodologists and systematic review authors better understand their sources. The impact of such errors on the results is also examined based on the reproducibility dataset.

Protocol and data source

This article is an extension of our previous work describing methods to deal with double-zero-event studies. 9 A protocol was drafted on 11 April 2021 by a group of core authors (CX, TY, LL, LFK), which was then revised after expert feedback (SV, LZ, RQ, and JZ; see supplementary file). We also recorded the detailed implementation of this study (supplementary table 1).

A subset of the data from the previous study was used in this study. Briefly, we searched PubMed for systematic reviews of adverse events indexed from 1 January 2015 to 1 January 2020. The limit on the search date was arbitrary but allowed us to capture the practice of the most recent systematic reviews. We did not search in other databases because we did not aim to include all systematic reviews; instead, a representative sample was sufficient for the aim of the current study. The search strategy was developed by an information specialist (supplementary box 1), and the literature search was conducted on 28 July 2020, and has been recorded elsewhere. 9

Inclusion criteria and screening

We included systematic reviews of randomised controlled trials for healthcare interventions, with adverse events as the exclusive outcome. The term adverse event was defined as “any untoward medical occurrence in a patient or subject in clinical practice,” 10 which could be a side effect, adverse effect, adverse reaction, harm, or complication associated with any healthcare intervention. 11 We did not consider systematic reviews based on other types of studies because randomised controlled trials are more likely to be registered, with related summarised data for safety outcomes available; registration provided another valid way to assess the reproducibility of data extraction. Additionally, we limited systematic reviews to those with at least one pairwise meta-analysis of five or more studies; the requirement on the number of studies was set for an ongoing series of studies on synthesis methods to ensure sufficient statistical power. 12 To facilitate reproduction of the data used in meta-analyses, we considered only systematic reviews that provided, in forest plots or tables, a 2×2 table of event counts and sample sizes in the intervention and control arms of each included study. Meta-analyses of proportions and network meta-analyses were not considered. Continuous safety outcomes were also not considered because continuous outcomes have been investigated by others. 4 5 6 Systematic reviews in languages other than English and Chinese were excluded.

Two review authors screened the literature independently (XQ and CX). Titles and abstracts were screened first, and then the full texts of the relevant publications were read. During title and abstract screening, only records excluded by both review authors were excluded. Any disagreements were resolved by discussion between the two authors.

Data collection

Metadata from the randomised controlled trials were collected from eligible systematic reviews. The following items were extracted: name of the first author, outcome of interest, number of participants and number of events in each group, and detailed information on the intervention (eg, type of intervention, dosage, and duration) and control groups. Four experienced authors (CX, TQ, XQ, and HM) extracted the data by dividing the eligible systematic reviews into four equal portions by the initial of the first author, with each extractor leading one portion. Before the formal data extraction, we piloted the extraction of the above items on the first systematic review. Finally, data were initially checked by the same extractors for their own portion and then double checked separately by two authors (CX and TQ) to confirm that no errors were present from the data extraction (supplementary table 1).

Additionally, based on the reporting of each systematic review, we collected the following information according to the good practice guideline for data extraction 13 : how the data were extracted (eg, two extractors independently), whether a protocol was available, whether a clear data extraction plan was made in the protocol, whether any solution for anticipated problems in data extraction was outlined in the protocol, whether a standard data extraction form was used, whether the data extraction form was piloted, whether the data extractors were trained, the expertise of the data extractors, and whether any details of the data extraction were documented. CX also collected the methods (eg, inverse variance, fixed effect model) and effect estimators used for meta-analysis, and the effect estimate with its confidence interval for each meta-analysis. TQ checked the extraction, and any disagreements were resolved by discussion between these two authors (with detailed records).

Reproducibility

After we extracted the data from the included systematic reviews, the four authors who extracted the data were required to reproduce the data used in meta-analyses from the original sources, which included the original publications of the randomised controlled trials and their supplementary files, ClinicalTrials.gov, and websites of the pharmaceutical companies. When the trial data used in a meta-analysis could not be matched to any of its original sources, we classified this as a “data extraction error.” If the authors of the systematic review reported that they had contacted the authors of the original paper and successfully obtained related data, we did not consider the discrepancy a data extraction error, even if the data were not the same as any of the original sources. 14 We recorded the details of the location (that is, event count (r) or total sample size (n), intervention (1) or control group (2), marked as r1/n1/r2/n2) and the reasons why the data could not be reproduced. Any enquiries or issues that could affect our assessment were resolved by group discussion among the four extractors. Again, reproducibility was initially checked by the data extractors for their own portions of the workload. After data extraction and reproduction, the lead author (CX) and TQ separately conducted two further rounds of double checking (supplementary table 1). 15

The primary outcome of this study was the proportion of data extraction errors at the study level, the meta-analysis level, and the systematic review level. The secondary outcomes were the proportion of studies with data extraction errors within each meta-analysis and the proportion of meta-analyses with data extraction errors within each systematic review.

Statistical analysis

We summarised the frequency of data extraction errors at the study level, the meta-analysis level, and the systematic review level to estimate the aforementioned proportions. For the study level, the frequency was the total number of randomised controlled trials with data extraction errors. For the meta-analysis level, the frequency was the number of meta-analyses with at least one study with data extraction errors. For the systematic review level, the frequency was the number of systematic reviews with at least one meta-analysis with data extraction errors.
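As a rough illustration of how these three proportions relate to one another, the following minimal sketch computes them from a toy long-format dataset; the column names and values are illustrative assumptions, not the study's actual variables:

```python
import pandas as pd

# Hypothetical long-format dataset: one row per trial within a meta-analysis
# within a systematic review; `error` flags a non-reproducible extraction.
df = pd.DataFrame({
    "review_id": [1, 1, 1, 2, 2, 2],
    "ma_id":     [1, 1, 2, 3, 3, 4],
    "trial_id":  [1, 2, 3, 4, 5, 6],
    "error":     [0, 1, 0, 0, 0, 1],
})

# Study level: proportion of trials whose extraction could not be reproduced
study_level = df["error"].mean()

# Meta-analysis level: proportion of meta-analyses with >=1 erroneous trial
ma_level = df.groupby(["review_id", "ma_id"])["error"].max().mean()

# Systematic review level: proportion of reviews with >=1 affected meta-analysis
review_level = df.groupby("review_id")["error"].max().mean()

print(study_level, ma_level, review_level)
```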

Considering that clustering effects might be present (owing to the diverse expertise and experience of the four people who extracted data), a generalised linear mixed model was further used to estimate the extractor adjusted proportion. 16 The potential associations among duplicated data extraction, development of a protocol in advance, and data extraction errors based on systematic review level were examined using multivariable logistic regression. Other recommendations listed in good practice guidelines were not examined because most systematic reviews did not report the information.

Because data extraction errors could have different mechanisms (eg, calculation errors and unclear definition of the outcome), we empirically classified these errors into different types on the basis of consensus after summarising the error information (supplementary fig 1). Then, the percentages of the different types of errors among the total number of errors were summarised based on the study level. We conducted a post-hoc comparison of the difference of the proportions of the total and the subtype errors by two types of interventions: drug interventions; and non-drug interventions (eg, surgery and device). We did this because the safety outcomes are greatly different for these two types of interventions based on our word cloud analysis (supplementary fig 2).

To investigate the potential effect of data extraction errors on the results, we applied the same methods and effect estimators that the authors reported to the corrected dataset. We repeated these meta-analyses and compared the new results with the original results. Some meta-analyses contained errors related to unclear definitions of the outcomes (that is, the ambiguous error defined in table 1 ); for these, readers cannot determine the true number of events, so the effect on the results cannot be investigated with the full empirical dataset. Therefore, we used the subset of meta-analyses free of this type of ambiguous error. We prespecified a change in the magnitude of the effect of 20% or more as a moderate impact and of 50% or more as a large impact. We also summarised the proportion of changes in the direction of the effects and in the significance of the P value.

Table 1. Descriptions of the different types of errors during data extraction

Missing data occurred when the original data sources were not available for a few randomised controlled trials, in which case we were unable to verify data accuracy. For our sensitivity analysis, which investigated the robustness of the results, we removed these studies. We used Stata 15/SE for the data analysis. The estimation of the proportions was based on the meglm command under the Poisson family with the log link 28 ; we set α=0.05 as the significance level. We re-evaluated the meta-analyses with the admetan command in Stata and verified the results with the metafor package in R 3.5.1; Excel 2013 was used for visualisation.
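For readers without access to Stata, the following is a minimal Python sketch of the same general idea: pooling 2×2 tables with a fixed-effect inverse-variance meta-analysis of log relative risks, then classifying the change in the pooled effect against the prespecified 20% and 50% thresholds. The data, the choice of relative risk, and the way the percentage change is measured are illustrative assumptions, not the authors' exact procedure.

```python
import math

def log_rr_and_se(r1, n1, r2, n2):
    """Log relative risk and its standard error for one 2x2 table
    (simple large-sample formula; no continuity correction)."""
    log_rr = math.log((r1 / n1) / (r2 / n2))
    se = math.sqrt(1 / r1 - 1 / n1 + 1 / r2 - 1 / n2)
    return log_rr, se

def pooled_rr(tables):
    """Fixed-effect inverse-variance pooled relative risk."""
    num, den = 0.0, 0.0
    for r1, n1, r2, n2 in tables:
        log_rr, se = log_rr_and_se(r1, n1, r2, n2)
        w = 1 / se ** 2
        num += w * log_rr
        den += w
    return math.exp(num / den)

# Hypothetical meta-analysis: original (erroneous) v corrected extraction
original  = [(2, 50, 1, 50), (5, 100, 4, 100), (3, 80, 3, 80)]
corrected = [(3, 50, 1, 50), (5, 100, 4, 100), (3, 80, 3, 80)]

rr_orig, rr_corr = pooled_rr(original), pooled_rr(corrected)
change = abs(rr_corr - rr_orig) / rr_orig  # relative change in magnitude
impact = "large" if change >= 0.5 else "moderate" if change >= 0.2 else "small"
print(rr_orig, rr_corr, impact)
```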

Patient and public involvement

As this was a technical paper assessing the methodology of data extraction errors in evidence synthesis practice and the impact of these errors on the analysis, no patients or members of the public were involved, and no funding was available for the same reason.

Overall, we screened 18 636 records and initially identified 456 systematic reviews of adverse events. 9 After further screening of the full texts, 102 were excluded because they were based on non-randomised studies of interventions and 153 were excluded for not having a pairwise meta-analysis, having fewer than five studies in all meta-analyses, or not reporting the 2×2 table data used in meta-analyses (supplementary table 3). As such, 201 systematic reviews were included in the current study ( fig 1 ).

Fig 1. Flowchart for selection of articles. RCT=randomised controlled trial

Among the 201 systematic reviews, 156 referred to drug interventions and the other 45 to non-drug interventions (60% of which were surgical or device interventions). From the 201 systematic reviews, we identified 829 pairwise meta-analyses with at least five studies, involving 10 386 randomised controlled trials. Based on the double checking process, the data extraction error rate of the four data extractors ranged from 0.5% to 5.4%, which suggests that this study had high quality data extraction (supplementary table 1).

Among the 201 systematic reviews, based on the reporting information, 167 (83.1%) stated that they had two data extractors, 31 (15.4%) did not report such information, two (1%) could not be judged owing to insufficient information, and only one (0.5%) reported that the data were extracted by one person. Fifty-four (26.9%) systematic reviews reported a protocol that was developed in advance, whereas most (147, 73.1%) did not report whether they had a protocol. For those with protocols, 32 (59.3%) of 54 had a clear plan for data extraction and 22 (40.7%) outlined a potential solution for anticipated problems in data extraction. Sixty-six (32.8%) systematic reviews used a standard data extraction form, while most (135, 67.2%) did not report this information. Of the systematic reviews that used a standard extraction form, six (8.8%) piloted this process. No systematic review reported whether the data extractors were trained or what their expertise was. Only seven (3.5%) of 201 systematic reviews documented the details of the data extraction process.

Reproducibility of the data extraction

For the reproducibility of the data used in these meta-analyses, at the study level, we could not reproduce 1762 (17.0%) of 10 386 studies, with an extractor adjusted proportion of 15.8%. At the meta-analysis level, 554 (66.8%) of 829 meta-analyses had at least one randomised controlled trial with data extraction errors, with an extractor adjusted proportion of 65.5% ( fig 2 ). For meta-analyses with data extraction errors in at least one study, the proportion of studies with data extraction errors within a meta-analysis ranged from 1.9% to 100%, with a median value of 20.6% (interquartile range 12.5-40.0; fig 2 ).

Fig 2. Data extraction errors at the meta-analysis level. Bar plot is based on meta-analyses with data extraction errors (n=554). Error rate within a meta-analysis is calculated as the number of studies with data extraction errors against the total number of studies within a meta-analysis

At the systematic review level, 171 (85.1%) of 201 systematic reviews had at least one meta-analysis with data extraction errors, with an extractor adjusted proportion of 85.1% ( fig 3 ). For systematic reviews with data extraction errors in at least one meta-analysis, the proportion of meta-analyses with data extraction errors within a systematic review ranged from 16.7% to 100.0%, with a median value of 100.0% (interquartile range 66.7-100; fig 3 ).

Fig 3. Data extraction errors at the systematic review level. Bar plot is based on systematic reviews with data extraction errors (n=171). Error rate within a systematic review is calculated as the number of meta-analyses with data extraction errors against the total number of meta-analyses within a systematic review

Based on the multivariable logistic regression, systematic reviews that reported duplicated data extraction or checking by another author (odds ratio 0.9, 95% confidence interval 0.3 to 2.5, P=0.83) and those that developed a protocol in advance (0.7, 0.3 to 1.6, P=0.38) did not show significantly lower odds of errors, although a weak association cannot be ruled out.

Empirical classification of errors

Based on the mechanism of the data extraction errors, we empirically classified these errors into five types: numerical error, ambiguous error, zero assumption error, mismatching error, and misidentification error. Table 1 provides the definitions of these five types of data extraction errors, with detailed examples. 17 18 19 20 21 22 23 24 25 26 27

Numerical error was the most prevalent data extraction error, accounting for 867 (49.2%) of the 1762 errors recorded in the studies ( fig 4 ). The second most prevalent was the ambiguous error, accounting for 526 (29.9%) errors. Notably, zero assumption errors accounted for as many as 221 (12.5%) errors. Misidentification accounted for 115 (6.5%) errors and mismatching errors for 33 (1.9%) errors.

Fig 4. Proportion of 1762 studies classified by five types of data extraction error

Subgroup analysis by intervention type suggested that meta-analyses of drug interventions were more likely to have data extraction errors than those of non-drug interventions: total error (19.9% v 8.9%; P<0.001), ambiguous error (6.1% v 2.4%; P<0.001), numerical error (9.4% v 5.4%; P<0.001), zero assumption error (2.6% v 0.9%; P<0.001), and misidentification error (1.5% v 0.1%; P<0.001; supplementary fig 3). Although mismatching errors showed the same pattern, the difference was not statistically significant (0.4% v 0.2%; P=0.09).

Impact of data extraction errors on the results

After removing meta-analyses with ambiguous errors and without errors, 288 meta-analyses could be used to investigate the impact of data extraction errors on the results (supplementary table 4). Among them, 39 had two or more types of errors (mixed), and 249 had only one type of error (single). For the 249 meta-analyses, 200 had numerical errors, 25 had zero assumption errors, 16 had misidentification errors, and eight had mismatching errors. Because of the limited sample size of each subtype, we only summarised the total impact and the effect grouped by the number of types (that is, single type of error or mixed type of errors).

In total, in terms of the magnitude of the effect, when corrected data were used for the 288 meta-analyses, 151 (52.4%) had decreased effects, whereas 137 (47.6%) had increased effects; 37 (12.8%) meta-analyses had moderate changes (≥20% change) and 13 (4.5%) had large changes (≥50% change) in the effect estimates ( fig 5 ). Of the 37 meta-analyses with moderate changes, the effects in 26 (70.2%) increased, whereas those in 11 (29.7%) decreased when corrected data were used. Of the 13 meta-analyses with large changes, nine (69.2%) showed increased effects, whereas four (30.8%) showed decreased effects. Ten (3.5%) of the 288 meta-analyses had changes in the direction of the effect, and 19 (6.6%) changed the significance of the P value. Of those that changed direction, two (20.0%) of 10 changed from beneficial to harmful effects, and eight (80.0%) of 10 changed from harmful to beneficial effects. Of those that changed in significance, 10 (52.6%) of 19 changed from non-significance to significance, and nine (47.4%) of 19 changed from significance to non-significance. Some examples are presented in table 2 . Meta-analyses with two or more types of errors had higher proportions of moderate (28.2% v 10.4%, P=0.002) and large changes (12.8% v 3.2%, P=0.01; fig 5 ) than did those with only a single type of error.

Fig 5. Impact of data extraction errors on results

Table 2. Examples of changes in the effects and significance when using corrected data

Sensitivity analysis

For 318 (3.1%) of 10 386 studies in the total dataset, we could not obtain full texts or had no access to the original data source to verify data accuracy. After treating these as missing values and removing them from the analyses, the proportions of data extraction errors did not change appreciably: 16.2% at the study level, 65.7% at the meta-analysis level, and 85.1% at the systematic review level (adjusted for extractor clustering effects).

Principal findings

We investigated the reproducibility of the data extraction of 829 pairwise meta-analyses within 201 systematic reviews of safety outcomes by repeating the data extraction from all the included studies. Our results suggest that as many as 85% of the systematic reviews had data extraction errors in at least one meta-analysis. At the meta-analysis level, as many as 67% of the meta-analyses had at least one study with a data extraction error. Our findings echo the seriousness of the findings from the Nature survey on the reproducibility of basic science research (70%). 1 At the systematic review level, the problem is even more serious.

Our subgroup analysis showed that data for the safety outcomes of drug interventions had a higher proportion of extraction errors (19.9%) than did data for non-drug interventions (8.9%). One important reason could be that safety outcomes vary considerably across types of interventions (supplementary fig 2). For non-drug interventions, most interventions were surgical or a device, where safety outcomes might be easier to define. For example, a common safety outcome of surgical interventions is bleeding during surgery, whereas a common outcome of drug interventions is liver toxicity, which might be more complex to define and measure. Additionally, the reporting of adverse events in surgical interventions relies heavily on the surgical staff, whereas for adverse events of a drug, patients might also participate in the reporting process. Selective reporting could exist for adverse events of surgical interventions without patients’ participation, 29 and mild but complex adverse events (eg, muscular pain) might be neglected, making the reported adverse events appear more straightforward than they are.

We classified data extraction errors into five types based on their mechanism. Based on this classification, we further found that numerical errors, ambiguous errors, and zero assumption errors accounted for 91% of the total errors. The classification and related findings are important because they provide a theoretical basis for researchers to develop implementation guidelines and help systematic review authors reduce the frequency of errors during the data extraction process. Another important reason for data extraction errors might be the poor reporting of adverse events in randomised controlled trials, with varying terminology, poorly defined categories, and diverse data sources. 30 31 32 If trials do not clearly define an adverse outcome and report it transparently, systematic review authors will face difficulties during data extraction and the process will be prone to errors, especially of the ambiguous type. We believe that with proper implementation guidance and more explicit trial reporting guidelines for adverse events, these errors can be substantially reduced.

The classification also provides a theoretical basis for methodologists to investigate the potential impact of different types of data extraction errors on the results. The impact of different types of errors could vary. For example, the zero assumption error is expected to push the final effect towards the null when the related studies have balanced sample sizes in the two arms. 33 The mismatching error also has a somewhat predictable influence, because it pushes the effect towards the opposite direction. By contrast, the direction of the effect is difficult to predict for the other three types of errors. In our empirical data, because of the small number of meta-analyses in each category, we were unable to investigate the impact of each single type of error on the results; one of the most important reasons is that many meta-analyses had ambiguous errors. Nevertheless, we were able to compare the effect of multiple error types against a single error type across meta-analyses. Our results suggested that meta-analyses with multiple types of data extraction errors were more prone to be affected. Because different synthesis methods handle zero events differently (eg, two-stage methods with odds ratios assume that double-zero studies are non-informative 9 ), the use of different synthesis methods and effect estimates might lead to different impacts. 34 35 The impact of data extraction errors on the results should be investigated thoroughly in future simulation research.

Strengths and limitations

This large empirical study investigates the reproducibility of the data extraction of systematic reviews of adverse events and its impact on the results of related meta-analyses. The findings of our study are a serious warning to the community that much progress is needed to achieve high quality, evidence-based practice. We are confident that our results are reliable because the data went through five rounds of cross-checking within our tightly structured collaborative team. Additionally, this is the first study in which data extraction errors have been classified by their mechanism, which we think will benefit future methodological research in this area.

However, some limitations remain. Firstly, owing to the large amount of work, data collection was divided into four portions, and each portion was conducted by a separate author. Although all authors undertook pilot training in advance, their judgments might still differ. Nevertheless, our analysis used a generalised linear mixed model, which accounted for the potential clustering effect of different extractors, and the findings suggested no obvious impact on the results. Secondly, our study covered only systematic reviews published in the five year period from 2015 to 2020; therefore, the validity of the data extraction in earlier studies is unclear, and whether this issue has deteriorated or improved over time could not be assessed. Thirdly, a small proportion of studies could not be checked for reproducibility; these studies were treated as if no data extraction errors existed, which could lead to a slight underestimation of the overall data extraction error. 36

Furthermore, we focused only on systematic reviews of randomised controlled trials and did not consider observational studies. Because the sample sizes of randomised controlled trials tend to be small, the impact of extraction errors might be exacerbated. Finally, poor reporting has been commonly reported in the literature 37 38 ; owing to the limited information on the data extraction process reported by review authors, we could not fully investigate the association between good practice recommendations and the likelihood of data extraction errors. For the same reason, the associations among duplicated data extraction, development of a protocol in advance, and data extraction errors should be interpreted with caution. Further studies based on a randomised controlled design might be helpful. However, we believe these limitations have little impact on our main results and conclusions.

Conclusions

Systematic reviews of adverse events face serious issues in terms of the reproducibility of their data extraction. The prevalence of data extraction errors is high among these systematic reviews, and these errors can change the conclusions and further mislead healthcare practice. A series of expanded reproducibility studies on other types of meta-analyses might be useful for further evidence-based practice. Additionally, implementation guidelines on data extraction for systematic reviews are urgently required to help future review authors improve the validity of their findings.

What is already known on this topic

In evidence synthesis practice, data extraction is an important step and prone to errors, because raw data are transferred from the original studies into the meta-analysis

Data extraction errors in systematic reviews occur frequently in the literature, although these errors generally have a minor effect on the results

However, this conclusion is based on systematic reviews of continuous outcomes, and might not apply to binary outcomes of adverse events

What this study adds

In a large-scale reproducibility investigation of 201 systematic reviews of adverse events with 829 pairwise meta-analyses, data extraction errors frequently occurred for binary outcomes of adverse events

These errors could be grouped into five categories based on the mechanism: numerical error, ambiguous error, zero assumption error, mismatching error, and misidentification error

The errors can lead to changes in the conclusions of the findings, and meta-analyses that had two or more types of errors were more susceptible to these changes

Ethics statements

Ethical approval.

Not required.

Data availability statement

A subset of the data can be found at https://osf.io/czyqa/. The dataset can be obtained from the first author ( [email protected] ) or the corresponding author ( [email protected] ) on request.

Acknowledgments

We thank Riaz Qureshi from Johns Hopkins University and Zhang Jiaxin from Guizhou Provincial People's Hospital for their comments and edits on our protocol. We also thank Lu Cuncun from Lanzhou University for developing the search strategy for the whole project.

Contributors: CX and SV conceived and designed the study; CX collected the data, analysed the data, and drafted the manuscript; ZXQ and CX screened the literature; YTQ, CX, DHM, ZXQ extracted and reproduced the data; YTQ and CX contributed to the data checking; CX, SV, LFK, LL, LZ, and YL provided methodological comments, and revised the manuscript. All authors approved the final version to be published. CX and SV are the study guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding: LFK is funded by an Australian National Health and Medical Research Council Fellowship (APP1158469). LL is funded by the US National Institutes of Health/National Library of Medicine grant R01 LM012982 and the National Institutes of Health/National Institute of Mental Health grant R03 MH128727. The funding body had no role in any process of the study (that is, study design, analysis, interpretation of data, writing of the report, and decision to submit the article for publication).

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: support from the Australian National Health and Medical Research Council Fellowship, US National Institutes of Health, National Library of Medicine, and National Institute of Mental Health for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned (and, if relevant, registered) have been explained.

Dissemination to participants and related patient and public communities: We plan to present our findings at national and international scientific meetings and to use social media outlets to disseminate findings.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .



The Future of Research: AI-Driven Automation in Systematic Reviews


The systematic literature review (SLR) is the gold standard for providing firm scientific evidence to support decision making. SLRs play a vital role in offering a holistic assessment of the efficacy, safety, and cost-effectiveness of a diagnostic aid or therapy by synthesizing data from multiple clinical studies. The process of conducting an SLR begins with formulating an unbiased search strategy to identify pertinent research articles for thorough examination. In today’s digital age, literature searches often uncover numerous publications, necessitating intensive manual review and analysis, with the result that findings can become outdated by the time an SLR is completed and published. Artificial intelligence (AI) can greatly enhance the efficiency and accuracy of SLRs by automating the literature search, screening, and data extraction processes, reducing both the time required and the potential for human error. Tools like Rayyan and DistillerSR leverage machine learning and natural language processing to streamline and standardize these tasks, making SLRs more scalable and less biased.

In 2024, Rayyan and DistillerSR both introduced significant updates to enhance their functionality for systematic reviews. Rayyan has incorporated a beta version of PRISMA guideline integration, an auto-resolver feature to automatically handle conflicts between reviewer decisions, and significant improvements to its mobile app, including offline capabilities and team progress monitoring. Additionally, Rayyan has enhanced its advanced filtration and de-duplication tools and offers comprehensive training sessions to ease onboarding for new users. DistillerSR has focused on boosting its AI-powered automation for tasks like data extraction and reference screening, improving integration with reference management tools such as EndNote and Mendeley, and upgrading its user interface for a more intuitive experience. Furthermore, DistillerSR ensures compliance with various regulatory standards, making it suitable for clinical trials and sensitive research areas. These updates reflect a continued commitment to improving the efficiency, usability, and compliance of systematic review processes through advanced technological solutions.

There are established guidelines to increase the rigor, transparency, and replicability of SLRs. AI and machine learning techniques (MLTs) developed with general-purpose programming languages can increase the speed, rigor, transparency, and repeatability of SLRs. Aimed at researchers who want to utilize AI and MLTs to synthesize and abstract data obtained through an SLR, this approach sets out how programming can be used to support unsupervised machine learning for synthesizing and abstracting data sets extracted during an SLR. Using an established qualitative method, Deductive Qualitative Analysis, it illustrates the supportive role that AI and MLTs can play in the coding and categorization of extracted SLR data and in synthesizing SLR data. Using a data set extracted during an SLR as a proof of concept, it applies a well-established MLT, topic modelling with Latent Dirichlet Allocation. This technique provides a working example of how researchers can use AI and MLTs to automate the data synthesis and abstraction stage of their SLR and aid in increasing the speed, frugality, and rigor of research projects.
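As a hedged illustration of the topic modelling step described above, the following minimal sketch fits a small Latent Dirichlet Allocation model with scikit-learn; the toy corpus and the number of topics are assumptions for demonstration only, not the code referred to in the text:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for text extracted during an SLR; real use would
# feed the extracted outcome descriptions or abstracts instead.
docs = [
    "statin therapy reduced cardiovascular events in older adults",
    "adverse events of statin therapy included muscular pain",
    "machine learning screening reduced reviewer workload",
    "natural language processing extracted outcomes from trial reports",
]

# Bag-of-words representation, then LDA with a small number of topics.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Show the top words per topic to support coding and categorisation.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top)}")
```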

Typically, SLRs involve the following steps:

  • Protocol development
  • Literature search
  • Literature screening
  • Data extraction and evidence generation
  • Quality assessment
  • Preparation of the SLR report

AI can efficiently assist in carrying out certain steps within SLRs, such as:

Automated Search and Screening


AI is primarily utilized in SLRs to expedite the initial phases by automating literature search and article screening based on predefined criteria. Search engines increasingly employ AI by enhancing Retrieval-Augmented Generation (RAG) frameworks with large language models. These frameworks enable the formulation of complex search queries, surpassing the limitations of conventional keyword-based searches.

ML classifiers are utilized to discover more relevant articles. These classifiers undergo training on an initial set of user-selected papers. Then, through iterative processes, they utilize automatic classifications to refine and improve their ability to identify further pertinent literature.

Automated tools leverage AI techniques to analyze various components of an article, such as its title, abstract, or full text. Natural Language Processing (NLP) algorithms dissect abstracts, titles, and keywords to gauge their relevance to the research topic. Additionally, these AI techniques can incorporate statistical selection processes to identify key terms characterizing each cluster. This involves scoring each citation based on the presence of keywords, aiding screeners in making more informed decisions about relevance. The resulting clusters highlight the most representative terms, facilitating better judgment about whether to include or omit a publication from the analysis.
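A minimal sketch of this kind of classifier-assisted screening is shown below, using a TF-IDF representation and logistic regression from scikit-learn; the example abstracts, labels, and model choice are illustrative assumptions rather than the approach of any particular tool:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical seed set: abstracts already labelled by reviewers
# (1 = include, 0 = exclude). A real screening tool would retrain this
# classifier iteratively as reviewers confirm or reject its suggestions.
abstracts = [
    "randomised trial of drug X for hypertension reporting adverse events",
    "qualitative interview study of patient experiences",
    "placebo controlled trial of drug Y with safety outcomes",
    "narrative review of health policy",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(stop_words="english"),
                      LogisticRegression())
model.fit(abstracts, labels)

# Rank unscreened citations by predicted probability of relevance.
unscreened = ["double blind trial of drug Z reporting serious adverse events"]
print(model.predict_proba(unscreened)[:, 1])
```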

Data Extraction and Evidence Generation


In health research, researchers apply various protocols for literature review depending on the type of report to be generated. These include PICO (population, intervention, comparison, outcome), PCC (population, context, concept), PICODR (elements of PICO plus duration and results), PIBOSO (population, intervention, background, outcome, study design, and others).

AI processes data from predetermined fields in interventional, diagnostic, or prognostic SLRs. NLP algorithms can extract crucial details like study methodologies, results, and statistical information which are subsequently synthesized and analyzed to derive valuable insights. AI technologies leverage domain ontology to structure the data, providing a formal depiction of variable types and their interrelationships.
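To illustrate the flavour of such extraction in the simplest possible terms, the toy sketch below pulls sample sizes and event counts from a single results sentence with regular expressions; real systems use trained NLP models and domain ontologies, and the patterns and field names here are purely illustrative assumptions:

```python
import re

# Toy illustration of rule-based extraction of a few PICO-style fields from a
# results sentence. These regular expressions are illustrative only, not a
# production extractor.
sentence = ("In this randomised trial, 120 patients received drug A and 118 "
            "patients received placebo; serious adverse events occurred in 7 "
            "and 3 patients respectively.")

sample_sizes = re.findall(r"(\d+)\s+(?:patients|participants)\s+received", sentence)
event_counts = re.search(r"occurred in (\d+) and (\d+) patients", sentence)

record = {
    "n_intervention": int(sample_sizes[0]) if sample_sizes else None,
    "n_control": int(sample_sizes[1]) if len(sample_sizes) > 1 else None,
    "events_intervention": int(event_counts.group(1)) if event_counts else None,
    "events_control": int(event_counts.group(2)) if event_counts else None,
}
print(record)
```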

Quality Assessment


Minimizing selection bias and enhancing both the external and internal validity of the selected publications in an SLR is crucial. Evaluating the quality of an SLR provides insight into its overall robustness and credibility. AI can assist in quality assessment by analyzing factors such as the study design, sample size, and methodology of the included studies.

Machine learning algorithms, when trained on available datasets, can identify patterns suggestive of high-quality research, thus aiding researchers in efficiently assessing evidence reliability. Many validated checklists, like PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), advise independent reviewer assessment of bias in literature search and selection. Integrating manual quality checks with automated screening is crucial for identifying gaps and inconsistencies, which can then be addressed by reconciling differences between the screener’s and reviewer’s decisions.

Analysis, Data Visualization, and Preparation of the Report


AI-powered tools can aid in meta-analyzing data derived from multiple studies, allowing researchers to synthesize findings quantitatively and evaluate overall effect sizes. Through semantic analysis and clustering techniques, AI can facilitate the organization and categorization of extensive literature volumes. By discerning common themes and relationships among studies, AI algorithms assist researchers in gaining profound insights into existing literature, assessing the current research landscape, and pinpointing areas for further exploration.
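As a small, hedged illustration of the clustering idea mentioned above, the sketch below groups a handful of hypothetical abstracts by theme using TF-IDF vectors and k-means from scikit-learn; the corpus and number of clusters are assumptions for demonstration only:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical abstracts to be grouped by theme; in practice these would be
# the included studies' abstracts or extracted findings.
abstracts = [
    "statin therapy and cardiovascular adverse events",
    "lipid lowering drugs and muscle pain",
    "machine learning for citation screening",
    "natural language processing of trial reports",
]

X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment per abstract, i.e. the thematic groups
```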

AI-driven visualization techniques can streamline the presentation of intricate information, making it more understandable and facilitating decision-making. Leveraging algorithms and models, AI technology can identify patterns, trends, outliers, and correlations within diverse data sets. Insights and recommendations gleaned from SLR data can aid researchers in comprehending the implications of knowledge gaps, processes, research methodologies, and policies.

By integrating feedback from researchers and refining algorithms based on new data, AI holds the potential to continually enhance the accuracy and efficiency of SLRs, thereby elevating the quality of research outcomes.

Challenges and Future Directions

Despite the transformative potential of AI in SLR and decision making, several challenges remain. Ensuring the transparency, interpretability, and ethical use of AI algorithms is paramount to fostering trust and acceptance within the healthcare community. Additionally, addressing issues related to data quality, interoperability, and bias in AI-driven analyses is essential for safeguarding the integrity of evidence-based medicine.

Looking ahead, continued research and innovation are necessary to harness the full potential of AI in healthcare. Collaborative efforts between interdisciplinary teams comprising clinicians, researchers, data scientists, and policymakers will be instrumental in overcoming challenges and unlocking new opportunities for leveraging AI in evidence synthesis and decision-making.

In conclusion, the integration of AI in Systematic Literature Reviews and Health Technology Assessment decision-making represents a paradigm shift in evidence-based medicine. By harnessing the power of AI, we can streamline processes, enhance decision quality, and ultimately improve healthcare outcomes for patients worldwide. As we navigate this transformative journey, it is imperative to prioritize ethics, transparency, and collaboration to realize the full benefits of AI in healthcare.


Asif Syed, PhD, Senior Scientific Writer II

Raghuraj Puthige, PhD, Function Head, Medical Communications, Enago Life Sciences



Automating data extraction in systematic reviews: a systematic review

Siddhartha R. Jonnalagadda

1 Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL 60611 USA

Pawan Goyal

2 Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302 West Bengal India

Mark D. Huffman

3 Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA

Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper reports a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

We systematically searched PubMed, IEEEXplore, and ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) methods or results section described what entities were or need to be extracted, and 2) at least one entity was automatically extracted with evaluation results that were presented for that entity. We also reviewed the citations from included reports.

Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts from various researchers to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70 %.
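For reference, an F-score (F1) combines sensitivity (recall) and positive predictive value (precision) as their harmonic mean; the short sketch below computes it from hypothetical extraction counts:

```python
# F1 score as the harmonic mean of precision (positive predictive value)
# and recall (sensitivity), computed from hypothetical extraction counts.
tp, fp, fn = 80, 20, 15   # illustrative true/false positives and false negatives

precision = tp / (tp + fp)          # positive predictive value
recall = tp / (tp + fn)             # sensitivity
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
# With these counts: precision 0.8, recall ~0.842, F1 ~0.821
```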

Conclusions

We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1–7) of data elements. Biomedical natural language processing techniques have not been fully utilized to automate, or even semi-automate, the data extraction step of systematic reviews.

Systematic reviews identify, assess, synthesize, and interpret published and unpublished evidence, which improves decision-making for clinicians, patients, policymakers, and other stakeholders [ 1 ]. Systematic reviews also identify research gaps to develop new research ideas. The steps to conduct a systematic review [ 1 – 3 ] are:

  • Define the review question and develop criteria for including studies
  • Search for studies addressing the review question
  • Select studies that meet criteria for inclusion in the review
  • Extract data from included studies
  • Assess the risk of bias in the included studies, by appraising them critically
  • Where appropriate, analyze the included data by undertaking meta-analyses
  • Address reporting biases

Despite their widely acknowledged usefulness [ 4 ], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5–6.5 years for a primary study publication to be included and published in a new systematic review [ 5 ]. Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not incorporated new evidence that might change the systematic review’s primary results [ 6 ].

Natural language processing (NLP), including text mining, involves information extraction, which is the discovery by computer of new, previously unfound information by automatically extracting information from different written resources [ 7 ]. Information extraction primarily constitutes concept extraction, also known as named entity recognition, and relation extraction, also known as association extraction. NLP handles written text at the level of documents, words, grammar, meaning, and context. NLP techniques have been used to automate extraction of genomic and clinical information from biomedical literature. Similarly, automation of the data extraction step of the systematic review process through NLP may be one strategy to reduce the time necessary to complete and update a systematic review. The data extraction step is one of the most time-consuming steps of a systematic review. Automating or even semi-automating this step could substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice. Despite these potential gains from NLP, the state of the science of automating data extraction has not been well described.

To date, knowledge of and methods for automating the data extraction phase of systematic reviews remain limited, despite it being one of the most time-consuming steps. To address this gap, we sought to perform a systematic review of methods to automate the data extraction component of the systematic review process.

Our methodology was based on the Standards for Systematic Reviews set by the Institute of Medicine [ 8 ]. We conducted our study procedures as detailed below with input from the Cochrane Heart Group US Satellite.

Eligibility criteria

We included reports that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity.

We excluded reports that met any of the following criteria: 1) the methods were not applied to the data extraction step of a systematic review; 2) the report was an editorial, commentary, or other non-original research report; or 3) there was no evaluation component.

Information sources and searches

To collect the initial set of articles for our review, we developed search strategies with the help of the Cochrane Heart Group US Satellite, which includes systematic reviewers and a medical librarian. We refined these strategies using relevant citations from related papers. We searched three databases, PubMed, IEEExplore, and the ACM Digital Library, limiting the searches to January 1, 2000 through January 6, 2015 (see Appendix 1). We restricted our search to these dates because biomedical information extraction algorithms developed before 2000 are unlikely to be accurate enough to be used for systematic reviews.

We retrieved articles that dealt with the extraction, from included study reports, of various data elements, defined as categories of data pertaining to any information about or deriving from a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators [ 1 ]. After we retrieved the initial set of reports from the search results, we then evaluated the reports cited in their reference lists. We also sought expert opinion for additional relevant citations.

Study selection

We first de-duplicated the retrieved citations. For calibration and refinement of the inclusion and exclusion criteria, 100 citations were randomly selected and independently reviewed by two authors (SRJ and PG). Disagreements were resolved by consensus with a third author (MDH). In a second round, another set of 100 randomly selected abstracts was independently reviewed by the same two authors (SRJ and PG), and we achieved a strong level of agreement (kappa = 0.97). Given this high level of agreement, the remaining citations were reviewed by only one author (PG). In this phase, we identified reports as “not relevant” or “potentially relevant”.

Two authors (PG and SRJ) independently reviewed the full text of all citations ( N  = 74) that were identified as “potentially relevant”. We classified included reports into categories based on the particular data element that they attempted to extract from the original scientific articles; examples of these data elements include overall evidence and specific interventions (Table  1 ). We resolved disagreements between the two reviewers through consensus with a third author (MDH).

Table 1. Data elements, category, sources, and existing automation work

| Data element | Category | Included in standards | Published method to extract? |
| --- | --- | --- | --- |
| Total number of participants | Participants | Cochrane, PICO, PECODR, PIBOSO, STARD | Yes [ , , – , , , – , , ] |
| Settings | Participants | Cochrane, CONSORT, STARD | No |
| Diagnostic criteria | Participants | Cochrane, STARD | No |
| Age | Participants | Cochrane, STARD | Yes [ , , , ] |
| Sex | Participants | Cochrane, STARD | Yes [ , , ] |
| Country | Participants | Cochrane | Yes [ , ] |
| Co-morbidity | Participants | Cochrane, STARD | Yes [ ] |
| Socio-demographics | Participants | Cochrane, STARD | No |
| Spectrum of presenting symptoms, current treatments, recruitment centers | Participants | STARD | Yes [ , , , , , ] |
| Ethnicity | Participants | Cochrane | Yes [ ] |
| Date of study | Participants | Cochrane | Yes [ ] |
| Date of recruitment and follow-up | Participants | CONSORT, STARD | No |
| Participant sampling | Participants | STARD | No |
| Total number of intervention groups | Intervention | Cochrane | Yes [ , ] |
| Specific intervention | Intervention | Cochrane, PICO, PIBOSO, PECODR | Yes [ , , – , , , , , , ] |
| Intervention details (sufficient for replication, if feasible) | Intervention | Cochrane, CONSORT | Yes [ ] |
| Integrity of intervention | Intervention | Cochrane | No |
| Outcomes and time points (i) collected; (ii) reported | Outcomes | Cochrane, CONSORT, PICO, PECODR, PIBOSO | Yes [ , , – , , , , – , ] |
| Outcome definition (with diagnostic criteria if relevant) | Outcomes | Cochrane | No |
| Unit of measurement (if relevant) | Outcomes | Cochrane | No |
| For scales: upper and lower limits, and whether high or low score is good | Outcomes | Cochrane | No |
| Comparison | Comparisons | PICO, PECODR | Yes [ , , , ] |
| Sample size | Results | Cochrane, CONSORT | Yes [ , ] |
| Missing participants | Results | Cochrane | No |
| Summary data for each intervention group (e.g. 2 × 2 table for dichotomous data; means and SDs for continuous data) | Results | Cochrane, PECODR, STARD | No |
| Estimate of effect with confidence interval; value | Results | Cochrane | No |
| Subgroup analyses | Results | Cochrane | No |
| Adverse events and side effects for each study group | Results | CONSORT, STARD | No |
| Overall evidence | Interpretation | CONSORT | Yes [ , ] |
| Generalizability: external validity of trial findings | Interpretation | CONSORT | Yes [ ] |
| Research questions and hypotheses | Objectives | CONSORT, PECODR, PIBOSO, STARD | Yes [ , ] |
| Reference standard and its rationale | Method | STARD | No |
| Technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standard | Method | STARD | No |
| Study design | Method | Cochrane, PIBOSO | Yes [ , , , ] |
| Total study duration | Method | Cochrane, PECODR | Yes [ , , ] |
| Sequence generation | Method | Cochrane | Yes [ ] |
| Allocation sequence concealment | Method | Cochrane | Yes [ ] |
| Blinding | Method | Cochrane, CONSORT, STARD | Yes [ ] |
| Methods used to generate random allocation sequence, implementation | Method | CONSORT, STARD | Yes [ ] |
| Other concerns about bias | Method | Cochrane | No |
| Methods used to compare groups for primary outcomes and for additional analyses | Method | CONSORT, STARD | No |
| Methods for calculating test reproducibility | Method | STARD | No |
| Definition and rationale for the units, cutoffs and/or categories of the results of the index tests and reference standard | Method | STARD | No |
| Number, training, and expertise of the persons executing and reading the index tests and the reference standard | Method | STARD | No |
| Participant flow: flow of participants through each stage: randomly assigned, received intended treatment, completed study, analyzed for primary outcome, inclusion and exclusion criteria | Method | CONSORT | Yes [ , , ] |
| Funding source | Miscellaneous | Cochrane | No |
| Key conclusions of the study authors | Miscellaneous | Cochrane | Yes [ ] |
| Clinical applicability of the study findings | Miscellaneous | STARD | No |
| Miscellaneous comments from the study authors | Miscellaneous | Cochrane | No |
| References to other relevant studies | Miscellaneous | Cochrane | No |
| Correspondence required | Miscellaneous | Cochrane | No |
| Miscellaneous comments by the review authors | Miscellaneous | Cochrane | No |

Data collection process

Two authors (PG and SRJ) independently reviewed the included articles and extracted data, such as the particular entity automatically extracted by the study, the algorithm or technique used, and the evaluation results, into a data abstraction spreadsheet. We resolved disagreements through consensus with a third author (MDH).

We reviewed the Cochrane Handbook for Systematic Reviews [ 1 ], the CONsolidated Standards Of Reporting Trials (CONSORT) [ 9 ] statement, the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks to obtain the data elements to be considered. PICO stands for Population, Intervention, Comparison, Outcomes; PECODR stands for Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results; and PIBOSO stands for Population, Intervention, Background, Outcome, Study Design, Other.

Data synthesis and analysis

Because of the large variation in study methods and measurements, a meta-analysis of methodological features and contextual factors associated with the frequency of data extraction methods was not possible. We therefore present a narrative synthesis of our findings. We did not formally assess risk of bias, including reporting bias, for these reports because their study designs do not match the domains evaluated by commonly used instruments such as the Cochrane Risk of Bias tool [ 1 ] for randomized trials or the QUADAS-2 instrument for diagnostic test accuracy studies [ 14 ].

Of 1190 unique citations retrieved, we selected 75 reports for full-text screening, and we included 26 articles that met our inclusion criteria (Fig.  1 ). Agreement on abstract and full-text screening was 0.97 and 1.00, respectively.

Fig. 1. Process of screening the articles to be included for this systematic review

Study characteristics

Table  1 provides a list of items to be considered in the data extraction process based on the Cochrane Handbook (Appendix 2 ) [ 1 ], CONSORT statement [ 9 ], STARD initiative [ 10 ], and the PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks. For each data element, the table gives the major group (domain) it belongs to, the standard(s) from which it was adopted, and whether a published method exists to extract it automatically.

Results of individual studies

Table  2 summarizes the existing information extraction studies. For each study, the table provides the citation to the study (study: column 1), the data elements that the study focused on (extracted elements: column 2), the dataset used (dataset: column 3), the algorithm and methods used for extraction (method: column 4), whether the study extracted only the sentence containing the data element, the full concept, or neither (sentence/concept/neither: column 5), whether the extraction was done from full text or abstracts (full text/abstract: column 6), and the main accuracy results reported (results: column 7). The studies are arranged in order of increasing complexity: studies that classified sentences appear before those that extracted concepts, and studies that extracted data from abstracts appear before those that extracted data from full-text reports.

Table 2. A summary of included extraction methods and their evaluation

| Study | Extracted elements | Dataset | Method | Sentence/Concept/Neither | Full text/Abstract | Results |
| --- | --- | --- | --- | --- | --- | --- |
| Dawes et al. (2007) [ ] | PECODR | 20 evidence-based medicine journal synopses (759 extracts from the corresponding PubMed abstracts) | Proposed potential lexical patterns and assessed using NVivo software | Neither | Abstract | Agreement among the annotators was 86.6 and 85 %, which rose to 98.4 and 96.9 % after consensus. No automated system. |
| Kim et al. (2011) [ ] | PIBOSO | 1000 medical abstracts (PIBOSO corpus) | Conditional random fields with various features based on lexical, semantic, structural and sequential information | Sentence | Abstract | Micro-averaged F-scores on structured and unstructured abstracts: 80.9 and 66.9 %; 63.1 % on an external dataset |
| Boudin et al. (2010) [ ] | PICO (I and C combined) | 26,000 abstracts from PubMed, first sentences from the structured abstracts | Combination of multiple supervised classification algorithms: random forests (RF), naive Bayes (NB), support vector machines (SVM), and multi-layer perceptron (MLP) | Sentence | Abstract | F-score of 86.3 % for P, 67 % for I (and C), and 56.3 % for O |
| Huang et al. (2011) [ ] | PICO (except C) | 23,472 sentences from structured abstracts | Naive Bayes | Sentence | Abstract | F-measure of 0.91 for patient/problem, 0.75 for intervention, and 0.88 for outcome |
| Verbeke et al. (2012) [ ] | PIBOSO | PIBOSO corpus | Statistical relational learning with kernels, kLog | Sentence | Abstract | Micro-averaged F of 84.29 % on structured abstracts and 67.14 % on unstructured abstracts |
| Huang et al. (2013) [ ] | PICO (except C) | 19,854 structured abstracts of randomized controlled trials | First sentence of the section or all sentences in the section, NB classifier | Sentence | Abstract | First sentence of the section: F-scores for P: 0.74, I: 0.66, O: 0.73; all sentences in the section: F-scores for P: 0.73, I: 0.73, O: 0.74 |
| Hassanzadeh et al. (2014) [ ] | PIBOSO (Population-Intervention-Background-Outcome-Study Design-Other) | PIBOSO corpus, 1000 structured and unstructured abstracts | CRF with a discriminative set of features | Sentence | Abstract | Micro-averaged F-score: 91 % |
| Robinson (2012) [ ] | Patient-oriented evidence: morbidity, mortality, symptom severity, quality of life | 1356 PubMed abstracts | SVM, NB, multinomial NB, logistic regression | Sentence | Abstract | Best results achieved via SVM: F-measure of 0.86 |
| Chung (2009) [ ] | Intervention, comparisons | 203 RCT abstracts for training and 124 for testing | Coordinating constructs identified using a full parser, then classified as positive or not using CRF | Sentence | Abstract | F-score: 0.76 |
| Hara and Matsumoto (2007) [ ] | Patient population, comparison | 200 abstracts labeled as ‘Neoplasms’ and ‘Clinical Trial, Phase III’ | Categorizing noun phrases (NPs) into classes such as ‘Disease’ and ‘Treatment’ using CRF, then applying regular expressions to sentences with classified NPs | Sentence | Abstract | F-measure of 0.91 for noun phrase classification; sentence classification: F-measure of 0.8 for patient population and 0.81 for comparisons |
| Davis-Desmond and Molla (2012) [ ] | Detecting statistical evidence | 194 randomized controlled trial abstracts from PubMed | Rule-based classifier using negation expressions | Sentence | Abstract | Accuracy: between 88 and 98 % at 95 % CI |
| Zhao et al. (2012) [ ] | Patient, result, intervention, study design, research goal | 19,893 medical abstracts and full-text articles from 17 journal websites | Conditional random fields | Sentence | Full text | F-scores for sentence classification: patient: 0.75, intervention: 0.61, result: 0.91, study design: 0.79, research goal: 0.76 |
| Hsu et al. (2012) [ ] | Hypothesis, statistical method, outcomes and generalizability | 42 full-text papers | Regular expressions | Sentence | Full text | For the classification task, F-score of 0.86 for hypothesis, 0.84 for statistical method, 0.9 for outcomes, and 0.59 for generalizability |
| Song et al. (2013) [ ] | Analysis (statistical facts), general (generally accepted facts), recommend (recommendations about interventions), rule (guidelines) | 346 sentences from three clinical guideline documents | Maximum entropy (MaxEnt), SVM, MLP, radial basis function network (RBFN), NB as classifiers; information gain (IG) and genetic algorithm (GA) for feature selection | Sentence | Full text | F-score of 0.98 for classifying sentences |
| Demner-Fushman and Lin (2007) [ ] | PICO (I and C combined) | 275 manually annotated abstracts | Rule-based approach to identify sentences containing PICO and supervised classifier for outcomes | Concept | Abstract | Precision of 0.8 for population, 0.86 for problem, 0.80 for intervention, 0.64–0.95 for outcome |
| Kelly and Yang (2013) [ ] | Age of subjects, duration of study, ethnicity of subjects, gender of subjects, health status of subjects, number of subjects | 386 abstracts from PubMed obtained with the query ‘soy and cancer’ | Regular expressions, gazetteer | Concept | Abstract | F-scores for age of subjects: 1.0, duration of study: 0.911, ethnicity of subjects: 0.949, gender of subjects: 1.0, health status of subjects: 0.874, number of subjects: 0.963 |
| Hansen et al. (2008) [ ] | Number of trial participants | 233 abstracts from PubMed | Support vector machines | Concept | Abstract | F-measure: 0.86 |
| Xu et al. (2007) [ ] | Subject demographics such as subject descriptors, number of participants, and diseases/symptoms and their descriptors | 250 randomized controlled trial abstracts | Text classification augmented with hidden Markov models to identify sentences; rules over parse trees to extract relevant information | Sentence, concept | Abstract | Precision for subject descriptors: 83.0 %, number of trial participants: 92.3 %, diseases/symptoms: 51.0 %, descriptors of diseases/symptoms: 92.0 % |
| Summerscales et al. (2009) [ ] | Treatments, groups and outcomes | 100 abstracts from the BMJ | Conditional random fields | Concept | Abstract | F-scores for treatments: 0.49, groups: 0.82, outcomes: 0.54 |
| Summerscales et al. (2011) [ ] | Groups, outcomes, group sizes, outcome numbers | 263 abstracts from the BMJ between 2005 and 2009 | CRF, MaxEnt, template filling | Concept | Abstract | F-scores for groups: 0.76, outcomes: 0.42, group sizes: 0.80, outcome numbers: 0.71 |
| Kiritchenko et al. (2010) [ ] | Eligibility criteria, sample size, drug dosage, primary outcomes | 50 full-text journal articles with 1050 test instances | SVM classifier to recover relevant sentences, extraction rules for correct solutions | Concept | Full text | P5 precision for the classifier: 0.88; precision and recall of the extraction rules: 93 and 91 %, respectively |
| Lin et al. (2010) [ ] | Intervention, age group of the patients, geographical area, number of patients, time duration of the study | 93 open-access full-text articles documenting oncological and cardio-vascular studies from 2005 to 2008 | Linear-chain conditional random fields | Concept | Full text | Precision of 0.4 for intervention, 0.63 for age group, 0.44 for geographical area, 0.43 for number of patients, and 0.83 for time period |
| Restificar et al. (2012) [ ] | Eligibility criteria | 44,203 full-text articles with clinical trials | Latent Dirichlet allocation along with logistic regression | Concept | Full text | 75 and 70 % accuracy based on similarity for inclusion and exclusion criteria, respectively |
| De Bruijn et al. (2008) [ ] | Eligibility criteria, sample size, treatment duration, intervention, primary and secondary outcomes | 88 randomized controlled trial full-text articles from five medical journals | SVM classifier to identify the most promising sentences; manually crafted weak extraction rules for the information elements | Sentence, concept | Full text | Precision for eligibility criteria: 0.69, sample size: 0.62, treatment duration: 0.94, intervention: 0.67, primary outcome: 1.00, secondary outcome: 0.67 |
| Zhu et al. (2012) [ ] | Subject demographics: patient age, gender, disease and ethnicity | 50 randomized controlled trial full-text articles | Manually crafted rules for extraction from the parse tree | Concept | Full text | Disease extraction: F-score of 0.64 for exact matching and 0.85 for partial matching |
| Marshall et al. (2014) [ ] | Risk of bias concerning sequence generation, allocation concealment and blinding | 2200 clinical trial reports | Soft-margin SVM for a joint model of risk of bias prediction and supporting sentence extraction | Sentence | Full text | For sentence identification: F-scores of 0.56, 0.48, 0.35 and 0.38 for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment |

The accuracy of most studies ( N  = 18, 69 %) was measured using a standard text mining metric known as the F-score, the harmonic mean of precision (positive predictive value) and recall (sensitivity). Some studies ( N  = 5, 19 %) reported only the precision of their method, while others ( N  = 2, 8 %) reported accuracy values. One study (4 %) reported P5 precision, which indicates the fraction of positive predictions among the top 5 results returned by the system.
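For reference, the F-score can be computed directly from counts of true positives, false positives, and false negatives. The sketch below is purely illustrative and not taken from any of the included studies; the example counts are made up.

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall, the metric most included studies report."""
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # sensitivity
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 45 correctly extracted elements, 10 spurious, 15 missed.
print(round(f_score(45, 10, 15), 3))  # 0.783
```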

Studies that did not implement a data extraction system

Dawes et al. [ 12 ] identified 20 evidence-based medicine journal synopses with 759 extracts in the corresponding PubMed abstracts. Annotators agreed on the identification of an element 85 and 87 % of the time for the evidence-based medicine synopses and PubMed abstracts, respectively. After consensus among the annotators, agreement rose to 97 and 98 %, respectively. The authors proposed various lexical patterns and developed rules to discover each PECODR element from the PubMed abstracts and the corresponding evidence-based medicine journal synopses, which might make it possible to partially or fully automate the data extraction process.

Studies that identified sentences, but did not extract data elements, from abstracts only

Kim et al. [ 13 ] used conditional random fields (CRF) [ 15 ] to classify sentences into one of the PIBOSO categories. The features were based on lexical, syntactic, structural, and sequential information in the data. The authors found that unigrams, section headings, and sequential information from preceding sentences were useful features for the classification task. Using 1000 medical abstracts from the PIBOSO corpus, they achieved micro-averaged F-scores of 91 and 67 % over datasets of structured and unstructured abstracts, respectively.
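As a rough illustration of sentence-level sequence labelling of this kind, the sketch below treats each abstract as a sequence of sentences and trains a linear-chain CRF with the third-party sklearn-crfsuite package. The feature set (position, unigrams, the first word of the preceding sentence) and the toy abstract are simplifying assumptions for illustration, not the features or data used by Kim et al.

```python
import sklearn_crfsuite

# Toy corpus: each abstract is a list of sentences with PIBOSO-style labels.
train_abstracts = [
    (["Diabetes is a growing public health problem.",
      "We enrolled 120 adults with type 2 diabetes.",
      "Mean HbA1c fell by 0.8 % in the intervention group."],
     ["Background", "Population", "Outcome"]),
]

def sentence_features(sentences, i):
    feats = {"position": i, "first_sentence": i == 0}   # structural information
    feats.update({f"word={w.lower()}": True for w in sentences[i].split()})  # unigrams
    if i > 0:  # sequential information from the preceding sentence
        feats["prev_first_word"] = sentences[i - 1].split()[0].lower()
    return feats

X_train = [[sentence_features(s, i) for i in range(len(s))] for s, _ in train_abstracts]
y_train = [labels for _, labels in train_abstracts]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```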

Boudin et al. [ 16 ] utilized a combination of multiple supervised classification techniques for detecting PICO elements in medical abstracts. They used features such as MeSH semantic types, word overlap with the title, and the number of punctuation marks with random forest (RF), naive Bayes (NB), support vector machine (SVM), and multi-layer perceptron (MLP) classifiers. Using 26,000 abstracts from PubMed, the authors took the first sentence of each structured abstract and assigned its label automatically to build a large training set. They obtained an F-score of 86 % for identifying participants (P), 67 % for interventions (I) and controls (C), and 56 % for outcomes (O).

Huang et al. [ 17 ] used a naive Bayes classifier for the PICO classification task. The training data were generated automatically from structured abstracts. For instance, all sentences in the section of a structured abstract that started with the term “PATIENT” were used to identify participants (P). In this way, the authors generated a dataset of 23,472 sentences and obtained an F-score of 91 % for identifying participants (P), 75 % for interventions (I), and 88 % for outcomes (O).
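The core idea, using structured-abstract section headings as free labels for a naive Bayes sentence classifier, can be sketched in a few lines with scikit-learn. The heading-to-label mapping and the toy sentences below are assumptions for illustration and do not reproduce the authors' exact rules.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Auto-label sentences from the heading of the structured-abstract section they came from.
heading_to_label = {"PATIENTS": "P", "INTERVENTION": "I", "OUTCOMES": "O"}  # assumed mapping
sections = [
    ("PATIENTS", "Sixty adults with chronic migraine were recruited."),
    ("INTERVENTION", "Participants received 10 mg amitriptyline daily."),
    ("OUTCOMES", "The primary outcome was monthly headache frequency."),
]
texts = [sentence for _, sentence in sections]
labels = [heading_to_label[heading] for heading, _ in sections]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["Patients were followed for headache days per month."]))
```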

Verbeke et al. [ 18 ] used a statistical relational learning-based approach (kLog) that utilized relational features for classifying sentences. The authors also used the PIBOSO corpus for evaluation and achieved micro-averaged F-scores of 84 % on structured abstracts and 67 % on unstructured abstracts, a better performance than that of Kim et al. [ 13 ].

Huang et al. [ 19 ] used 19,854 structured abstracts and trained two classifiers: one on the first sentence of each section (termed CF by the authors) and the other on all sentences in each section (termed CA by the authors). The authors used the naive Bayes classifier and achieved F-scores of 74, 66, and 73 % for identifying participants (P), interventions (I), and outcomes (O), respectively, with the CF classifier. The CA classifier gave F-scores of 73, 73, and 74 % for identifying participants (P), interventions (I), and outcomes (O), respectively.

Hassanzadeh et al. [ 20 ] used the PIBOSO corpus for the identification of sentences with PIBOSO elements. Using conditional random fields (CRF) with a discriminative set of features, they achieved a micro-averaged F-score of 91 %.

Robinson [ 21 ] used four machine learning models, 1) support vector machines, 2) naive Bayes, 3) naive Bayes multinomial, and 4) logistic regression, to identify whether or not medical abstracts contained patient-oriented evidence, namely morbidity, mortality, symptom severity, and health-related quality of life. On a dataset of 1356 PubMed abstracts, the authors achieved the best performance with the support vector machine model, with an F-measure of 86 %.
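Comparing several off-the-shelf classifiers on the same labelled corpus, as Robinson did, is straightforward with cross-validated F1 scores. The sketch below uses a tiny synthetic stand-in corpus and scikit-learn models; it is an assumption-laden illustration of the comparison workflow, not the original study's setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic abstracts: 1 = contains patient-oriented evidence, 0 = does not.
texts = [
    "mortality was reduced in the treatment arm",
    "quality of life improved after twelve weeks",
    "symptom severity scores declined significantly",
    "morbidity outcomes favoured early intervention",
    "the assay was linear across all concentrations",
    "we describe the study protocol and recruitment plan",
    "laboratory markers were measured at baseline only",
    "the statistical analysis plan is reported elsewhere",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

models = {
    "SVM": LinearSVC(),
    "Multinomial naive Bayes": MultinomialNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, texts, labels, cv=2, scoring="f1")
    print(name, scores.mean())
```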

Chung [ 22 ] utilized a full sentence parser to identify descriptions of the assignment of treatment arms in clinical trials. The author used predicate-argument structure along with other linguistic features in a maximum entropy classifier. Using 203 abstracts from randomized trials for training and 124 abstracts for testing, the approach achieved an F-score of 76 %.

Hara and Matsumoto [ 23 ] dealt with the problem of extracting “patient population” and “compared treatments” from medical abstracts. Given a sentence from an abstract, the authors first performed base noun-phrase chunking and then categorized each base noun phrase into one of five classes, “disease”, “treatment”, “patient”, “study”, and “others”, using support vector machine and conditional random field models. After categorization, the authors used regular expressions to extract the target words for patient population and comparison. Using 200 abstracts indexed with terms such as “neoplasms” and “clinical trial, phase III”, they obtained 91 % accuracy for the noun phrase classification task and, for sentence classification, a precision of 80 % for patient population and 82 % for comparisons.

Studies that identified sentences, but did not extract data elements, from full-text reports

Zhao et al. [ 24 ] used two classification tasks, one at the sentence level and another at the keyword level, to extract study data including patient details. The authors first used a five-class scheme, 1) patient, 2) result, 3) intervention, 4) study design, and 5) research goal, and classified sentences into one of these five classes. They then used six keyword classes: sex (e.g., male, female), age (e.g., 54-year-old), race (e.g., Chinese), condition (e.g., asthma), intervention, and study design (e.g., randomized trial). They utilized conditional random fields for the classification task. Using 19,893 medical abstracts and full-text articles from 17 journal websites, they achieved F-scores of 75 % for identifying patients, 61 % for intervention, 91 % for results, 79 % for study design, and 76 % for research goal.

Hsu et al. [ 25 ] attempted to classify whether a sentence contains the “hypothesis”, “statistical method”, “outcomes”, or “generalizability” of the study and then extracted the values. Using 42 full-text papers, the authors obtained F-scores of 86 % for identifying hypothesis, 84 % for statistical method, 90 % for outcomes, and 59 % for generalizability.

Song et al. [ 26 ] used machine learning classifiers, maximum entropy (MaxEnt), support vector machines (SVM), multi-layer perceptron (MLP), naive Bayes (NB), and radial basis function network (RBFN), to classify sentences into categories such as analysis (statistical facts found by clinical experiment), general (generally accepted scientific facts, processes, and methodology), recommendation (recommendations about interventions), and rule (guidelines). They applied information gain (IG) and a genetic algorithm (GA) for feature selection. Using 346 sentences from three clinical guideline documents, they obtained an F-score of 98 % for classifying sentences.
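Information-gain-style feature selection of the kind used here can be approximated with mutual information scoring in scikit-learn. The sketch below is a toy illustration with made-up guideline sentences; mutual information is used as a stand-in for the information gain criterion and is not the authors' exact implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

sentences = [
    "odds ratio 0.82, 95% confidence interval 0.70 to 0.96",
    "we recommend metformin as first-line therapy",
    "insulin lowers blood glucose",
    "patients should be screened annually per guideline",
]
labels = ["analysis", "recommend", "general", "rule"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)

# Keep the 5 terms that carry the most information about the sentence category.
selector = SelectKBest(mutual_info_classif, k=5).fit(X, labels)
selected = selector.get_support(indices=True)
print([vectorizer.get_feature_names_out()[i] for i in selected])
```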

Marshall et al. [ 27 ] used soft-margin support vector machines in a joint model for risk of bias assessment along with supporting sentences for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, among others. They utilized the presence of unigrams in the supporting sentences as features in their model. Working with the full text of 2200 clinical trial reports, the joint model achieved F-scores of 56, 48, 35, and 38 % for identifying sentences corresponding to random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively.

Studies that identified data elements only from abstracts but not from full texts

Demner-Fushman and Lin [ 28 ] used a rule-based approach to identify sentences containing PICO. Using 275 manually annotated abstracts, the authors achieved an accuracy of 80 % for population extraction and 86 % for problem extraction. They also utilized a supervised classifier for outcome extraction, achieving accuracies ranging from 64 to 95 % across various experiments.

Kelly and Yang [ 29 ] used regular expressions and gazetteer to extract the number of participants, participant age, gender, ethnicity, and study characteristics. The authors utilized 386 abstracts from PubMed obtained with the query “soy and cancer” and achieved F-scores of 96 % for identifying the number of participants, 100 % for age of participants, 100 % for gender of participants, 95 % for ethnicity of participants, 91 % for duration of study, and 87 % for health status of participants.
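A regex-plus-gazetteer extractor of this kind can be sketched as follows. The patterns and the small gazetteer below are hypothetical and far simpler than those used by Kelly and Yang; they are meant only to show the shape of the approach.

```python
import re

# Hypothetical gazetteer and patterns; real systems use much richer resources.
ETHNICITY_GAZETTEER = {"caucasian", "african american", "hispanic", "asian"}

N_PARTICIPANTS = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects|women|men)\b", re.I)
AGE_RANGE = re.compile(
    r"\baged\s+(\d{1,3})\s*(?:to|-|–)\s*(\d{1,3})\s*years?\b", re.I)

abstract = ("A total of 1,245 participants aged 40 to 75 years were enrolled; "
            "62% were Caucasian.")

n = N_PARTICIPANTS.search(abstract)
age = AGE_RANGE.search(abstract)
ethnicities = [e for e in ETHNICITY_GAZETTEER if e in abstract.lower()]

print(n.group(1) if n else None)      # '1,245'
print(age.groups() if age else None)  # ('40', '75')
print(ethnicities)                    # ['caucasian']
```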

Hansen et al. [ 30 ] used support vector machines [ 31 ] to extract the number of trial participants from abstracts of randomized controlled trials. The authors utilized features such as the part-of-speech tags of the previous and next words and whether the sentence was grammatically complete (i.e., contained a verb). Using 233 abstracts from PubMed, they achieved an F-score of 86 % for identifying participants.

Xu et al. [ 32 ] utilized text classification augmented with hidden Markov models [ 33 ] to identify sentences about subject demographics. These sentences were then parsed to extract information regarding participant descriptors (e.g., men, healthy, elderly), the number of trial participants, disease/symptom names, and disease/symptom descriptors. Testing on 250 RCT abstracts, the authors obtained an accuracy of 83 % for participant descriptors, 93 % for the number of trial participants, 51 % for diseases/symptoms, and 92 % for descriptors of diseases/symptoms.

Summerscales et al. [ 34 ] used a conditional random field-based approach to identify various named entities such as treatments (drug names or complex phrases) and outcomes. Using 100 abstracts of randomized trials from the BMJ, they achieved F-scores of 49 % for identifying treatments, 82 % for groups, and 54 % for outcomes.

Summerscales et al. [ 35 ] also proposed a method for automatic summarization of results from the clinical trials. The authors first identified the sentences that contained at least one integer (group size, outcome numbers, etc.). They then used the conditional random field classifier to find the entity mentions corresponding to treatment groups or outcomes. The treatment groups, outcomes, etc. were then treated as various “events.” To identify all the relevant information for these events, the authors utilized templates with slots. The slots were then filled using a maximum entropy classifier. They utilized 263 abstracts from the BMJ and achieved F-scores of 76 % for identifying groups, 42 % for outcomes, 80 % for group sizes, and 71 % for outcome numbers.

Studies that identified data elements from full-text reports

Kiritchenko et al. [ 36 ] developed ExaCT, a tool that assists users with locating and extracting key trial characteristics such as eligibility criteria, sample size, drug dosage, and primary outcomes from full-text journal articles. The authors utilized a text classifier in the first stage to recover the relevant sentences and, in the next stage, extraction rules to find the correct solutions. Evaluating their system on 50 full-text articles describing randomized trials with 1050 test instances, they achieved a P5 precision of 88 % for the sentence classifier, and the precision and recall of their extraction rules were 93 and 91 %, respectively.

Restificar et al. [ 37 ] utilized latent Dirichlet allocation [ 38 ] to infer the latent topics in the sample documents and then used logistic regression to compute the probability that a given candidate criterion belongs to a particular topic. Using 44,203 full-text reports of randomized trials, the authors achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
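A topic-model-plus-classifier pipeline of this general shape can be assembled from standard scikit-learn components: LDA turns each criterion into a topic distribution, which a logistic regression then classifies. The toy criteria and labels below are invented for illustration and do not reflect the authors' corpus or model settings.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy eligibility-criteria sentences labelled as inclusion (1) or exclusion (0).
criteria = [
    "adults aged 18 to 65 with confirmed type 2 diabetes",
    "willing to provide written informed consent",
    "history of myocardial infarction within the past six months",
    "pregnant or breastfeeding women",
    "estimated glomerular filtration rate below 30",
    "able to attend monthly follow-up visits",
]
labels = [1, 1, 0, 0, 0, 1]

model = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=2, random_state=0),  # latent topics
    LogisticRegression(),                                       # topic -> criterion class
)
model.fit(criteria, labels)
print(model.predict_proba(["prior organ transplantation"]))
```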

Lin et al. [ 39 ] used a linear-chain conditional random field to extract various metadata elements such as the number of patients, age group of the patients, geographical area, intervention, and time duration of the study. Using 93 full-text articles, the authors achieved three-fold cross-validation precisions of 43 % for identifying the number of patients, 63 % for age group, 44 % for geographical area, 40 % for intervention, and 83 % for time period.

De Bruijn et al. [ 40 ] used a support vector machine classifier to first identify sentences describing information elements such as eligibility criteria and sample size. The authors then used manually crafted weak extraction rules to extract the various information elements. Testing this two-stage architecture on 88 randomized trial reports, they obtained precisions of 69 % for identifying eligibility criteria, 62 % for sample size, 94 % for treatment duration, 67 % for intervention, 100 % for primary outcomes, and 67 % for secondary outcomes.
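The two-stage pattern, a sentence classifier followed by weak extraction rules, can be sketched as below. The training sentences, labels, and the single sample-size rule are hypothetical and much weaker than a real system's; the sketch only shows how the stages fit together.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stage 1: flag sentences likely to mention the sample size (toy training data).
sentences = [
    "We randomly assigned 312 patients to surgery or watchful waiting.",
    "The trial was approved by the local ethics committee.",
    "A total of 96 participants completed follow-up.",
    "Statistical analyses used SAS version 9.2.",
]
labels = [1, 0, 1, 0]
stage1 = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(sentences, labels)

# Stage 2: a weak extraction rule pulls the number out of flagged sentences.
SAMPLE_SIZE_RULE = re.compile(r"\b(\d+)\s+(?:patients|participants)\b", re.I)

new_sentence = "In total, 150 patients were enrolled across five centres."
if stage1.predict([new_sentence])[0] == 1:
    match = SAMPLE_SIZE_RULE.search(new_sentence)
    print(match.group(1) if match else None)  # prints '150' if the sentence is flagged
```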

Zhu et al. [ 41 ] also used manually crafted rules to extract subject demographics such as disease, age, gender, and ethnicity. The authors tested their method on 50 articles; for disease extraction, they obtained F-scores of 64 and 85 % for exactly matched and partially matched cases, respectively.

Risk of bias across studies

In general, many studies have a high risk of selection bias because the gold standards used were not randomly selected. The risk of performance bias is also likely to be high because the investigators were not blinded. For systems that used rule-based approaches, it was often unclear whether the gold standard was used to train the rules or whether there was a separate training set. The risk of attrition bias is unclear given the design of these non-randomized studies evaluating the performance of NLP methods. Lastly, the risk of reporting bias is unclear because of the lack of protocols in the development, implementation, and evaluation of NLP methods.

Summary of evidence

Extracting the data elements.

  • Participants — Sixteen studies explored the extraction of the number of participants [ 12 , 13 , 16 – 20 , 23 , 24 , 28 – 30 , 32 , 39 ], their age [ 24 , 29 , 39 , 41 ], sex [ 24 , 39 ], ethnicity [ 41 ], country [ 24 , 39 ], comorbidities [ 21 ], spectrum of presenting symptoms, current treatments, and recruiting centers [ 21 , 24 , 28 , 29 , 32 , 41 ], and date of study [ 39 ]. Among them, only six studies [ 28 – 30 , 32 , 39 , 41 ] extracted data elements as opposed to highlighting the sentence containing the data element. Unfortunately, each of these studies used a different corpus of reports, which makes direct comparisons impossible. For example, Kelly and Yang [ 29 ] achieved high F-scores of 100 % for age of participants, 91 % for duration of study, 95 % for ethnicity of participants, 100 % for gender of subjects, 87 % for health status of participants, and 96 % for number of participants on a dataset of 386 abstracts.
  • Intervention — Thirteen studies explored the extraction of interventions [ 12 , 13 , 16 – 20 , 22 , 24 , 28 , 34 , 39 , 40 ], intervention groups [ 34 , 35 ], and intervention details (for replication if feasible) [ 36 ]. Of these, only six studies [ 28 , 34 – 36 , 39 , 40 ] extracted intervention elements. Unfortunately again, each of these studies used a different corpus. For example, Kiritchenko et al. [ 36 ] achieved an F-score of 75–86 % for intervention data elements on a dataset of 50 full-text journal articles.
  • Outcomes and comparisons — Fourteen studies also explored the extraction of outcomes and time points of collection and reporting [ 12 , 13 , 16 – 20 , 24 , 25 , 28 , 34 – 36 , 40 ] and the extraction of comparisons [ 12 , 16 , 22 , 23 ]. Of these, only six studies [ 28 , 34 – 36 , 40 ] extracted the actual data elements. For example, De Bruijn et al. [ 40 ] obtained an F-score of 100 % for extracting the primary outcome and 67 % for the secondary outcome from 88 full-text articles. Summerscales et al. [ 35 ] utilized 263 abstracts from the BMJ and achieved an F-score of 42 % for extracting outcomes.
  • Results — Two studies [ 36 , 40 ] extracted the sample size data element from full text on two different datasets. De Bruijn et al. [ 40 ] obtained an accuracy of 67 %, and Kiritchenko et al. [ 36 ] achieved an F-score of 88 %.
  • Interpretation — Three studies explored extraction of overall evidence [ 26 , 42 ] and external validity of trial findings [ 25 ]. However, all these studies only highlighted sentences containing the data elements relevant to interpretation.
  • Objectives — Two studies [ 24 , 25 ] explored the extraction of research questions and hypotheses. However, both of these studies only highlighted sentences containing the relevant data elements.
  • Methods — Twelve studies explored the extraction of the study design [ 13 , 18 , 20 , 24 ], study duration [ 12 , 29 , 40 ], randomization method [ 25 ], participant flow [ 36 , 37 , 40 ], and risk of bias assessment [ 27 ]. Of these, only four studies [ 29 , 36 , 37 , 40 ] extracted the corresponding data elements from text using different sets of corpora. For example, Restificar et al. [ 37 ] utilized 44,203 full-text clinical trial articles and achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
  • Miscellaneous — One study [ 26 ] explored extraction of key conclusion sentence and achieved a high F-score of 98 %.

Related reviews and studies

Previous reviews on the automation of systematic review processes describe technologies for automating the overall process or other steps. Tsafnat et al. [ 43 ] surveyed informatics systems that automate some of the tasks of systematic review and reported systems for each stage of the review. Here, we focus on data extraction; none of the existing reviews [ 43 – 47 ] focuses on this step. For example, Tsafnat et al. [ 43 ] presented a review of techniques to automate various aspects of systematic reviews, and while data extraction is described as a task in their review, they highlighted only three studies as an acknowledgement of the ongoing work. In comparison, we identified 26 studies and critically examined their contributions in relation to all the data elements that need to be extracted to fully support the data extraction step.

Thomas et al. [ 44 ] described the application of text mining technologies such as automatic term recognition, document clustering, classification, and summarization to support the identification of relevant studies in systematic reviews. The authors also pointed out the potential of these technologies to assist at various stages of the systematic review. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mentioned the need for development of new tools for reporting on and searching for structured data from clinical trials.

Tsafnat et al. [ 46 ] described four main tasks in systematic review: identifying the relevant studies, evaluating the risk of bias in selected trials, synthesizing the evidence, and publishing the systematic reviews by generating human-readable text from trial reports. They mentioned text extraction algorithms for evaluating risk of bias and synthesizing evidence but remained limited to one particular method for extracting PICO elements.

Most natural language processing research has focused on reducing the workload for the screening step of systematic reviews (Step 3). Wallace et al. [ 48 , 49 ] and Miwa et al. [ 50 ] proposed an active learning framework to reduce the workload in citation screening for inclusion in the systematic reviews. Jonnalagadda et al. [ 51 ] designed a distributional semantics-based relevance feedback model to semi-automatically screen citations. Cohen et al. [ 52 ] proposed a module for grouping studies that are closely related and an automated system to rank publications according to the likelihood for meeting the inclusion criteria of a systematic review. Choong et al. [ 53 ] proposed an automated method for automatic citation snowballing to recursively pursue relevant literature for helping in evidence retrieval for systematic reviews. Cohen et al. [ 54 ] constructed a voting perceptron-based automated citation classification system to classify each article as to whether it contains high-quality, drug-specific evidence. Adeva et al. [ 55 ] also proposed a classification system for screening articles for systematic review. Shemilt et al. [ 56 ] also discussed the use of text mining to reduce screening workload in systematic reviews.

Research implications

No common gold standard or dataset.

Among the 26 studies included in this systematic review, only three used a common corpus, namely the 1000 medical abstracts of the PIBOSO corpus. Unfortunately, even that corpus supports only the classification of sentences according to whether they contain one of the data elements corresponding to the PIBOSO categories. No other two studies shared the same gold standard or dataset for evaluation, which made it impossible for us to compare and assess the relative significance of the reported accuracy measures.

Separate systems for each data element

A few data elements that are relatively straightforward to extract automatically, such as the total number of participants (explored by 14 studies overall, 5 of which extracted the actual data element), have attracted a comparatively large number of studies. This is not the case for the other data elements. Twenty-seven of the 52 potential data elements have not been explored for automated extraction at all, even for highlighting the sentences containing them, and seven more have been explored by only one study. Thirty-eight of the 52 potential data elements (>70 %) have not been explored for automated extraction of the actual data elements, and three more have been explored by only one study. The highest number of data elements extracted by a single study is only seven (14 %). These findings mean not only that more studies are needed to explore the remaining 70 % of data elements, but also that there is an urgent need for a unified framework or system to extract all necessary data elements. The current state of informatics research for data extraction is exploratory, and multiple studies need to be conducted using the same gold standard and extracting the same data elements to allow effective comparison.

Limitations

Our study has limitations. First, data extraction algorithms may not have been published in journals, or our search might have missed them. We sought to minimize this limitation by searching multiple bibliographic databases, including PubMed, IEEExplore, and the ACM Digital Library. However, investigators may also have declined to publish algorithms that achieved lower F-scores than previously reported ones, which we would not have captured. Second, we did not publish a protocol a priori, and our initial findings may have influenced our methods. However, we performed key steps, including screening, full-text review, and data extraction, in duplicate to minimize potential bias in our systematic review.

Future work

“On demand” access to summarized evidence and best practices has been considered a sound strategy to satisfy clinicians’ information needs and enhance decision-making [ 57 – 65 ]. A systematic review of 26 studies concluded that information-retrieval technology produces a positive impact on physicians in terms of decision enhancement, learning, recall, reassurance, and confirmation [ 62 ]. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mention the need for new tools for reporting on and searching for structured data from the published literature. Automated information extraction frameworks that extract data elements have the potential to assist systematic reviewers and eventually to automate the screening and data extraction steps.

Medical science is currently witnessing a rapid pace at which medical knowledge is being created: 75 clinical trials a day [ 66 ]. Evidence-based medicine [ 67 ] requires clinicians to keep up with published scientific studies and use them at the point of care. However, it has been shown that this is practically impossible, even within a narrow specialty [ 68 ]. A critical barrier is that finding relevant information, which may be located in several documents, takes an amount of time and cognitive effort that is incompatible with the busy clinical workflow [ 69 , 70 ]. Rapid systematic reviews using automation technologies would provide clinicians with up-to-date and systematic summaries of the latest evidence.

Our systematic review describes previously reported methods to identify sentences containing some of the data elements needed for systematic reviews, and the few studies that have reported methods to extract these data elements. However, most of the data elements that would need to be considered for systematic reviews have been insufficiently explored to date, which identifies a major scope for future work. We hope that these automated extraction approaches might first act as checks for the manual data extraction currently performed in duplicate; then serve to validate manual data extraction done by a single reviewer; then become the primary source for data element extraction, validated by a human; and eventually completely automate data extraction to enable living systematic reviews.

Abbreviations

NLP: natural language processing
CONSORT: CONsolidated Standards Of Reporting Trials
STARD: Standards for Reporting of Diagnostic Accuracy
PICO: Population, Intervention, Comparison, Outcomes
PECODR: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results
PIBOSO: Population, Intervention, Background, Outcome, Study Design, Other
CRF: conditional random fields
NB: naive Bayes
RCT: randomized controlled trial
BMJ: British Medical Journal

Search strategies

Below, we provide the search strategies used in PubMed, IEEExplore, and the ACM Digital Library. The search was conducted on January 6, 2015.

PubMed

(“identification” [Title] OR “extraction” [Title] OR “extracting” [Title] OR “detection” [Title] OR “identifying” [Title] OR “summarization” [Title] OR “learning approach” [Title] OR “automatically” [Title] OR “summarization” [Title] OR “identify sections” [Title] OR “learning algorithms” [Title] OR “Interpreting” [Title] OR “Inferring” [Title] OR “Finding” [Title] OR “classification” [Title]) AND (“medical evidence”[Title] OR “PICO”[Title] OR “PECODR” [Title] OR “intervention arms” [Title] OR “experimental methods” [Title] OR “study design parameters” [Title] OR “Patient oriented Evidence” [Title] OR “eligibility criteria” [Title] OR “clinical trial characteristics” [Title] OR “evidence based medicine” [Title] OR “clinically important elements” [Title] OR “evidence based practice” [Title] “results from clinical trials” [Title] OR “statistical analyses” [Title] OR “research results” [Title] OR “clinical evidence” [Title] OR “Meta Analysis” [Title] OR “Clinical Research” [Title] OR “medical abstracts” [Title] OR “clinical trial literature” [Title] OR ”clinical trial characteristics” [Title] OR “clinical trial protocols” [Title] OR “clinical practice guidelines” [Title]).
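For readers who want to re-run a strategy like this programmatically, the sketch below uses Biopython's E-utilities wrapper. The email address is a placeholder, and the query string is deliberately abbreviated for readability; the full strategy above should be substituted in practice.

```python
# Sketch of running a PubMed title search over the review's date range with Biopython.
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # placeholder; NCBI asks for a contact address

query = ('("extraction"[Title] OR "identification"[Title]) AND '
         '("PICO"[Title] OR "evidence based medicine"[Title])')  # abbreviated query

handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="2000/01/01", maxdate="2015/01/06", retmax=5000)
record = Entrez.read(handle)
handle.close()
print(record["Count"], record["IdList"][:10])
```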

IEEExplore

We performed this search only in the metadata.

(“identification” OR “extraction” OR “extracting” OR “detection” OR “Identifying” OR “summarization” OR “learning approach” OR “automatically” OR “summarization” OR “identify sections” OR “learning algorithms” OR “Interpreting” OR “Inferring” OR “Finding” OR “classification”) AND (“medical evidence” OR “PICO” OR “intervention arms” OR “experimental methods” OR “eligibility criteria” OR “clinical trial characteristics” OR “evidence based medicine” OR “clinically important elements” OR “results from clinical trials” OR “statistical analyses” OR “clinical evidence” OR “Meta Analysis” OR “clinical research” OR “medical abstracts” OR “clinical trial literature” OR “clinical trial protocols”).

ACM digital library

((Title: “identification” or Title: “extraction” or Title: “extracting” or Title: “detection” or Title: “Identifying” or Title: “summarization” or Title: “learning approach” or Title: “automatically” or Title: “summarization “or Title: “identify sections” or Title: “learning algorithms” or Title: “scientific artefacts” or Title: “Interpreting” or Title: “Inferring” or Title: “Finding” or Title: “classification” or “statistical techniques”) and (Title: “medical evidence” or Abstract: “medical evidence” or Title: “PICO” or Abstract: “PICO” or Title: “intervention arms” or Title: “experimental methods” or Title: “study design parameters” or Title: “Patient oriented Evidence” or Abstract: “Patient oriented Evidence” or Title: “eligibility criteria” or Abstract: “eligibility criteria” or Title: “clinical trial characteristics” or Abstract: “clinical trial characteristics” or Title: “evidence based medicine” or Abstract: “evidence based medicine” or Title: “clinically important elements” or Title: “evidence based practice” or Title: “treatments” or Title: “groups” or Title: “outcomes” or Title: “results from clinical trials” or Title: “statistical analyses” or Abstract: “statistical analyses” or Title: “research results” or Title: “clinical evidence” or Abstract: “clinical evidence” or Title: “Meta Analysis” or Abstract:“Meta Analysis” or Title:“Clinical Research” or Title: “medical abstracts” or Title: “clinical trial literature” or Title: “Clinical Practice” or Title: “clinical trial protocols” or Abstract: “clinical trial protocols” or Title: “clinical questions” or Title: “clinical trial design”)).

Checklist of items to consider in data collection or data extraction from Cochrane Handbook [ 1 ]

Source
 • Study ID (created by review author)
 • Report ID (created by review author)
 • Review author ID (created by review author)
 • Citation and contact details
Eligibility
 • Confirm eligibility for review
 • Reason for exclusion
Methods
 • Study design
 • Total study duration
 • Sequence generation
 • Allocation sequence concealment
 • Blinding
 • Other concerns about bias
Participants
 • Total number
 • Setting
 • Diagnostic criteria
 • Age
 • Sex
 • Country
 • [Co-morbidity]
 • [Socio-demographics]
 • [Ethnicity]
 • [Date of study]
Interventions
 • Total number of intervention groups.
For each intervention and comparison group of interest:
 • Specific intervention
 • Intervention details (sufficient for replication, if feasible)
 • [Integrity of intervention]
Outcomes
 • Outcomes and time points (i) collected; (ii) reported
For each outcome of interest:
 • Outcome definition (with diagnostic criteria if relevant)
 • Unit of measurement (if relevant)
 • For scales: upper and lower limits, and whether high or low score is good
Results
 • Number of participants allocated to each intervention group.
For each outcome of interest:
 • Sample size
 • Missing participants
 • Summary data for each intervention group (e.g. 2 × 2 table for dichotomous data; means and SDs for continuous data)
 • [Estimate of effect with confidence interval; value]
 • [Subgroup analyses]
Miscellaneous
 • Funding source
 • Key conclusions of the study authors
 • Miscellaneous comments from the study authors
 • References to other relevant studies
 • Correspondence required
 • Miscellaneous comments by the review authors

Items without parentheses should normally be collected in all reviews; items in square brackets may be relevant to some reviews and not to others

a Full description required for standard items in the ‘Risk of bias’ tool

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SRJ and PG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design were done by SRJ. SRJ, PG, and MDH did the acquisition, analysis, or interpretation of data. SRJ and PG drafted the manuscript. SRJ, PG, and MDH did the critical revision of the manuscript for important intellectual content. SRJ obtained funding. PG and SRJ provided administrative, technical, or material support. SRJ did the study supervision. All authors read and approved the final manuscript.

Funding/Support

This project was partly supported by the National Library of Medicine (grant 5R00LM011389). The Cochrane Heart Group US Satellite at Northwestern University is supported by an intramural grant from the Northwestern University Feinberg School of Medicine.

Role of the sponsors

The funding source had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine.

Additional contributions

Mark Berendsen (Research Librarian, Galter Health Sciences Library, Northwestern University Feinberg School of Medicine) provided insights on the design of this study, including the search strategies, and Dr. Kalpana Raja (Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine) reviewed the manuscript. None of them received compensation for their contributions.

Contributor Information

Siddhartha R. Jonnalagadda, Email: sid@northwestern.edu

Pawan Goyal, Email: pawang@cse.iitkgp.ernet.in

Mark D. Huffman, Email: m-huffman@northwestern.edu


Using theater as an innovative knowledge translation approach for health research: a scoping review protocol

Jackson, Poppy 1 ; Luke, Alison 1,2 ; Goudreau, Alex 2,3 ; Doucet, Shelley 1,2

1 Centre for Research in Integrated Care, University of New Brunswick, Saint John, NB, Canada

2 The University of New Brunswick (UNB) Saint John Collaboration for Evidence-Informed Healthcare: A JBI Centre of Excellence, Saint John, NB, Canada

3 UNB Libraries, University of New Brunswick, Saint John, NB, Canada

The authors declare no conflicts of interest.

Correspondence: Poppy Jackson, [email protected]

Objective: 

The objective of this review is to synthesize the existing literature on how theater has been used as a knowledge translation approach for health research and to identify the outcome measures employed for evaluation as well as the facilitators/challenges related to this approach.

Introduction: 

The use of arts-based knowledge translation methods is relatively new in health research but has already been shown to have positive impacts on knowledge, attitudes, policy, and practice. Specifically, theater has proven to be an effective approach for communicating research findings in a way that stimulates thought and discussion on important health-related topics.

Inclusion criteria: 

This review will include scholarly literature on how theater is being used as a knowledge translation approach for health research. The review will not impose any limitations related to demographic variables, health issues, or settings. The review will consider papers using any study design, and will also consider other literature, such as protocols, descriptive papers, unpublished papers, and evaluation reports.

Methods: 

This review will be conducted in accordance with the JBI methodology for scoping reviews. The databases to be searched will include CINAHL (EBSCOhost), Embase, MEDLINE (Ovid), Academic Search Premier (EBSCOhost), and Scopus. Google/Google Scholar and ProQuest Dissertations and Theses will also be searched for unpublished studies and gray literature. All literature identified in the search will be screened by 2 independent reviewers and the results will be presented in a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. The data extracted from the included literature will be presented in both tabular and narrative format.

Review registration: 

Open Science Framework https://osf.io/gbcpj

Introduction

Arts-based knowledge translation (KT) methods are increasingly being used in health research. This approach, which includes methods such as performance art (eg, theater), visual art (eg, photography), and literary art (eg, poetry), is being used as an innovative way to communicate research findings to audiences all over the world. This scoping review protocol focuses on the use of theater as it is one of the most prominent arts-based KT approaches in the existing health literature. 1,2

The concept of KT emerged as a response to the evidence-based medicine (EBM) movement in the mid-1980s. EBM refers to “integrating individual clinical expertise with the best available external clinical evidence from systematic research” 3 (p.71) and aims to reduce clinical errors and ensure high-quality care. 4 The EBM movement represented an important advancement in health care; however, there were delays between the production of new evidence and its implementation in practice, meaning EBM was not supported by substantial changes in the health field. 4 KT research attempts to understand why these gaps between research and practice have persisted and how they can be addressed. 4 KT refers to the methods used to effectively communicate knowledge from research and conventional academic platforms—such as journals or conferences—to those who can put it into practice. In other words, KT is to move “beyond the simple dissemination of knowledge into actual use of knowledge.” 5 (p.165) In the context of health research, this primarily includes reaching health professionals, students, patients and their families, and the public. The goal of KT is to make health research more accessible.

Arts-based KT can be defined as the use of any art form for the purpose of disseminating and communicating knowledge. 6 This review will specifically examine the use of theater due to its prominence in the literature and its potential to affect audience perspectives and behaviors. Theater can be an effective KT approach as it can elicit an emotional response from the audience while also inspiring discussion and critical reflection. 7 When used for KT, theatrical performances tend to be developed by interdisciplinary teams who draw on research findings to create their scripts. The resulting performances may have either specific target audiences, such as university nursing classes or health professionals caring for a particular patient population, or be geared toward the public.

For the purposes of this review, the term theater will be used as an umbrella term that covers all types of theatrical presentations, such as plays, musical theater, skits, storytelling, and readers theater. Theater can also be divided into the following 4 categories based on the degree to which it adheres to research data, according to Rossiter et al. 7 : i) non-theatrical performances (adheres closely to data and employs minimal traditional theatrical elements); ii) ethnodramas (theatrical performances that remain faithful to primary research subjects and data); iii) theatrical research-based performances (theater that is informed by the research process but does not strictly adhere to data); and iv) fictional theatrical performances (theater for health education that does not rely on research data). This review will include studies employing the first 3 categories of theater to examine how theater is being used as a KT tool for health research, along with the outcome measures and facilitators/challenges associated with each approach. Studies using fictional theatrical performances will not be included as they do not utilize specific research data, and thus do not constitute a method of disseminating research findings.

Theater as a KT approach for health research has been studied in multiple ways. Our preliminary literature search found that theater has been used to spread awareness about and discuss the experiences of those living with mental illness, 8 incurable diseases, 9,10 cancer, 11 or dementia, 12,13 and to address other forms of illness and experiences with care. 14,15 Portraying the patient experience with illness and care is the primary commonality among the theatrical performances discussed in these publications. In a scoping review that addresses all arts-based health research methods, Boydell et al. 2 reported promising results for employing theater as a KT approach. For example, one included study reported that a theatrical production of research on prostate cancer targeting practitioners had the effect of humanizing the experiences of patients and families and increasing empathy. 11 Another study reported practitioner disclosure of intent to change their practice based on new knowledge and understanding gained after observing a play on living with dementia. 13 For theater targeting patients, studies have identified benefits such as easing the sense of isolation associated with experiencing certain illnesses 16 and changes in audience-reported behavior. 17 These positive impacts demonstrate the utility of conducting a scoping review that synthesizes how theater has been used as a KT method to inform future research and practice. Specifically, the review will provide guidance for researchers seeking to improve the accessibility of their findings and engage wider audiences.

Although there are existing reviews on this topic, they are more general in scope in terms of the KT approaches they examine and more specific in their target populations. To the best of our knowledge, most existing reviews address theater in addition to other arts-based KT tools in health research. 1,2,18–20 One of the most recent reviews discussing theater was published in 2019 and focused on all arts-based KT methods in relation to the themes of aging and dementia. 1 Furthermore, a systematic review published in 2022 examined the use of KT for Indigenous peoples’ health research, and only one of the 51 included studies used theater. 20 One review on readers theater as an approach to education was identified; however, it did not capture other forms of theater and focused on studies from all disciplines, beyond health research. 21 Our scoping review will use a more specific lens than the majority of existing reviews published on this topic by looking only at the use of theater as an arts-based KT approach for health research; however, it will not be limited by population. There will also be no imposed date limitations to capture potential trends across time. Thus, our review will provide a comprehensive synthesis of the existing published and gray literature on this topic for all populations and across all settings and will therefore be useful for informing others’ implementation of theater for knowledge dissemination. A preliminary search of PROSPERO and JBI Evidence Synthesis was conducted and no current or in-progress scoping or systematic reviews on our proposed topic were identified.

Review questions

  • How is theater being used as a KT approach for health research?
  • What outcome measures are reported to evaluate the use of theater as a KT approach for health research?
  • What are the facilitators and challenges associated with the use of theater as a KT approach for health research?

Inclusion criteria

Participants.

This scoping review will consider sources discussing any participants. The review will not focus on any specific health condition or demographic variable, such as age, gender, or ethnicity. While theater has primarily been used for KT targeting patients, their families, and health professionals, performances may also target the public who seek to further their understanding of health topics. Target audience and participant details will not be used to determine study selection.

One main concept is theater. This review will define theater as an activity or presentation that uses drama to entertain an audience, 22 and that is based on health research findings. Articles discussing theater from 3 of the 4 categories proposed by Rossiter et al. 7 , namely, non-theatrical performance, ethnodrama, and theatrical research-based performance, will be considered for inclusion. Articles discussing only fictional theatrical performances will not be eligible for inclusion, as such performances are not based on or seek to disseminate specific health research data. 7 To be eligible for inclusion, a paper must clearly identify the source of the material used in the theatrical performance. We will accept other theater-related terms used in the same way in the literature, such as drama, performance, performing arts, actor, narrative, storytelling, and edutainment. Included articles will need to discuss the use of theater (eg, description of theater method, content focus, target audience, theater setting, type of theater method used, source of theater content, and evaluation method if applicable) as a KT method for health research.

An additional main concept is KT. KT has been defined by the Canadian Institutes of Health Research as “a dynamic and iterative process that includes synthesis, dissemination, exchange and ethically-sound application of knowledge to improve the health of Canadians, provide more effective health services and products and strengthen the health care system.” 23 (para.4) This review will define KT in this way, but will not be limited to the Canadian context. We will also accept other terms related to KT, such as knowledge transfer, knowledge dissemination, knowledge mobilization, and knowledge representation, as well as implementation science and research utilization. Included articles will need to discuss KT for health research.

A third main concept is health research. Health research may involve biomedical, clinical, health services, and population health research. 24 Theatrical performances must be based on and seek to disseminate specific health research data to be considered for inclusion.

Secondary concepts include outcome measures, facilitators, and challenges related to the use of theater as a KT method for health research. Outcome measures will be defined as the measures reported to evaluate theater as a KT method (eg, self-reported changes in knowledge, attitudes, and behavior 8,10 ). Facilitators refer to factors that enable the use of theater (eg, working with theater experts, time and funding flexibility 2 ), while challenges refer to barriers that prevent or hinder the use of theater as a KT tool (eg, ethical concerns, 2 unfamiliarity with arts-based KT methods among health researchers 25 ). Papers do not need to report on outcome measures, facilitators, and/or challenges to be considered for inclusion. Sources will be included if they describe the main concepts (ie, the use of theater as a KT approach for health research).

This review will consider the use of theater as a KT tool in all settings. Theatrical performances can be based on health research from any setting, such as hospitals, clinics, or community/social care settings, and can be performed in any setting, such as theater venues, conferences, classrooms, or workshops. This review is not limited to in-person theater. Thus, we will also consider virtual forms of theater for inclusion, such as pre-recorded videos or live performances conducted over virtual platforms. There will be no geographical limitations on included sources.

Types of sources

This review will consider published papers using any study design, including quantitative, qualitative, and mixed methods studies. It will also consider protocols, descriptive papers, unpublished papers, and evaluation reports. Systematic, scoping, and literature reviews will not be included; however, those that meet the inclusion criteria will be hand-searched for relevant articles. Sources published in English and French will be included given the linguistic expertise of the team. Databases will be searched with no imposed date limitations to allow for the examination of potential trends in the use of theater as a KT approach across time.

This review will be conducted in accordance with the JBI methodology for scoping reviews. 26

Search strategy

The search strategy will aim to locate both published and unpublished literature. It was developed by a JBI-trained librarian (AG) and takes a multi-step approach. The first step was an exploratory search performed in MEDLINE (Ovid), CINAHL (EBSCOhost), and Embase to locate records relevant to the topic to analyze the words contained in the titles, abstracts, and subject descriptors. The terms identified in this step were tested in a variety of combinations to develop a full search strategy for MEDLINE (Ovid). Only terms producing unique results were included in the final strategy. For example, the terms “performance” and “drama” were not necessary to include after testing, and “actor” was kept but not “actress.” A draft search strategy was then prepared and reviewed by a second librarian (JP) using the Peer Review of Electronic Search Strategies (PRESS) guidelines. 27 Recommended adjustments were made, and the search strategy was finalized (Appendix I).

The search strategy, including all identified keywords and index terms, will be adapted and tested in the following 5 databases: CINAHL with Full Text (EBSCOhost), Embase, MEDLINE (Ovid), Academic Search Premier (EBSCOhost), and Scopus. Databases will be searched from inception until the present, with no limits applied to the results. Full search strategies for all databases will be included in the scoping review.
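For illustration only, the sketch below shows one way the two concept blocks of such a strategy could be combined programmatically before being adapted to each database's syntax. This is an assumption about tooling, not part of the published protocol, and the term lists and Ovid-style field tag are abbreviated stand-ins for the full Appendix I strategy.

```python
# Minimal sketch (not the authors' workflow): combine two concept blocks of a
# search strategy -- theater-related terms and KT-related terms -- into one
# boolean query string that can then be adapted per database.
# Term lists and the field tag are illustrative, not the full strategy.

theater_terms = ["theat*", "storytell*", "actor", "musical*", "edutainment"]
kt_terms = ["knowledge translat*", "knowledge transfer*", "disseminat*", "implementation science"]

def build_block(terms, field_tag):
    """OR together one concept block, quoting multi-word phrases and adding a field tag."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(f"{t}{field_tag}" for t in quoted) + ")"

def build_query(field_tag=".ab,kf,kw,ti."):
    """AND the two concept blocks, mirroring the final 'concept 1 AND concept 2' step."""
    return f"{build_block(theater_terms, field_tag)} AND {build_block(kt_terms, field_tag)}"

print(build_query())
```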

Sources of unpublished studies and gray literature will also be searched, including Google, Google Scholar, and ProQuest Dissertations and Theses. For Google and Google Scholar, sources will be screened until the point of saturation (until 2 pages are reviewed without opening a link). The reference lists of all sources meeting the inclusion criteria will be manually back-searched for additional studies, and Google Scholar and Scopus will be used for forward citation tracking to identify further studies.

Source selection

Following the search, all identified citations will be collated and uploaded into Covidence (Veritas Health Innovation, Melbourne, Australia) where duplicates will be removed. The duplicates will be reviewed manually to ensure accuracy. Following a pilot test, titles and abstracts will be screened by 2 independent reviewers for assessment against the inclusion criteria. The full texts of potentially relevant studies will be retrieved and their citation details imported into the JBI System for the Unified Management, Assessment and Review of Information (JBI SUMARI; JBI, Adelaide, Australia). 28 The full text of selected citations will be assessed against the inclusion criteria by 2 independent reviewers. Reasons for exclusion of full-text sources will be recorded and reported in the scoping review. Any disagreements that arise between the reviewers at any stage of the study selection process will be resolved through discussion or with a third reviewer. The results of the search and the study inclusion process will be reported in full in the scoping review and presented in a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. 29
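As a purely illustrative aside (this is not how Covidence or JBI SUMARI work internally), duplicate detection of the kind described above can be approximated by comparing a normalised title plus publication year, with flagged pairs then verified by hand:

```python
# Minimal sketch, assuming simple dictionaries of citation metadata: flag likely
# duplicates by normalised title plus year so they can be checked manually.
import re

def normalise(title: str) -> str:
    """Lower-case a title and strip punctuation and surrounding whitespace."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def find_duplicates(records):
    """Return pairs of records that share the same (normalised title, year) key."""
    seen, pairs = {}, []
    for rec in records:
        key = (normalise(rec["title"]), rec.get("year"))
        if key in seen:
            pairs.append((seen[key], rec))
        else:
            seen[key] = rec
    return pairs

# Illustrative records only.
citations = [
    {"title": "Theater as knowledge translation in dementia care", "year": 2019},
    {"title": "Theater as Knowledge Translation in Dementia Care.", "year": 2019},
]
print(find_duplicates(citations))
```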

Data extraction

Data will be extracted from all sources included in the scoping review by 2 independent reviewers using a data extraction tool (Appendix II). The information collected will include title, author(s) and year of publication, country of origin, type of source/study design, study aim, theater setting, type of theater method used, description of theater method, content focus, source of theater content, target audience, evaluation method, outcome measures, facilitators, and challenges. The draft data extraction tool will be piloted on 3 sources and modified as necessary. All completed sources will be reviewed if the data extraction tool is revised. Modifications will be detailed in the scoping review. Any disagreements that arise between the reviewers will be resolved through discussion or with a third reviewer. Authors of literature sources will be contacted up to 2 times to request missing or additional data, where required.
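To make the shape of such an instrument concrete, the sketch below writes a blank extraction form as a CSV using the fields listed above. It is an assumption about how one might implement the tool in practice, not the instrument used in the protocol (see Appendix II for the actual draft), and the column names are invented shorthand.

```python
# Minimal sketch (hypothetical tooling): write a blank data extraction form as a
# CSV, one row per included source, to be completed by both reviewers.
import csv

FIELDS = [
    "title", "authors", "year", "country", "source_type_or_design", "aim",
    "theater_setting", "theater_method_type", "theater_method_description",
    "content_focus", "content_source", "target_audience", "evaluation_method",
    "outcome_measures", "facilitators", "challenges",
]

def write_blank_form(path: str, n_sources: int = 3) -> None:
    """Create a CSV with a header row and one empty row per source to be extracted."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for _ in range(n_sources):
            writer.writerow({field: "" for field in FIELDS})

write_blank_form("extraction_form.csv")
```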

Data analysis and presentation

The results of the search will be synthesized, summarized, and reported in full in the final scoping review and presented following the PRISMA extension for Scoping Reviews guidelines. 30 Content analysis will be used to analyze the data. The extracted data, including the information listed in the data extraction tool (Appendix II), will be presented in tabular format accompanied by a narrative summary to describe how the results relate to the review objective and questions. The results will be presented in 2 tables corresponding to the review questions: Table 1 will include source and intervention characteristics to detail how theater has been used as a KT method and what outcome measures have been employed to evaluate its effectiveness, while Table 2 will present the facilitators and challenges. Where possible, if a study includes more than one type of theater, findings will be reported separately for each theater method used. Other forms of visual data presentation will also be explored after data extraction and analysis have been completed.

Author contributions

AL and SD developed the initial idea for this scoping review and hired PJ as a summer student to lead the project. PJ worked collaboratively with AG to conduct a preliminary literature search and develop a list of key terms for the search strategy. AG developed the search strategy. PJ developed the inclusion criteria, data extraction, and analysis plans, and wrote the protocol under the supervision of AL and SD. PJ will be performing the title and abstract screening, full-text screening, data extraction, and data analysis with the support of 2 other summer students as the second independent reviewers. PJ will prepare the final scoping review manuscript, with the support of AL and SD.

Acknowledgments

The authors thank Jackie Phinney, instruction/liaison librarian from Dalhousie Medicine New Brunswick, for peer-reviewing the search strategy, and Amy Reid for supporting the early stages of the development of this protocol.

This project received funding from the New Brunswick Health Research Foundation. The funder did not have any input into this protocol.

Appendix I: Search strategy

Ovid MEDLINE(R) and Epub Ahead of Print, In-Process, In-Data-Review & Other Non-Indexed Citations, Daily and Versions(R), 1946 to August 31, 2023.

Search conducted: September 1, 2023

# | Query | Records retrieved
1 | exp Art/ | 38,252
2 | exp Drama/ | 2,088
3 | exp Psychodrama/ | 3,118
4 | exp Narration/ | 10,405
5 | theat*.ab,kf,kw,ti. | 16,087
6 | storytell*.ab,kf,kw,ti. | 2,357
7 | ((arts or drama) adj1 based).ab,kf,kw,ti. | 745
8 | actor.ab,kf,kw,ti. | 7,101
9 | musical*.ab,kf,kw,ti. | 9,397
10 | edutainment.ab,kf,kw,ti. | 120
11 | (perform* adj3 arts).ab,kf,kw,ti. | 525
12 | 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or 11 | 87,366
13 | exp Translational Medical Research/ | 12,926
14 | ((Knowledge or research) adj3 (implement* or translat* or transfer* or creat* or communication or product* or shar* or action or transform*)).ab,kf,kw,ti. | 90,226
15 | disseminat*.ab,kf,kw,ti. | 164,405
16 | Information Dissemination/ | 19,334
17 | knowing.ab,kf,kw,ti. | 27,040
18 | 13 or 14 or 15 or 16 or 17 | 299,681
19 | 12 and 18 | 1,764

Appendix II: Draft data extraction instrument

Extraction field (one column to be completed for each included source, e.g., Study A, Study B, Study C):

  • Title
  • Author/Publication year
  • Country of origin
  • Type of source/study design
  • Aim
  • Theater setting
  • Type of theater method used (eg, ethnodrama, non-theatrical performance)
  • Description of theater method
  • Content focus (ie, what aspect of health is addressed)
  • Source of content (ie, what research is informing the theater)
  • Target audience
  • Evaluation method (if applicable)
  • Outcome measures (if applicable)
  • Facilitators (if applicable)
  • Challenges (if applicable)

Keywords: health research; knowledge translation; protocol; scoping review; theater


A systematic review of literature examining the application of a social model of health and wellbeing


Rachel Rahman, Caitlin Reid, Philip Kloer, Anna Henchie, Andrew Thomas, Reyer Zwiggelaar, A systematic review of literature examining the application of a social model of health and wellbeing, European Journal of Public Health , Volume 34, Issue 3, June 2024, Pages 467–472, https://doi.org/10.1093/eurpub/ckae008


Following years of sustained pressure on the UK health service, there is recognition amongst health professionals and stakeholders that current models of healthcare are likely to be inadequate going forward. A fundamental review of existing social models of healthcare is therefore needed to ascertain current thinking in this area and whether a change in perspective is required.

Through a systematic research review, this paper seeks to address how previous literature has conceptualized a social model of healthcare and how implementation of the models has been evaluated. Data were extracted from 222 publications and analysed to explore the country of origin, methodological approach, and the health and social care contexts in which the studies were set.

The publications, predominantly drawn from the USA, UK, Australia, Canada and Europe, identified five themes: the lack of a clear and unified definition of a social model of health and wellbeing; the need to understand context; the need for cultural change; improved integration and collaboration towards a holistic and person-centred approach; and measuring and evaluating the performance of a social model of health.

The review identified a need for a clear definition of a social model of health and wellbeing. Furthermore, consideration is needed of how such a model integrates with current models and whether it will act as a descriptive framework or be developed into an operational model. The review highlights the importance of engagement with users and partner organizations in the co-creation of a model of healthcare.

Following years of sustained and increasing pressure brought about through inadequate planning and chronic under-resourcing, including the unprecedented challenges of the Covid-19 pandemic, the UK NHS is at crisis point. 1 The incidence of chronic disease continues to increase alongside an ageing population with more complex health and wellbeing needs, whilst recruitment and retention of staff continue to be insufficient to meet these increased demands. 1 Furthermore, the Covid-19 pandemic has only served to exacerbate pressures, resulting in delays in patient presentation, 2 poor public mental health 3 and strain and burnout amongst the workforce. 4 However, preceding the pandemic there was already recognition of a need for a change to the current biomedical model of care to better prevent and treat the needs of the population. 5

While it is recognized that demands on the healthcare system are increasing rapidly, the biomedical model used to deal with these issues (which is the current model of healthcare provision in the UK) has largely remained unchanged over the years. The biomedical model takes the perspective that ill-health stems from biological factors and operates on the theory that good health and wellbeing is merely the absence of illness. Application of the model therefore focuses treatment on the management of symptoms and cure of disease from a biological perspective. This suggests that the biomedical approach is mainly reactive in nature and, whilst rapid advancements in technology such as diagnostics and robotics have significantly improved patient outcomes and identification of early onset of disease, it does not fully extend into managing the social determinants that can play an important role in the prevention of disease. Therefore, despite its contribution in advancing many areas of biological and health research, the biomedical model has come under increasing scrutiny. 6 This is in part due to the growing recognition of the impact of wider social determinants on health, ill-health and wellbeing, including physical, mental and social wellbeing, which moves the focus beyond individual physical abilities or dysfunction. 7–9 In order to address these determinants, action needs to be taken through developing policies in a range of non-medical areas, such as social, economic and environmental policy, so that they regulate the commercial and corporate determinants. In this sense, we can quickly see that the traditional biological model rapidly becomes inadequate. With the current model, health care and clinical staff can do little to affect these determinants and as such can do little to assist the individual patient or society. The efficiency and effectiveness of clinical work will undoubtedly improve if staff have the ability to observe and understand the wider social determinants and consequences of the individual patient’s condition. Therefore, in order to provide a basis for understanding the determinants of disease and arriving at rational treatments and patterns of health care, a medical model must also take into account the patient, the social context in which they live, and a system devised by society to deal with the disruptive effects of illness, that is, the physician’s role and that of the health care system. Models such as Engel’s biopsychosocial model, 9,10 the social model of disability, and social–ecological models of health, 10,11 including the World Health Organisation’s framework for action on social determinants of health, 8,9 are all proposed as attempts to integrate these wider social determinants.

However, the ability of health systems to effectively transition away from a dominant biomedical model to the adoption of a social model of health and care has yet to be fully developed. Responsibility for taking action on these social determinants will need to come from other sectors and policy areas, and so future health policy will need to evolve into a more comprehensive and holistic social model of health and wellbeing. Wales’ flagship Wellbeing of Future Generations Act 12 for instance outlines ways of working towards sustainable development and includes the need to collaborate with society and communities in developing and achieving wellbeing goals. However, developing and implementing an effective operational model that allows multi-stakeholder integration will prove far more difficult than creating the policies. Furthermore, even if the implementation of a robust model of social health is achievable, its efficiency, effectiveness and ability to deliver have yet to be proven. Therefore, any future model will need to extend past its conceptual development and provide an ability to manage the complex interactions that will exist between the stakeholders and policies.

Therefore, the use of the term ‘model’ poses its own challenges and debates. Different disciplines attribute differing parameters to what constitutes a model, and this in turn may influence the interpretations or expectations surrounding what a model should comprise or deliver. 13 According to numerous authors, a model has no ontological category, and as such anything from physical entities, theoretical concepts and descriptive frameworks to equations can feasibly be considered a model. 14 It appears, therefore, that much discussion has focussed on the move towards a ‘descriptive’ Social Model of Health and Wellbeing in an attempt to view health more holistically and identify a wider range of determinants that can impact on the health of the population. However, in defining an operational social model of health that can facilitate organizational change, there may be a need to consider a more systems- or process-based approach.

As a result, this review seeks to systematically explore the academic literature in order to better understand how a social model of health and wellbeing is conceptualized, implemented, operationalized and evaluated in health and social care.

The review seeks to address the research questions:

How is ‘a social model of health and wellbeing’ conceptualized?

How have social models of health and wellbeing been implemented and evaluated?

A systematic search of the literature was carried out between 6 January 2022 and 20 January 2022. Using the search terms shown in table 1, a systematic search was carried out using the online databases PsycINFO, ASSIA, IBSS, Medline, Web of Science, CINAHL and SCOPUS. English language and peer-reviewed journals were selected as limiters.

Search terms

Selection and extraction criteria

The search strategy considered research that explicitly included, framed, or adopted a ‘social model of health and wellbeing’. Each paper was checked for relevance and screened. The authors reviewed the literature using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method, using the updated guidelines from 2020. 15 Figure 1 represents the process followed.

PRISMA flow chart.

Data extraction and analysis

A systematic search of the literature identified 222 eligible papers for inclusion in the final review. A data extraction table was used to extract information regarding location of the research, type of paper (e.g. review, empirical), service of interest and key findings. Quantitative studies were explored with a view to conducting a quantitative meta-analysis; however, given the disparate nature of the outcome measures, and research designs, this was deemed unfeasible. All included papers were coded using NVivo software with the identified research questions in mind, and re-analysed using Thematic Analysis 16 to explore common themes of relevance.
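For readers less familiar with what such an extraction table feeds into, a hypothetical sketch of the descriptive tallies reported below (papers per country, papers per study type) might look like the following; the rows shown are invented examples, not data from the 222 included publications.

```python
# Minimal sketch (illustrative rows, not the review's data): tally descriptive
# counts from an extraction table, e.g. papers per country and per paper type.
from collections import Counter

extracted = [
    {"country": "USA", "paper_type": "narrative review"},
    {"country": "UK", "paper_type": "qualitative"},
    {"country": "USA", "paper_type": "quantitative"},
]

by_country = Counter(row["country"] for row in extracted)
by_type = Counter(row["paper_type"] for row in extracted)

for country, n in by_country.most_common():
    print(f"{country}: {n} ({n / len(extracted):.0%})")
print(by_type)
```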

The majority of papers were from the USA (34%), with the UK (28%), Australia (16%), Canada (6%) and wider Europe (10%) also contributing to the field. The ‘other’ category (6%) was made up of single papers from other countries. Papers ranged in date from 1983 to 2021, with no noticeable temporal patterns in country of origin, health context or model definition. However, the volume of papers relating to the social model of healthcare published in each decade increased significantly, suggesting increasing research interest in the social model of healthcare. Table 2 shows the number of publications per decade identified from this study.

Publications identifying social models of healthcare.

Year of publication | Number of publications identifying social models of healthcare
1980s | 5
1990s | 11
2000s | 70
2010s | 87
2020–22 | 49

Most of the papers were narrative reviews (n = 90), with a smaller number of systematic reviews (n = 9) and empirical research studies, including qualitative (n = 47), quantitative (n = 39) and mixed methods (n = 14) research. The remaining papers (n = 23) comprised small numbers of, for example, clinical commentaries, cost-effectiveness analyses, discussion papers and impact assessment development papers. The qualitative meta-analysis identified five overarching themes in relation to the research questions, some with underlying sub-themes, which are outlined in figure 2.

Overview of meta-synthesis themes.

The lack of a clear and unified definition of a social model of health and wellbeing

There was common recognition amongst the papers that a key aim of applying a social model of health and wellbeing was to better address the social determinants of health. Papers identified and reviewed relevant frameworks and models, which they later used to conceptualize or frame their approach when attempting to apply a social model of health. Amongst the most commonly referenced were the WHO’s framework 17 and Engel’s biopsychosocial model, 9 which was referred to as a seminal framework by many of the researchers. However, one criticism of the biopsychosocial model was its inability to fully address social needs. As a result, a number of papers reported the development of new or enhanced models that used the biopsychosocial model as their underpinning ‘social model’ 18,19 but then extended their work by including a wider set of social elements in their resulting models. 20 The Social ecological model, 11 the Society-Behaviour-Biology Nexus 21 and the Environmental Affordances Model are such examples. 22 Further examples of ‘Social Models’ included the Model of Social Determinants of Health, 23 which framed specific determinants of interest (namely social gradient, stress, early life, social exclusion, work, unemployment, social support, addiction, food and transport). Similarly, Dahlgren and Whitehead’s ‘social model’ 10 illustrates social determinants via a range of influential factors, from the individual to the wider cultural and socioeconomic influences. However, none of these papers formally developed a working ‘definition’ of a social model of health and wellbeing, instead applying guiding principles and philosophies associated with a social model to their discussions or interventions. 24,25

The need to understand context

Numerous articles highlight that, in order to move towards a social model of health and wellbeing, it is important to understand the context of the environment in which the model will need to operate. This includes balancing the needs of the individual with a model that is co-created, developed and implemented within the community, whilst ensuring that the complex interactions between the social determinants of health and their influence on health and wellbeing outcomes are addressed effectively and efficiently.

The literature identified the complex multi-disciplinary nature of a variety of conditions or situations involving medical care. These included, but were not limited to, chronic pain, 26 cancer, 27 older adult care 28 and dementia, 29 indicating the complex range of medical issues that a model will need to address; many authors acknowledged that the frequently used biomedical models failed to fully capture the holistic nature and needs of patients. Papers outlined some of the key social determinants of health affecting the specific population of interest in their own context, highlighting the interactions between wider socioeconomic and cultural factors, such as poverty, housing, isolation and transport, and health and wellbeing outcomes. Interventions that had successfully addressed individual needs and successfully embedded services in communities reported improved outcomes for end users and staff in the form of empowerment, agency, education and belonging. 30 There was also recognition that the transition to more community-based care could be challenging for health and social care providers, who were having to work outside of their traditional models of care and accept a certain level of risk.

The need for cultural change

A number of papers referred to the need for a ‘culture change’ or ‘cultural shift’ in order to move towards a social model of health and wellbeing. Papers identified how ‘culture change models’ were implemented as a way of adapting to a social model. It was recognized that for culture change models to be effective, staff and the general public needed to be fully engaged with the entire move towards a social model, informing and shaping the mechanisms for the cultural shift as well as the application of the model itself.

Integration and collaboration towards a holistic and person-centred approach

The importance of integration and collaboration between health professionals (including public, private and third sector organizations), service users and patients was emphasized in the ambition to achieve best practice when applying a social model of health and wellbeing. Papers identified the reported benefits of improved collaboration between, and integration of, services, which included improved continuity of care throughout complex pathways, 31 improved return to home or another setting on discharge, 25 and social connectedness. 32 Numerous papers discussed the importance of multi-disciplinary teams who were able to support individuals beyond the medicalized model.

A number of papers suggested specific professional roles or structures that would be ideal to act as champions or integrators of collaborative services and communities. 25 , 33 These could act as a link between secondary, primary and community level care helping to identify patient needs and supporting the integration of relevant services.

Measuring and evaluating a social model of health

Individual papers applying and evaluating interventions based on a social model used a variety of methods to evaluate success. Amongst these, some of the most common outcome measures included general self-report measures of outcomes such as mental health and perceptions of safety, 34 wellbeing, 35 life satisfaction, and health social networks and support. 19 Some included condition-specific self-report outcomes relevant to the condition in question (e.g. pregnancy, anxiety) and pain inventories. 36 Other papers considered the in-depth experiences of users or service implementers through qualitative techniques such as in-person interviews. 37,38

However, the complexity of developing effective methods to evaluate social models of health was recognized. The need to consider the complex interactions between social determinants and health, wellbeing, economic and societal outcomes posed particular challenges in developing consistency across evaluations that would enable a conclusive assessment of the benefits of social models to wider health systems and societal health. Some criticized the over-reliance on quantitative and evidence-based practice methods of evaluation, highlighting how these could fail to fully capture the complexity of human behaviour and the manner in which people’s lives could be affected.

The aim of this systematic review was to better understand how a social model of health and wellbeing is conceptualized, implemented and evaluated in health and social care. The review sought to address the research questions identified in the ‘Introduction’ section of this paper.

With regards to the conceptualization of a social model of health and wellbeing, analysis of the literature suggests that, whilst there appears to be consensus on the ethos, values and aspirations of achieving a unified model, a fundamental weakness exists in that there is no single unified definition or operational model of a social model of health and wellbeing applied to the health and social care sector. The decision about how best to conceptualize a ‘social model’ is important both in terms of its operational value and the implications of the associated semantics. Without a single or unified definition, implementation, or indeed operationalization, of any model will be almost impossible. Furthermore, use of the term ‘social model’ arguably loses sight of the biological factors that are clearly relevant in many elements of clinical medicine. There is also no clarification in the literature about what would ‘not’ be considered a social model of health and wellbeing, potentially leading to confusion within health and social care sectors when addressing their wider social remit. This raises questions and requires decisions about whether implementation of a social model of health and wellbeing will need to work alongside or replace the existing biomedical approach.

Authors have advocated that a social model provides a way of ‘thinking’ or articulating an organization’s values and culture. 24 Common elements of the values associated with a social model amongst the papers reviewed included recognition and awareness of the social determinants of health, increased focus on preventative rather than reactive care, and similarly the importance of quality of ‘life’ as opposed to a focus on quality of ‘care’. However, whilst this approach enables individual services to consider how well their own practices align with a social model, the authors suggest that this does not provide large organizations such as the NHS, with multifaceted services and complex internal and external connections and networks, sufficient guidance to enable large scale evaluation or transition to a widespread operational model of a social model of health and wellbeing. This raises questions about what the model should be: whether its function is to support communication of a complex ethos to encourage reflection and engagement of its staff and end users, or to develop the current illustrative framework into a predictive model that can be utilized as an evaluative tool to inform and measure the success of widespread systems change.

Regarding the potential implementation of a future social model of health and wellbeing, none of the papers evaluated the complex widespread organizational implementation of a social model, instead focusing on specific organizational contexts of services such as long-term care in care homes. Despite this, common elements of successful implementation did emerge from the synthesis. These included the need to wholeheartedly engage and be inclusive of end users in policy and practice change, to fully understand the complexity of their social worlds and to ensure that changes to practice and policy were ‘developed with’, as opposed to ‘created for’, the wider public. This also involved ensuring that health, social care and wider multi-disciplinary teams were actively included in the process of culture change from an early stage.

Implications for future research

The analysis identifies that a significant change of mindset and the removal of perceived and actual hierarchical structures (that are historically embedded in health and social care structures) amongst both staff and the public is needed, although eradicating socially embedded hierarchies will pose significant challenges in practice. Furthermore, the study revealed that many of the models proposed were conceptually underdeveloped and lacked the capability to be operationalized, which in turn compromised their ability to be empirically tested. Therefore, in order that a future ‘implementable and operational’ model of social care and wellbeing can be created, further research into organizational behaviours, organizational learning and stakeholder theory (amongst others), applied to the social care and health environment, is needed.

Towards defining a social model of health and wellbeing

In attempting to conceptualize a definition for a social model of health and wellbeing, it is important to note that the model needs to be sufficiently broad in scope to include the prevailing biomedical perspective while also drawing in the social determinants that provide a view and future trajectory towards social health and wellbeing. Therefore, the authors suggest that the ‘preventative’ approach brought by improvements in the social determinants of health (social, cultural, political, environmental) needs to be balanced effectively with the ‘remedial/preventative’ focus of the biomedical model (and the associated advancements in diagnostics, technology, vaccines, etc), ensuring that a future model drives cultural change and improved integration and collaboration towards a holistic and person-centred approach, whilst ensuring engagement with citizens, users, multi-disciplinary teams and partner organizations so that the transition towards a social model of health and wellbeing is undertaken.

Through a comprehensive literature analysis, this paper has provided evidence that advocates a move towards a social model of health and wellbeing. However, the study has predominantly considered literature from the USA, UK, Canada and Australia and is therefore limited in scope at this stage. The authors are aware of the need to consider research undertaken in non-English-speaking countries, where a considerable body of knowledge also exists, which will add to further discussion about how that work dovetails with this body of literature and how it aligns with the biomedical perspective. There is a need for complex organizations such as the NHS and allied organizations to agree a working definition of their model of health and wellbeing, whether that be a social model of health and wellbeing, a biopsychosocial model, a combined model, or indeed a new or revised perspective. 39

One limitation of the models seen within this study is that, at a systems level, most were conceptual models that characterized current systems or conditions, and interventions to the current system that result in localized improvements in system performance. However, for meaningful change to occur, a ‘future state’ model may need to focus on a behavioural systems approach, allowing modelling of the complete system to take place in order to understand how the elements within the model 40 behave under different external conditions and how these behaviours affect overall system performance.

Furthermore, considerable work will be required to engage on a more equal footing with the public, health and social care staff, as well as wider supporting organizations, in developing workable principles and processes that fully embrace the equality of a social model and challenge the ‘power’ imbalances of the current biomedical model.

Supplementary data are available at EURPUB online.

This research was funded/commissioned by Hywel Dda University Health Board. The research was funded in two phases.

Conflicts of interest: None declared.

The datasets generated and/or analysed during the current study are available in the Data Archive at Aberystwyth University and have been included in the supplementary file attached to this submission. A full table of references for studies included in the review will be provided as a supplementary document. The references below refer to citations in the report which are in addition to the included studies of the synthesis.

The review identified five themes, namely: the lack of a clear definition of a social model of health and wellbeing; the need to understand context; the need for cultural change; improved integration and collaboration towards a holistic and person-centred approach; and measuring and evaluating the performance of a social model of health.

The review identified a need for organizations to decide on how a social model is to be defined, especially at the interfaces between partner organizations and communities.

The implications for public policy in this paper highlight the importance of engagement with citizens, users, multi-disciplinary teams and partner organizations to ensure that the transition towards a social model of health and wellbeing is undertaken with holistic needs as a central value.

British Medical Association (n.d.). An NHS under pressure. https://www.bma.org.uk/advice-and-support/nhs-delivery-and-workforce/pressures/an-nhs-under-pressure (26 June 2023, date last accessed).

Nuffield Trust (2022). NHS performance summary. https://www.nuffieldtrust.org.uk/news-item/nhs-performance-summary-january-february-2022 (26 June 2023, date last accessed).

NHS Confederation (2022). Running hot: the impact of the Covid-19 pandemic on mental health services. https://www.nhsconfed.org/publications/running-hot (26 June 2023, date last accessed).

Gemine R , Davies GR , Tarrant S , et al.    Factors associated with work-related burnout in NHS staff during COVID-19: a cross-sectional mixed methods study . BMJ Open   2021 ; 11 : e042591 .


Iacobucci G.   Medical models of care needs updating say experts . BMJ   2018 ; 360 : K1034 .

Podgorski CA , Anderson SD , Parmar J.   A biopsychosocial-ecological framework for family-framed dementia care . Front Psychiatry   2021 ; 12 : 744806 .

Marmot M.   Social determinants of health inequalities . Lancet   2005 ; 365 : 1099 – 104 .

World Health Organisation ( 1946 ) Preamble to the Constitution of the World Health Organization as adopted by the International Health Conference . New York: World Health Organization, 19–22 June, 1946.

World Health Organisation ( 2010 ). A conceptual framework for action on the social determinants of health. Accessed via A Conceptual Framework for Action on the Social Determinants of Health (who.int) (26 June 2023, date last accessed).

Engel G.   The need for a new medical model: a challenge for biomedicine . Science   1977 ; 196 : 129 – 36 .

Dahlgren G , Whitehead M. ( 2006 ). European strategies for tackling social inequities in health: Levelling up part 2. Studies on Social Economic Determinants of Population Health, 1–105. Available at: http://www.euro.who.int/__data/assets/pdf_file/0018/103824/E89384.pdf (12 October 2023, date last accessed).

McLeroy KR , Bibeau D , Steckler A , Glanz K.   An ecological perspective on health promotion programs . Health Educ Q   1988 ; 15 : 351 – 77 .

Welsh Government , Wellbeing of Future Generations Act 2015. Available at: https://www.gov.wales/sites/default/files/publications/2021-10/well-being-future-generations-wales-act-2015-the-essentials-2021.pdf (12 October 2023, date last accessed).

Stanford Encyclopaedia of Philosophy ( 2006 , 2020). Models in Science. Available at: https://plato.stanford.edu/entries/models-science/ (26 June 2023, date last accessed).

Page MJ , McKenzie JE , Bossuyt PM , et al.    The PRISMA 2020 statement: an updated guideline for reporting systematic reviews . BMJ   2021 ; 372 : n71 .

Braun V , Clarke V.   Using thematic analysis in psychology . Qual Res Psychol   2006 ; 3 : 77 – 101 .

Thomas J , Harden A.   Methods for the thematic synthesis of qualitative research in systematic reviews . BMC Med Res Methodol   2008 ; 8 : 45 .

Solar O , Irwin A. ( 2016 ) “A conceptual framework for action on the social determinants of health. Geneva, Switzerland: WHO; 2010”. (Social determinants of health discussion paper 2 (policy and practice)). Available at: http://www.who.int/sdhconference/resources/ConceptualframeworkforactiononSDH_eng.pdf (12 October 2023, date last accessed).

Farre A , Rapley T.   The new old (and old new) medical model: four decades navigating the biomedical and psychosocial understandings of health and illness . Healthcare   2017 ; 5 : 88 .

Smedema SM.   Evaluation of a concentric biopsychosocial model of well-being in persons with spinal cord injuries . Rehabil Psychol   2017 ; 62 : 186 – 97 . PMID: 28569533.

Robles B , Kuo T , Thomas Tobin CS.   What are the relationships between psychosocial community characteristics and dietary behaviors in a racially/ethnically diverse urban population in Los Angeles county? . Int J Environ Res Public Health   2021 ; 18 : 9868 .

Glass TA , McAtee MJ.   Behavioral science at the crossroads in public health: extending horizons, envisioning the future . Soc Sci Med   2006 ; 62 : 1650 – 71 .

Mezuk B , Abdou CM , Hudson D , et al.    "White Box" epidemiology and the social neuroscience of health behaviors: the environmental affordances model . Soc Ment Health   2013 ; 3 : 10.1177/2156869313480892

Wilkinson RG , Marmot M , editors. Social Determinants of Health: The Solid Facts . Copenhagen, Denmark: World Health Organization , 2003 .


Mannion R , Davies H.   Understanding organisational culture for healthcare quality improvement . BMJ   2018 ; 363 : k4907 .

Blount A , Bayona J.   Toward a system of integrated primary care . Fam Syst Health   1994 ; 12 : 171 – 82 .

Berger MY , Gieteling MJ , Benninga MA.   Chronic abdominal pain in children . BMJ   2007 ; 334 : 997 – 1002 . PMID: 17494020; PMCID: PMC1867894.

Berríos-Rivera R , Rivero-Vergne A , Romero I.   The pediatric cancer hospitalization experience: reality co-constructed . J Pediatr Oncol Nurs   2008 ; 25 : 340 – 53 .

Doty MM , Koren MJ , Sturla EL. ( 2008 ). Culture change in nursing homes: How far have we come? Findings from the Commonwealth Fund 2007 National Survey. The Commonwealth Fund, 91. Available at: http://www.commonwealthfund.org/Content/Publications/Fund-Reports/2008/May/Culture-Change-in-NursingHomes-How-Far-Have-We-Come-Findings-FromThe-Commonwealth-Fund-2007-Nati.aspx (16 October 2023, date last accessed).

Robinson L , Tang E , Taylor J.   Dementia: timely diagnosis and early intervention . BMJ   2015 ; 350 : h3029 .

Baxter S , Johnson M , Chambers D , et al.    Understanding new models of integrated care in developed countries: a systematic review . Health Serv Deliv Res   2018 ; 6 : 1 .

Seys D , Panella M , VanZelm R , et al.    Care pathways are complex interventions in complex systems: new European Pathway Association framework . Int J Care Coord   2019 ; 22 : 5 – 9 .

Agarwal G , Brydges M.   Effects of a community health promotion program on social factors in a vulnerable older adult population residing in social housing” . BMC Geriatr   2018 ; 18 : 95 . PMID: 29661136; PMCID: PMC5902999.

Franklin CM , Bernhardt JM , Lopez RP , et al.    Interprofessional teamwork and collaboration between community health workers and healthcare teams: an integrative review . Health Serv Res Manag Epidemiol   2015 ; 2 : 2333392815573312 . PMID: 28462254; PMCID: PMC5266454.

Gagné T , Henderson C , McMunn A.   Is the self-reporting of mental health problems sensitive to public stigma towards mental illness? A comparison of time trends across English regions (2009-19) . Soc Psychiatry Psychiatr Epidemiol   2023 ; 58 : 671 – 80 . PMID: 36473961; PMCID: PMC9735159.

Geyh S , Nick E , Stirnimann D , et al.    Biopsychosocial outcomes in individuals with and without spinal cord injury: a Swiss comparative study . Spinal Cord   2012 ; 50 : 614 – 22 .

Davies C , Knuiman M , Rosenberg M.   The art of being mentally healthy: a study to quantify the relationship between recreational arts engagement and mental well-being in the general population . BMC Public Health   2016 ; 16 : 15 . PMID: 26733272; PMCID: PMC4702355.

Duberstein Z , Brunner J , Panisch L , et al.    The biopsychosocial model and perinatal health care: determinants of perinatal care in a community sample . Front Psychiatry   2021 ; 12 : 746803 .

The King’s Fund (2021). Health inequalities in a nutshell. https://www.kingsfund.org.uk/projects/nhs-in-a-nutshell/health-inequalities (23 October 2023, date last accessed).

Blount A.   Integrated primary care: organizing the evidence . Fam Syst Health   2003 ; 21 : 121 – 33 .



Prevalence of mental, behavioural or neurodevelopmental disorders according to the International Classification of Diseases 11: a scoping review protocol (BMJ Open, Volume 14, Issue 6)

  • Kiana Nafarieh 1 (http://orcid.org/0009-0004-0825-2146),
  • Sophia Krüger 1 (http://orcid.org/0000-0002-1191-9050),
  • Karl Deutscher 1 (http://orcid.org/0000-0002-3396-5138),
  • Stefanie Schreiter 1 (http://orcid.org/0000-0001-9598-0029),
  • Andreas Jung 2,
  • Seena Fazel 3 (http://orcid.org/0000-0002-5383-5365),
  • Andreas Heinz 1 (http://orcid.org/0000-0001-5405-9065),
  • Stefan Gutwinski 1 (http://orcid.org/0009-0001-7150-6071)
  • 1 Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin, Berlin, Germany
  • 2 EX-IN Hessen e.V., Marburg, Germany
  • 3 Department of Psychiatry, University of Oxford, Oxford, UK
  • Correspondence to Dr Stefan Gutwinski; stefan.gutwinski@charite.de

Introduction Due to a change in diagnostic prerequisites and the inclusion of novel diagnostic entities, the implementation of the 11th revision of the International Classification of Diseases (ICD-11) will presumably change prevalence rates of specific mental, behavioural or neurodevelopmental disorders and result in an altered prevalence rate for this grouping overall. This scoping review aims to summarise the characteristics of primary studies examining the prevalence of mental, behavioural or neurodevelopmental disorders based on ICD-11 criteria. The knowledge attained through this review will primarily characterise the methodological approaches of this research field. It will additionally assist in deciding which psychiatric diagnoses are, given the current literature, most relevant for subsequent systematic reviews and meta-analyses intended to approximate the magnitude of prevalence rates, while providing a first glimpse of the range of expected (differences in) prevalence rates in these conditions.

Methods and analysis MEDLINE, Embase, Web of Science and PsycINFO will be searched from 2011 to present without any language filters. This scoping review will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Review guidelines.

We will consider for inclusion (a) cross-sectional and longitudinal studies (b) focusing on the prevalence rates of mental, behavioural or neurodevelopmental disorders (c) assessed using ICD-11 criteria. The omission of (a) case numbers and sample size, (b) study period and period of data collection or (c) diagnostic procedures at the full-text level is considered an exclusion criterion.

Screening will be conducted by two reviewers independently of one another, and a third reviewer will be consulted in case of disagreements. Data extraction and synthesis will focus on outlining methodological aspects.

Ethics and dissemination We intend to publish our review in a scientific journal. As the primary data are publicly available, we do not require research ethics approval.

  • EPIDEMIOLOGY
  • STATISTICS & RESEARCH METHODS
  • MENTAL HEALTH
  • Systematic Review

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:  http://creativecommons.org/licenses/by-nc/4.0/ .

https://doi.org/10.1136/bmjopen-2023-081082


Strengths and limitations of this study

This scoping review will be the first to summarise the characteristics of the literature assessing prevalence rates of mental, behavioural or neurodevelopmental disorders (MBND) according to the 11th revision of the International Classification of Diseases (ICD-11). Additionally, it will identify research gaps and inform subsequent systematic reviews and meta-analyses on the prevalence of the mentioned disorders.

Our search strategy consists of four electronic databases targeting peer-reviewed literature as well as grey literature sources to reduce publication bias; it will be conducted with no language restrictions.

We will adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for the conduct of Scoping Reviews to ensure transparent reporting.

In the interest of a timely review, this scoping review covers the vast majority, but not the entirety, of diagnostic entities located within the MBND chapter of the ICD-11.

Introduction

In 2019, mental health conditions were among the 10 primary contributors to disease burden worldwide—an increase in burden being observable since 1990. 1 The current Global Burden of Disease study estimates roughly 970 million cases of mental health disorders worldwide to be responsible for more than 125 million disability-adjusted life years and for 15% of all years lived with disability 1 : numbers which highlight the relevance of mental health conditions as a global public health concern.

Reliable and standardised measurements of health issues—relying on proper categorisation of diseases and associated processes—are necessary to understand, prevent and treat diseases while guaranteeing efficient resource utilisation. 2

Through several revisions, 3 the International Classification of Diseases (ICD) has evolved from a limited catalogue of causes of death 4 into the ‘essential infrastructure for health information’ 2 and as such should serve the aforementioned functions. 2

The product of its 11th revision process, the ICD-11, was accepted by the World Health Assembly of WHO in May 2019. 2 Notable differences in its mental, behavioural or neurodevelopmental disorders (MBND) chapter were described by Gaebel et al and are summarised as follows 5 :

Altered subchapter structure: with 21 subchapters, the MBND chapter encompasses almost twice as many subchapters as chapter V of the ICD-10. 5 This change resulted from the removal of a rule limiting the number of subchapters to 10 at every level of the ICD-10. 6 Cross-links within chapter VI refer to the new sleep-wake disorders and conditions related to sexual health chapters, and in an effort to emphasise the continuous nature of development, the subchapter on mental or behavioural disorders with onset during childhood and adolescence was dissolved, with the respective diagnoses relocated elsewhere. 5 7

New diagnostic entities: the revision resulted in the elimination of diagnostic groupings and the introduction of new diagnostic entities such as body dysmorphic disorder, prolonged grief disorder and complex post-traumatic stress disorder (complex PTSD). 5

Changes regarding diagnostic criteria: examples comprise a higher diagnostic threshold for PTSD 5 and schizoaffective disorders 6 and a new conceptualisation of personality disorders, which replaces the established categorical type classification of the ICD-10. 5 6 8

As observed in the context of other revision processes, changes in diagnostic criteria can lead to a change in prevalence rates of diagnoses. 9 The introduction of a reduced diagnostic threshold for attention deficit hyperactivity disorder in older adolescents and adults by the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), for instance, led to an increase of 65% in reported prevalence rates within these populations. 9 Considering this, the publication of the ICD-11 alpha browser in May 2011 10 initiated a growing body of work pertaining to the prevalence rates of new diagnostic entities and the difference in prevalence rates of MBND assessed according to ICD-11 and ICD-10 criteria. 11–13

As accurate estimates of prevalence rates are of key importance for public health planning, healthcare resource allocation as well as identifying risk factors or health disparities, this scoping review seeks to provide an overview of primary studies which examine the prevalence of mental disorders based on ICD-11 criteria. It aims to analyse the methodologies used to determine prevalence rates, including data sources, sampling methods, diagnostic tools and population characteristics. As such it will also support the decision on which diagnoses are most suitable for subsequent systematic reviews and meta-analyses, which can provide more accurate estimates on how the ICD-11 will impact prevalence rates of specific MBND and disorders of this grouping in general.

The purpose of this review is reflected in its rationales.

Rationale 1: outline how prevalence rates of MBND according to ICD-11 criteria have been assessed so far and thereby summarise the approaches of currently available primary studies.

Associated review questions are:

What are the sample characteristics of primary studies?

Where were the primary studies based?

What was the timeframe for data collection within primary studies?

Study period.

Year of data collection.

What are the study designs of primary studies?

What were research aims of the primary studies?

Which MBND are most frequently assessed?

How were diagnoses assessed?

What measurement tools were used?

How was data collected?

What prevalence was estimated for the diagnoses?

Additionally, research gaps will be identified:

Which mental, behavioural or neurodevelopmental disorders are least frequently assessed within prevalence studies?

Rationale 2: identify mental, behavioural or neurodevelopmental disorders most suitable for subsequent systematic reviews and meta-analyses. Here we are interested in:

Disorders, where multiple (≥2) primary studies exist which assess the prevalence of the disorders listed below (table 1) according to ICD-11 criteria and ICD-10 criteria within one cohort.

Newly introduced disorders, where multiple (≥2) primary studies exist which assess the prevalence of the disorders listed below (table 1) according to ICD-11 criteria.

As is reflected within these rationales, the main outcome of our project is a summary of the study characteristics of a body of work. A scoping review is therefore the most appropriate method of evidence synthesis.

A preliminary search of MEDLINE, Embase and PsycINFO for existing scoping and systematic reviews on the topic was performed on 6 October 2023. We did not identify reviews pertaining to a similar topic.

This scoping review in its final form will be reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension tool for Scoping Reviews. 14 This protocol has been developed in accordance with the JBI methodological guidelines. 15 We will describe protocol modifications with their respective dates.

Eligibility criteria

We will include

Cross-sectional and longitudinal studies

assessing the prevalence of the MBND listed in table 1

as per ICD-11 criteria

In view of the feasibility of our project, provision of insufficient data at the full-text level of primary studies constitutes an exclusion criterion: we will exclude primary studies which fail to provide

Case numbers and sample size

Study period and period of data collection

Details of diagnostic procedures

Information sources

We will conduct a search for the peer-reviewed literature from 2011 to present in the following databases: MEDLINE, Embase, PsycINFO and Web of Science. No language restriction will be applied. Sources identified in other languages which require translation for the full-text screening will be translated by state-certified translators.

Search strategy

Our search strategy for the peer-reviewed databases will consist of a search string of the general pattern:

String element for retrieving articles on each of the specific diagnoses as listed in table 1

String element for retrieving diagnoses according to ICD-11 criteria

Search filter identifying cross-sectional and longitudinal studies

Table 1: Relevant mental, behavioural or neurodevelopmental disorders adapted from the WHO ICD-11 browser

The MBND listed in table 1 will be searched for.
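The full search string is provided in the online supplemental material. Purely as an illustrative sketch of how the three string elements described above could be combined into one boolean query, and with hypothetical placeholder terms rather than the authors' actual terms:

```python
# Illustrative sketch only: combines the three search-string elements described
# above (diagnosis terms, ICD-11 terms, study-design filter) into one boolean
# query. All term lists are hypothetical placeholders, not the authors' strategy.

diagnosis_terms = ['"prolonged grief disorder"', '"body dysmorphic disorder"',
                   '"complex post-traumatic stress disorder"']   # element 1: diagnoses from table 1
icd11_terms = ['"ICD-11"', '"ICD 11"', '"International Classification of Diseases, 11th"']  # element 2
design_filter = ['"cross-sectional"', 'longitudinal', 'cohort', 'prevalence']                # element 3

def or_block(terms):
    """Join a list of terms with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(terms) + ")"

# Combine the three elements with AND, mirroring the general pattern above.
search_string = " AND ".join(or_block(block) for block in (diagnosis_terms, icd11_terms, design_filter))
print(search_string)
```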

The reference list of all included studies will be searched for additional sources. Sources of grey literature will also be identified and searched.

The search strategy was developed in consultation with an information specialist. The search string will be modified for the grey literature sources. We will repeat the search before the final analysis. The exact search strategy for MEDLINE via Ovid can be found in the online supplemental material . The planned start and end dates for this study are May 2024 and May 2026, respectively.


Data management and study selection process

After performing searches across the databases, the title and abstract of each article will be exported to EndNote. Any duplicates will be removed at this stage. The titles and abstracts of all articles will be reviewed by two reviewers (KN and SG) according to the inclusion/exclusion criteria. Disagreements at this screening stage will be resolved by consulting a third reviewer (SF), and studies not excluded at this stage will be retrieved for full-text review. Similarly, the full-text review will be conducted by two reviewers and disagreements will be resolved by consulting a third reviewer.
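Duplicate removal itself is handled in EndNote. Purely as an illustration of the same step, the sketch below flags likely duplicates in an exported citation file by DOI or by normalised title; the column names ("doi", "title") are hypothetical, not a specific EndNote export format.

```python
import csv

def normalise(title: str) -> str:
    """Lower-case a title and strip punctuation/whitespace so near-identical records match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def deduplicate(path: str) -> list[dict]:
    """Keep the first occurrence of each record, matching on DOI if present, else on normalised title."""
    seen, unique = set(), []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = row.get("doi", "").strip().lower() or normalise(row.get("title", ""))
            if key and key in seen:
                continue  # likely duplicate of an earlier record; drop it
            seen.add(key)
            unique.append(row)
    return unique
```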

Data extraction

Following the review of titles and abstracts, an Excel spreadsheet will be created for the full-text review in which the reviewers will document (a) whether the article is to be included or excluded, (b) the reason for exclusion for excluded sources and (c) key information extracted from each included paper. Data will be extracted by two reviewers, and discrepancies will be resolved by a third reviewer.

The data extraction form will be piloted on a sample of the included studies and possibly modified.
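The protocol does not specify how this pilot sample will be drawn. A minimal sketch, assuming the included studies are identified by arbitrary citation keys, of drawing a reproducible random sample on which to pilot the form:

```python
import random

def pilot_sample(included_ids, n=5, seed=2024):
    """Draw a reproducible random sample of included studies for piloting the extraction form."""
    rng = random.Random(seed)        # fixed seed so the same sample can be re-drawn and reported
    n = min(n, len(included_ids))    # never request more studies than are available
    return rng.sample(list(included_ids), n)

# Hypothetical example: pilot the form on 5 of the included records.
print(pilot_sample(["smith2022", "lee2023", "garcia2021", "okafor2020", "chen2019", "novak2022"]))
```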

Before excluding an otherwise eligible primary source for insufficient data, we intend to contact the authors for further information when necessary.

Concerning the data extraction, and in alignment with the aims of this project, our current data extraction form contains the following items (a spreadsheet-style sketch of these items follows below):

Bibliographic information

Last name of the first author

Year of publication

Peer-review status (eg, peer reviewed vs preprint)

Journal/source

Study location

Study period/year of data collection

Study design

Scope of the investigation/research aims

Investigating the prevalence

Investigating predictors

Investigating consequences

Investigating psychosocial correlates

Study sample

Study sample (as in sampling process)

Sample size

Age range of the study population

Sex/gender ratio

Psychiatric disorders assessed

Diagnostic tool

Measurement tools used

Method(s) of data collection

Prevalence of psychiatric disorders

Analysis performed

As these data points provide the basis for an appropriate description of the methodology of this body of work, we cannot distinguish between main and additional outcomes.
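Purely as an illustration, the items listed above could be laid out as one structured record (or spreadsheet row) per included study; the field names below are hypothetical shorthand, not the authors' actual template.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    """One extraction record per included study; field names are illustrative shorthand for the items above."""
    first_author: str
    year: int
    peer_reviewed: bool                      # e.g. False for preprints
    source: str                              # journal or other source
    location: str
    study_period: str                        # study period / year of data collection
    design: str                              # cross-sectional or longitudinal
    research_aims: list[str] = field(default_factory=list)      # prevalence, predictors, consequences, correlates
    sampling: str = ""                       # sampling process
    sample_size: int | None = None
    age_range: str = ""
    sex_gender_ratio: str = ""
    disorders_assessed: list[str] = field(default_factory=list)
    diagnostic_tool: str = ""
    measurement_tools: list[str] = field(default_factory=list)
    data_collection: str = ""                # method(s) of data collection
    prevalence: dict[str, float] = field(default_factory=dict)  # disorder -> prevalence estimate
    analysis: str = ""
```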

Due to the aim of our work (ie, to give an overview of prevalence data available and methodological approaches used to obtain these estimates), we will use the JBI prevalence critical appraisal tool (possibly with minor modifications) to assess the methodological limitations or risk of bias of the evidence of primary studies included.

Data synthesis

For all studies meeting the inclusion criteria of the scoping review, we will use a descriptive synthesis approach. Our summary will focus on the extracted data. The results will be presented as charts, maps or tables. We will choose those visualisation and summary approaches that best fit the extracted content.
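As one illustration of such a descriptive summary (not the authors' actual analysis plan), the sketch below counts how many included studies assessed each disorder, assuming records shaped like the hypothetical ExtractionRecord above:

```python
from collections import Counter

def studies_per_disorder(records):
    """Count how many included studies assessed each disorder (for a simple summary table or bar chart)."""
    counts = Counter()
    for rec in records:
        counts.update(set(rec.disorders_assessed))   # count each disorder at most once per study
    return counts

# The resulting Counter, e.g. {'complex PTSD': 4, 'prolonged grief disorder': 2},
# can then be rendered as a table, chart or map for the descriptive synthesis.
```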

Patient and public involvement

This project aims to analyse an existing body of research studies, and we include an expert by experience (peer-to-peer trainer) and a representative of relatives in our research group. The expert by experience (AJ) was involved in the development of this protocol and will be consulted during the process of data synthesis and the discussion of our results. The representative of relatives will likewise be consulted during the process of data synthesis and the discussion of our results.

Dissemination and ethics

Regarding the dissemination of our work, the scoping review will be provided to scientific journals for consideration for publication, and its results may be presented as conference posters and presentations. No ethics approval is required as the analysed data originates from publicly available material.

Ethics statements

Patient consent for publication.

Not applicable.

  • Ferrari AJ ,
  • Santomauro DF ,
  • Herrera AMM , et al
  • Harrison JE ,
  • Jakob R , et al
  • World Health Organization
  • Stricker J ,
  • Zielasek J ,
  • Doering S , et al
  • Kalansky A , et al
  • World Health Organization
  • Miller MW ,
  • Wolf EJ , et al
  • Boelen PA ,
  • Lenferink LIM ,
  • Nickerson A , et al
  • Barbano AC ,
  • van der Mei WF ,
  • Bryant RA , et al
  • Tricco AC ,
  • Zarin W , et al
  • Aromataris E ,
  • Lockwood C ,

Contributors SG and KN conceptualised this scoping review. KN is the author of the first draft of this protocol. SF, AH, SG, SS, KD, SK and AJ critically reviewed the manuscript and provided amendments. The search strategy was developed by KN with input from information scientists, SG and SK. All authors read and approved the final manuscript.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Patient and public involvement Patients and/or the public were involved in the design, conduct, reporting or dissemination plans of this research. Refer to the Methods section for further details.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.


  • Research article
  • Open access
  • Published: 19 October 2020

Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance

  • Roland Brian Büchter 1 (ORCID: orcid.org/0000-0002-2437-4790),
  • Alina Weise 1 &
  • Dawid Pieper 1

BMC Medical Research Methodology, volume 20, Article number: 259 (2020)


Data extraction forms link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and use a crucial part of the systematic reviews process. Several studies have shown that data extraction errors are frequent in systematic reviews, especially regarding outcome data.

We reviewed guidance on the development and pilot testing of data extraction forms and the data extraction process. We reviewed four types of sources: 1) methodological handbooks of systematic review organisations (SRO); 2) textbooks on conducting systematic reviews; 3) method documents from health technology assessment (HTA) agencies and 4) journal articles. HTA documents were retrieved in February 2019 and database searches conducted in December 2019. One author extracted the recommendations and a second author checked them for accuracy. Results are presented descriptively.

Our analysis includes recommendations from 25 documents: 4 SRO handbooks, 11 textbooks, 5 HTA method documents and 5 journal articles. Across these sources the most common recommendations on form development are to use customized or adapted standardised extraction forms (14/25); provide detailed instructions on their use (10/25); ensure clear and consistent coding and response options (9/25); plan in advance which data are needed (9/25); obtain additional data if required (8/25); and link multiple reports of the same study (8/25). The most frequent recommendations on piloting extraction forms are that forms should be piloted on a sample of studies (18/25); and that data extractors should be trained in the use of the forms (7/25). The most frequent recommendations on data extraction are that extraction should be conducted by at least two people (17/25); that independent parallel extraction should be used (11/25); and that procedures to resolve disagreements between data extractors should be in place (14/25).

Conclusions

Overall, our results suggest a lack of comprehensiveness of recommendations. This may be particularly problematic for less experienced reviewers. Limitations of our method are the scoping nature of the review and that we did not analyse internal documents of health technology agencies.


Evidence-based medicine has been defined as the integration of the best-available evidence and individual clinical expertise [ 1 ]. Its practice rests on three fundamental principles: 1) that knowledge of the evidence should ideally come from systematic reviews, 2) that the trustworthiness of the evidence should be taken into account and 3) that the evidence does not speak for itself and appropriate decision making requires trade-offs and consideration of context [ 2 ]. While the first principle directly speaks to the importance of systematic reviews, the second and third have important implications for their conduct. The second principle implies that systematic reviews should be based on rigorous, bias-reducing methods. The third principle implies that decision makers require sufficient information on the primary evidence to make sense of a review’s findings and apply them to their context.

Broadly speaking, a systematic review consists of five steps: 1) formulating a clear question, 2) searching for studies able to answer this question, 3) assessing and extracting data from the studies, 4) synthesizing the data and 5) interpreting the findings [ 3 ]. At a minimum, steps two to five rely on appropriate and thorough data collection methods. In order to collate data from primary studies, standardised data collection forms are used [ 4 ]. These link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and application a crucial part of the systematic reviews process.

Studies on the prevalence and impact of data extraction errors have recently been summarised by Mathes and colleagues [ 5 ]. They identified four studies that looked at the frequency of data extraction errors in systematic reviews. The error rate for outcome data ranged from 8 to 63%. The impact of the errors on summary results and review conclusions varied. In one of the studies the effect size from the meta-analytic point estimates changed by more than 0.1 in 70% of cases (measured as standardised differences in means) [ 6 ]. Considering that most interventions have small to moderate effects, this can have a large impact on conclusions and decisions. Little research has been conducted on extraction errors relating to non-outcome data.

The importance of a rigorous data extraction process is not restricted to outcome data. As previously mentioned, users of systematic reviews need sufficient information on non-outcome data to make sense of the underlying primary studies and assess their applicability. Despite this, many systematic reviews do not sufficiently report this information. In one study almost 90% of systematic reviews of interventions did not provide the information required for treatments to be replicated in practice – compared to 35% of clinical trials [ 7 ]. While there are several possible reasons for this – including the quality of reporting – insufficient data collection forms or procedures may contribute to the problem.

Against this background, we sought to review the guidance that is available to systematic reviewers for the development and pilot testing of data extraction forms and the data extraction process, these being central elements in systematic reviews.

This project was conducted as part of a dissertation, for which an exposé is available in German. We did not publish a protocol for this descriptive analysis, however. As there are no specific reporting guidelines for this type of methodological review, we reported our methods in accordance with the PRISMA statement as applicable [ 8 ].

Systematic reviews are conducted in a variety of different contexts – most notably as part of dissertations or academic research projects, as standalone projects, by health technology assessment (HTA) agencies and by systematic review organisations (SROs). Thus, we looked at a broad group of sources to identify recommendations:

Methodological handbooks from major SROs

Textbooks aimed at students and researchers endeavouring to conduct a systematic review

Method documents from HTA agencies

Published journal articles making recommendations on how to conduct a systematic review or how to develop data extraction forms

While the sources that we searched mainly focus on medicine and health, we did not exclude other health-related areas such as the social sciences or psychology.

Data sources

Regarding the methodological handbooks from SROs, we considered the following to be the most relevant to our analysis:

The Centre for Reviews and Dissemination’s guidance for undertaking reviews in health care (CRD guidance)

The Cochrane Handbook for Systematic Reviews of Interventions (Cochrane Handbook)

The Institute of Medicine’s Finding What Works in Health Care: Standards for Systematic Reviews (IoM Standards)

The Joanna Briggs Institute’s Reviewer Manual (JBI Manual)

The list of textbooks was based on a recently published article that reviewed systematic review definitions used in textbooks and other sources [ 9 ]. The authors did not carry out a systematic search for textbooks, but included textbooks from a broad range of disciplines including medicine, nursing, education, health library specialties and the social sciences published between 1998 and 2017. These textbooks included information on data extraction in systematic reviews, but none of them focussed on this topic exclusively.

Regarding the HTA agencies, we compiled a list of all member organisations of the European Network for Health Technology Assessment (EUnetHTA), the International Network of Agencies for Health Technology Assessment (INAHTA), Health Technology Assessment international (HTAi) and the Health Technology Assessment Network of the Americas (Red de Evaluación de Tecnologías en Salud de las Américas – RedETSA). The reference month for the compilation of this list was January 2019; the list is included in additional file 1. We searched these websites for potentially relevant documents and downloaded them. We then reviewed the full texts of all documents for eligibility and included those that fulfilled our inclusion criteria. The website searches and the full text screening of the documents were conducted by two authors independently (RBB and AW). Disagreements were resolved by discussion. We also planned to include the newly founded Asia-Pacific HTA network (HTAsiaLink), but the webpage had not yet been launched during our research period.

To identify relevant journal articles, we first searched the Scientific Resource Center’s Methods Library (SRCML). This is a bibliography of publications relevant to evidence synthesis methods which was maintained until the third quarter of 2017 and has been archived as a RefWorks library. Because the SRCML is no longer updated, we conducted a supplementary search of Medline from the 1st of October 2017 to the 12th of December 2019. Finally, we searched the Cochrane Methodology Register (CMR), a reference database of publications relevant to the conduct of systematic reviews that was curated by the Cochrane Methods Group. The CMR was discontinued on the 31st of May 2012 and has been archived. Due to the limited search and export functions of these archived SRCML and CMR, we used pragmatic search methods for these sources. The search terms that were used for the databases searches are included in additional file  2 . The titles and abstracts from the database searches and the full texts of potentially relevant articles were screened for eligibility by two authors independently (RBB and AW). Disagreements were resolved by discussion or, if this was unsuccessful, arbitration with DP.

Inclusion criteria

To be eligible for inclusion in our review, documents had to fulfil the following criteria:

Published method document (e.g. handbook, guidance, standard operating procedure, manual), academic textbook or journal article

Include recommendations on the development or piloting of data extraction forms or the data extraction process in systematic reviews

Available in English or German

We excluded empirical research on different data extraction methods as well as papers on technical aspects, because these have been reviewed elsewhere [ 10 , 11 , 12 ]. This includes, for example, publications on the merits and downsides of different types of software (word processors, spreadsheets, database or specialised software) or the use of pencil and paper versus electronic extraction forms. We also excluded conference abstracts and other documents not published in full.

For journal articles we specified the inclusion and exclusion criteria more narrowly as this group includes a much broader variety of sources (for example we excluded “primers”, i.e. articles that provide an introduction to reading or appraising a systematic review for practitioners). The full list of inclusion and exclusion criteria for journal articles is published in additional file 2 .

Items of interest

We looked at a variety of items relevant to three categories of interest:

the development of data extraction forms,

the piloting of data extraction forms and

the data extraction process.

To our knowledge, no comprehensive list of potentially relevant items exists. We therefore developed a list of potentially relevant items based on iterative reading of the most influential method handbooks from SROs (see above) and our personal experience. The full list of items included in our extraction form is reported in additional file  3 together with a proposed rationale for each item.

We did not examine recommendations regarding the specific information that should be extracted from studies, because this depends on a review’s question. For example, reviewers might choose to include information on surrogate outcomes in order to aid interpretation of effects or they might choose not to, because they often poorly correlate with clinical endpoints and the researchers are interested in patient-relevant outcomes [ 13 , 14 ]. Furthermore, the specific information that is extracted for a review depends on the area of interest with special requirements for complex intervention or adverse effects reviews, for example [ 15 ]. For the same reason, we did not examine recommendations regarding specific methodological or statistical aspects. For instance, when a generic inverse variance meta-analysis is conducted, standard errors are of interest, whereas in other cases standard deviations may be preferably extracted.

Data extraction

One author developed the first draft of the data extraction form to gather information on the items of interest. This was reviewed by DP and complemented and revised after discussion. We collected bibliographic data, direct quotations on recommendations from the source text and page numbers.

Each item was coded using a coding scheme of five possible attributes (a minimal sketch follows the list):

recommendation for the use of this method

recommendation against the use of this method

optional use of this method

a general statement on this method without a recommendation

method not mentioned
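Rendered as a simple enumeration, the coding scheme might look as follows; the identifier names are ours and added only for illustration:

```python
from enum import Enum

class Attribute(Enum):
    """The five coding attributes applied to each item in each analysed document."""
    RECOMMENDED_FOR = "recommendation for the use of this method"
    RECOMMENDED_AGAINST = "recommendation against the use of this method"
    OPTIONAL = "optional use of this method"
    GENERAL_STATEMENT = "general statement on this method without a recommendation"
    NOT_MENTIONED = "method not mentioned"

# A single cell of the coding matrix might then read, for example:
# coding[("Cochrane Handbook", "pilot test the extraction form")] = Attribute.RECOMMENDED_FOR
```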

For some items descriptive information was of additional interest. This included specific recommendations on the sample of studies that should be used to pilot the data extraction form or the experience or expertise of the reviewers that should be involved. Descriptive information was copied and pasted into the form. The form also included an open field for comments in case any additional items of interest were identified.

One author (RBB) extracted the information of interest from the included documents using the final version of the extraction form. A second author double-checked the information for each of the extracted items (AW). Discrepancies were resolved by discussion or by arbitration with DP.

During extraction, one major change was required to the form. Initially, we considered quantifying agreement only during the piloting phase of an extraction form, but later realised that some sources recommended this for the extraction phase of a review. We thus added items on quantifying agreement to this category.

Data analysis

We separately analysed and reported the four groups of documents (handbooks from SROs, documents from HTA agencies, textbooks and journal articles) and the three categories of interest (development, piloting and extraction). We summarised the results of our findings descriptively. We also aggregated the results across sources for each item using frequencies. Additional information is presented descriptively in the text.

In our primary analysis we only included documents that made recommendations for interventional reviews or generic recommendations. We did this because almost all included documents focussed on these types of reviews and, more importantly, to avoid inclusion of multiple recommendations from one institution. This was particularly relevant for the Joanna Briggs Institute’s Reviewer Manual which at the time of our analysis had 10 separate chapters on a variety of different systematic review types. The decision to restrict the primary analysis to documents focussing on interventional reviews and generic documents was made post hoc. Results for other types of reviews (e.g. scoping reviews, umbrella reviews, economic reviews) are presented as a secondary analysis.

We identified and searched 158 webpages of HTA agencies via the member lists of EUnetHTA, INAHTA, HTAi and RedETSA (see additional file 1 ). This resulted in 155 potentially relevant method documents from 67 agencies. After full text screening, 6 documents remained that fulfilled our inclusion criteria. The database searches resulted in 2982 records. After title and abstract screening, 15 potentially relevant full texts remained. Of these 5 fulfilled our inclusion criteria. A PRISMA flow chart depicting the screening process for the database searches is provided in additional file 2 and for the HTA method documents in additional file 1 .

In total, we collected data from 14 chapters in 4 handbooks of SROs [ 16 , 17 , 18 , 19 ], 11 textbooks [ 3 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 ], 6 method documents from HTA agencies [ 30 , 31 , 32 , 33 , 34 , 35 ] and 5 journal articles [ 36 , 37 , 38 , 39 , 40 ]. Additional file  4 lists all documents that fulfilled our inclusion criteria. In our primary analysis we describe recommendations from a total of 25 sources: 4 chapters from 4 SRO handbooks, 11 textbooks, 5 method documents from HTA agencies and 5 journal articles. Our secondary analysis on recommendations for non-interventional systematic reviews is included in Additional file  5 and the detailed results for the primary analysis in Additional file  6 .

Synthesis of the primary analysis

In sum, we analysed recommendations from 25 sources in our primary analysis. The most frequent recommendations on the development of extraction forms are to use customised or adapted standardised extraction forms (14/25); provide detailed instructions on their use (10/25); ensure clear and consistent coding and response options (9/25); plan in advance which data are needed (9/25); obtain additional data if required (8/25); and link multiple reports of the same study (8/25).

The most frequent recommendations on piloting extraction forms are that forms should be piloted on a sample of studies (18/25); and that data extractors should be trained in the use of the forms (7/25).

The most frequent recommendations on data extraction are that data extraction should be conducted by at least two people (17/25); that independent parallel extraction should be used (11/25); and that procedures to resolve disagreements between data extractors should be in place (14/25).
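None of the analysed sources prescribes a particular mechanism for identifying disagreements. Purely as an illustration, a sketch that compares two extractors' records for the same study field by field and returns the entries needing discussion or arbitration (the record layout is hypothetical):

```python
def find_discrepancies(extraction_a: dict, extraction_b: dict) -> dict:
    """Compare two extractors' records for the same study and return the fields on which they disagree."""
    fields = extraction_a.keys() | extraction_b.keys()
    return {f: (extraction_a.get(f), extraction_b.get(f))
            for f in fields
            if extraction_a.get(f) != extraction_b.get(f)}

# Disagreements returned here would then go to discussion between the extractors
# and, if unresolved, to arbitration with a third reviewer.
```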

To provide a more comprehensible overview and illustrate areas where guidance is sparse, we have aggregated the results for definite recommendations (excluding optional recommendations or general statements) in Tables 1 , 2 and 3 . To avoid any misconceptions, we emphasise that by aggregating these results we by no means suggest that all items are of equal importance. Some are in fact mutually exclusive or interconnected.

The following sections provide details for each group of documents, sorted by the three categories of interest.

Handbooks of systematic review organisations

Category: development of extraction forms

Three handbooks recommend that reviewers should plan in advance which data to extract [ 16 , 17 , 18 ]. Furthermore, three recommended that reviewers develop a customized data extraction form or adapt an existing form to meet the specific review needs [ 17 , 18 , 19 ]. In contrast, the JBI recommends use of their own standardised data extraction form, but allows reviewers to use others, if this is justified and the forms are described [ 16 ]. All four handbooks recommend that reviewers link multiple reports of the same study to avoid multiple inclusions of the same data [ 16 , 17 , 18 , 19 ]. Three handbooks make statements on strategies for obtaining unpublished data [ 16 , 17 , 18 ]. The Cochrane Handbook recommends contacting authors to obtain additional data, while the CRD guidance makes a general statement in light of the chances of success and resources available. The JBI manual makes this optional but requires the systematic reviewers to report whether authors of included studies are contacted in the review protocol.

Two handbooks recommend that the data collection form includes consistent and clear coding instructions and response options and that data extractors are provided with detailed instructions on how to complete the form [ 17 , 18 ]. The Cochrane Handbook also recommends that the entire review team should be involved in the development of the data extraction form and that this should include authors with expertise in the content area, review methods, statisticians and data extractors. The Cochrane Handbook also recommends that reviewers check compatibility of electronic forms or data systems with analytical software and ensure methods are in place to record, assess and correct data entry errors.

Category: piloting of extraction forms

Three handbooks recommended that authors pilot test their data extraction form [ 17 , 18 , 19 ]. The Cochrane Handbook recommends that “several people” are involved and “at least a few articles” used. The CRD guidance states that “a sample of included studies” should be used for piloting. The Cochrane Handbook also recommends that data extractors are trained; that piloting may need to be repeated if major changes to the extraction form are made during the review process; and that reports that have already been extracted should be re-checked in this case. None of the handbooks makes an explicit recommendation on who should be involved in piloting the data extraction form or their expertise. Furthermore, none of the handbooks makes a recommendation on quantifying agreement during the piloting process or using a quantified reliability threshold that should be reached before beginning the extraction process.

Category: data extraction

All handbooks recommend that data should be extracted by at least two reviewers (dual data extraction) [ 16 , 17 , 18 , 19 ]. Three handbooks recommend that data are extracted by two reviewers independently (parallel extraction) [ 16 , 18 , 19 ], one also considers it acceptable that one reviewer extracts the data and a second reviewer checks it for accuracy and completeness (double-checking) [ 17 ]. Furthermore, two of the handbooks make an optional recommendation that independent parallel extraction could be done only for critical data such as risk of bias and outcome data, while non-critical data is extracted by a single reviewer and double-checked by a second reviewer [ 18 , 19 ]. The Cochrane Handbook also recommends that data extractors have a basic understanding of the review topic and knowledge of study design, data analysis and statistics [ 18 ].

All handbooks recommend that reviewers should have procedures in place to resolve disagreements arising from dual data extraction [ 16 , 17 , 18 , 19 ]. In all cases discussion between extractors or arbitration with a third person are suggested. The Cochrane Handbook recommends hierarchical use of these strategies, while the other sources do not specify this [ 18 ]. Of note, the IoM Standards highlights the need for a fair procedure that ensures both reviewers' judgements are considered in case of a power or experience asymmetry [ 19 ]. The Cochrane Handbook also recommends that disagreements that remain unresolved after discussion, arbitration or contact with study authors should be reported in the systematic review [ 18 ].

Two handbooks recommend informally considering the reliability of coding throughout the review process [ 17 , 18 ]. These handbooks also mention the possibility of quantifying agreement of the extracted data. The Cochrane Handbook considers this optional and, if done, recommends it only for critical outcomes such as risk of bias assessments or key outcome data [ 18 ]. The CRD guidance mentions this possibility without making a recommendation [ 17 ]. Two handbooks recommend that reviewers document disagreements and how they were resolved [ 17 , 18 ] and two recommend reporting who was involved in data extraction [ 18 , 19 ]. The IoM Standards specify that the number of individual data extractors and their qualifications should be reported in the methods section of the review [ 19 ].

Textbooks on conducting systematic reviews

Regarding the development of data extraction forms, the most frequent recommendation in the analysed textbooks is that reviewers should develop a customized extraction form or adapt an existing one to suit the needs of their review (6/11) [ 20 , 21 , 23 , 24 , 26 , 29 ]. Two textbooks consider the choice between customized and generic or pre-existing extraction forms optional [ 3 , 25 ].

Many of the textbooks also make statements on unpublished data (7/11). Most of them recommend that reviewers develop a strategy for obtaining unpublished data (4/11) [ 24 , 25 , 26 , 29 ]. One textbook makes an optional recommendation on obtaining unpublished data and mentions the alternative of conducting sensitivity analysis to account for missing data [ 3 ]. Two textbooks make general statements regarding missing data without a compulsory or optional recommendation [ 22 , 23 ].

Four textbooks recommend that reviewers ensure consistent and easy coding rules and response options in their data collection form [ 3 , 22 , 25 , 29 ]; three to provide detailed instruction on how to complete the data collection form [ 22 , 24 , 25 ]; and three to link multiple reports of the same study [ 3 , 24 , 26 ]. One textbook discusses the impact of including multiple study reports but makes no specific recommendation [ 23 ].

Two textbooks recommend reviewers to plan in advance which data they will need to extract for their review [ 24 , 28 ]. One textbook makes an optional recommendation, depending on the number of included studies [ 22 ]. For reviews with a small number of studies it considers an iterative process appropriate; for large data sets it recommends a thoroughly developed and overinclusive extraction form to avoid the need to go back to study reports later in the review process.

One textbook recommends that clinical experts or methodologists are consulted in developing the extraction form to ensure important study aspects are included [ 26 ]. None includes statements on the recording and handling of extraction errors.

For this category, the most frequently made recommendation in the analysed textbooks is that reviewers should pilot test their data extraction form (8/11) [ 3 , 20 , 22 , 23 , 24 , 25 , 26 , 29 ]. One textbook makes a general statement on piloting, but no specific recommendation [ 27 ].

Three textbooks recommend that data extractors are trained [ 22 , 24 , 25 ]. One textbook states that extraction should not begin before satisfactory agreement is achieved but does not define how this should be assessed [ 22 ]. No recommendations were identified for any of the other items regarding piloting of extraction form in the analysed textbooks.

Six textbooks recommend data extraction by at least two reviewers [ 22 , 23 , 24 , 25 , 26 , 29 ]. Four of these recommend parallel extraction [ 23 , 24 , 25 , 26 ], while two do not specify the exact procedure [ 22 , 29 ]. One textbook explains the different types of dual extraction modes but makes no recommendation on their use [ 27 ].

One textbook recommends that reviewer agreement for extracted data is quantified using a reliability measure [ 25 ], while two mention this possibility without making a clear recommendation [ 22 , 26 ]. Two of these mention Cohen's kappa as a possible measure for quantifying agreement [ 22 , 26 ]; one also mentions raw agreement [ 22 ].
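For reviewers who do wish to quantify agreement, a minimal, generic sketch (not taken from any of the analysed textbooks) of raw agreement and Cohen's kappa for two extractors' categorical codings:

```python
from collections import Counter

def agreement_stats(coder_a, coder_b):
    """Raw agreement and Cohen's kappa for two raters' categorical codings of the same items."""
    assert coder_a and len(coder_a) == len(coder_b), "both coders must rate the same, non-empty set of items"
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n            # observed (raw) agreement
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)                        # agreement expected by chance
              for c in freq_a.keys() | freq_b.keys())
    kappa = 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return p_o, kappa

# Hypothetical example: two extractors' risk-of-bias judgements for six studies.
print(agreement_stats(["low", "high", "low", "unclear", "low", "high"],
                      ["low", "high", "unclear", "unclear", "low", "low"]))
```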

Five textbooks recommend that reviewers develop explicit procedures for resolving disagreements, either by discussion or consultation of a third person [ 22 , 24 , 25 , 26 , 29 ]. Two textbooks suggest a hierarchical approach using discussion and, if this is unsuccessful, arbitration with a third person [ 25 , 29 ]. One textbook also suggests the possibility of including the entire review team in discussions [ 24 ]. One textbook emphasizes that educated discussions should be preferred over voting procedures [ 26 ]. One textbook also recommends that reviewers document disagreements and how they were resolved [ 26 ].

One textbook makes recommendations on the expertise of the data extractors [ 24 ]. It suggests that data extraction is conducted by statisticians, data managers and methods experts with the possible involvement of content experts, when required.

Documents from HTA agencies

In two documents from HTA agencies it is recommended that a customised extraction form is developed [ 31 , 35 ]. One of these roughly outlines the contents of extraction forms that can be used as a starting point [ 31 ]. Three documents recommend that detailed instructions on using the extraction form should be provided [ 30 , 31 , 34 ]. Two documents recommend that reviewers develop a strategy for obtaining unpublished data [ 30 , 31 ].

The following recommendations are only included in one method document each: planning in advance which data will be required for the synthesis [ 30 ]; ensuring consistent coding and response options in the data collection form [ 31 ] and linking multiple reports of the same study to avoid including data from the same study more than once [ 31 ].

For this category the only recommendation we found in HTA documents is that data collection forms should be piloted before use (3/5) [ 30 , 31 , 33 ]. None of the documents specifies how this may be done, for example regarding the number or types of studies involved. One of the documents makes a vague suggestion that all reviewers ought to be involved in pilot testing.

In most documents it is recommended that data extraction should be conducted by two reviewers (4/5) [ 30 , 31 , 34 , 35 ]. Two make an optional recommendation for either parallel extraction or a double-checking procedure [ 30 , 31 ], one recommends parallel extraction [ 34 ] and one reports use of double-checking [ 35 ]. Three method documents recommend that reviewers resolve disagreements by discussion [ 30 , 31 , 35 ]. One method document recommends that reviewers report who was involved in data extraction [ 34 ].

Journal articles

We identified 5 journal articles that fulfilled our inclusion criteria. This included a journal article specifying the methods used by the Cochrane Back and Neck Group [ 36 ], an article describing the data extraction and synthesis methods used in JBI systematic reviews [ 38 ], a paper on guidelines for systematic reviews in the environmental research field [ 39 ] and two in-depth papers on data extraction and coding methods within systematic reviews [ 37 , 40 ]. One of these used the Systematic Review Data Repository (SRDR) as an example, but the recommendations made were not exclusive to this system [ 37 ].

Three journal articles recommended that authors should plan in advance which data they require for the review [ 37 , 39 , 40 ]. A recommendation for developing a customized extraction form (or adapting one) for the specific purpose of the review was also made in three journal articles [ 36 , 37 , 40 ]. Two articles recommended that consistent and clear coding and response options should be ensured and detailed instruction provided to data extractors [ 37 , 40 ]. Furthermore, two articles recommended that mechanisms should be in place for recording, assessing and correcting data entry errors [ 36 , 37 ]. Both referred to plausibility or logic checks of the data and/or statistics.

One article recommends that reviewers try to obtain further data from the included studies, where required [ 39 ], while one makes an optional recommendation [ 36 ] and another a general statement without a specific recommendation [ 37 ]. One of the articles also makes recommendations on the expertise of the reviewers that should be involved in the development of the extraction form. It recommends that all members of the team are involved including data extractors, content area experts, statisticians and reviewers with formal training in form design such as epidemiologists [ 37 ].

Four articles recommend that reviewers should pilot test their extraction form [ 36 , 37 , 38 , 40 ]. Three articles recommend training of data extractors [ 37 , 38 , 40 ]. One recommends that reviewers informally assess the reliability of coding during the piloting process [ 37 ]. One article mentions the possibility of quantifying agreement during the piloting process, without making a specific recommendation or specifying any thresholds [ 40 ].

Three articles recommend that data are extracted by two reviewers, in each case using independent parallel extraction [ 36 , 37 , 38 ]. Citing the IoM standards, one article also mentions the possibility of using independent parallel extraction for critical data and a double-checking procedure for non-critical data [ 37 ]. One article recommends that the principal reviewer runs regular logic checks to validate the extracted data [ 37 ]. One article also mentions the possibility that the reliability of extraction may need to be reviewed throughout the extraction process in case of extended coding periods [ 40 ].

Two articles mention the need to have a procedure in place for resolving disagreements, either with a hierarchical procedure using discussion and arbitration with a third person [ 36 ] or by discussion and review of the source document [ 37 ]. One article recommends that disagreements and consensus results are documented for future reference [ 37 ]. Finally, one article mentions advantages of having data extractors with complementary expertise such as a content expert and method experts, but does not make a clear recommendation on this [ 37 ].

We reviewed current recommendations on data extraction methods in systematic reviews across a range of different sources. Our results suggest that current recommendations are fragmented. Very few documents made comprehensive recommendations. This may be detrimental to the quality of systematic reviews and makes it difficult for aspiring reviewers to prepare high-quality data extraction forms and ensure reliable and valid extraction procedures. While our review cannot show that improved recommendations will truly have an impact on the quality of systematic reviews, it seems reasonable to assume that clear and comprehensive recommendations are a prerequisite to high-quality data extraction, especially for less experienced reviewers.

There were some notable exceptions to our findings. Among the most comprehensive documents were the Cochrane Handbook for Systematic Reviews, the textbook by Foster and colleagues and the journal article by Li and colleagues [ 18 , 24 , 37 ]. We believe that these are among the most helpful resources for systematic reviewers from the pool of documents that we analysed – not only because they provide in-depth information, but also for being among the most current sources.

We were particularly surprised by the lack of information provided by HTA agencies. Only very few HTA agencies had documents with relevant recommendations at all. Since many HTA agencies publish detailed documents on many other methodological aspects such as search screening methods, risk of bias assessments or evidence grading methods, it would seem reasonable to provide more information on data extraction methods.

We believe there would be many practical benefits of developing clearer recommendations for the development and testing of extraction forms and the data extraction process. One reason is that data extraction is one of the most resource-intensive parts of a systematic review – especially when the review includes a significant number of studies and/or outcomes. Having a good extraction form can also save time at later stages of the review. For example, a poorly developed extraction form may lead to extensive revisions during the review process and may require reviewers to go back to the original sources or repeat extraction on some included studies. Furthermore, some methodological standards such as independent parallel extraction could be modified to save resources. This is not reflected in most of the sources included in our review. Lastly, it would be helpful to specify recommendations further to accommodate systematic reviews of different sizes, both in terms of the number of included studies and the review team. While the general quality standards should remain the same, a mega-review with several tens or even hundreds of studies, a large, heterogeneous or international review team and several data extractors may differ in some requirements from a small review with few studies and a small, local team [ 12 , 37 ]. For example, training and piloting may need more time to achieve agreement. We therefore encourage developers of guidance documents for systematic reviews to provide more comprehensive recommendations on developing and piloting data extraction forms and the data extraction process. Our review can be used as a starting point. Formal development of structured guidance or a set of minimum standards on data extraction methods in systematic reviews may also be useful. Moher and colleagues have developed a framework to support the development of guidance to improve reporting, which includes literature reviews and a Delphi study and provides a helpful starting point [ 41 ]. Finally, authors of reporting guidelines for systematic reviews of various types can use our results to consider elements worth including.

To some extent the results reflect the empirical evidence from comparative methods research. For example, among the most frequent recommendations were that data extraction should be conducted by two reviewers to reduce risk of errors, which is supported by some evidence [ 11 ]. This is also true for the recommendation that additional data should be retrieved if necessary, which reflects selective outcome reporting [ 42 ]. At the same time, we found few recommendations on reviewer expertise, for which empirical studies have produced inconsistent results [ 11 ]. Arguably, some items in our analysis have theoretical rather than empirical foundations. For instance, we would consider the inclusion of content experts in the development of the extraction forms to be important to enhance clinical relevance and applicability. Even this is a somewhat contested issue, however. Gøtzsche and Ioannidis, for instance, have questioned the value of involving content experts in systematic reviews [ 43 ]. In their analysis, they highlight the lack of evidence on the effects of involving them and in addition to the possible benefits raise potential downsides of expert involvement – notably that experts often have conflicts of interest and strong prior opinions that may introduce bias. While we do not argue against involvement of content experts since conflicts of interest can be managed, the controversy shows that this in fact may be an issue worth exploring empirically [ 44 ]. Thus, in addition to providing more in-depth recommendations for systematic reviewers, empirical evaluations of extraction methods should be encouraged. Such method studies should be based on a systematic review of the current evidence and overcome some of the limitations from previous investigations including the use of convenience samples and small sets of reviewers [ 11 ].

As a final note, some parts of systematic reviews can now be assisted by automation methods. Examples include enhanced study selection using learning algorithms (e.g. as implemented in Rayyan) and assisted risk of bias assessments using RobotReviewer [45, 46]. However, not all of the software solutions are free, and some are still in early development or have not yet been validated. Furthermore, some of them are restricted to specific review types [47]. To the best of our knowledge, comprehensive tools to assist with data extraction, including for example extraction of outcome data, are not yet available [48]. For example, a recent systematic review conducted with currently available automation tools used traditional spreadsheet-based data extraction forms and piloting methods [49]. The authors identified two issues regarding data extraction that could be assisted by automation methods: contacting authors of included studies for additional information using metadata, and better integration of software tools to automatically exchange data between different software. Thus, much work is still to be done in this area. Furthermore, when automation tools for data extraction become available, they will need to be readily accessible, usability tested, accepted by systematic reviewers and validated before widespread use (validation is especially important for technically complex or critical tasks) [50]. It is also likely that they will complement current data extraction methods rather than replace them, as is currently the case for automated risk of bias assessments of randomised trials [46]. For these reasons, we believe that traditional data extraction methods will still be required and used in the future.

Limitations

There are some limitations to our methods. Firstly, our review is not exhaustive. The list of handbooks from SROs was compiled based on previous research and discussions between the authors, but no formal search was conducted to identify other potentially relevant organisations [ 51 , 52 ]. The list of textbooks was also based on a previous study not intended to cover the literature in full. It does, however, include textbooks from a range of disciplines including medicine, nursing, education and the social sciences, which arguably increases the generalisability of the findings. The search strategy for our database search was pragmatic for reasons stated in the methods and may have missed some relevant articles. Furthermore, the databases searched focus on the field of medicine and health, so other areas may be underrepresented.

Secondly, searching the websites of HTA agencies proved difficult in some instances, as some websites have quite intricate site structures. Furthermore, we did not contact the HTA agencies to retrieve unpublished documents. It is likely that at least some HTA agencies have internal documents that provide more specific recommendations. Our focus, however, was the usefulness of the HTA method documents as guidance for systematic reviewers outside of HTA institutions. For this purpose, we believe it is appropriate to assume that most reviewers will rely on the information directly accessible to them.

Thirdly, it was difficult to classify some of the recommendations using our coding scheme. For example, recommendations in the new Cochrane Handbook are based on Cochrane's Methodological Expectations of Cochrane Intervention Reviews (MECIR) standards, which make a subtle differentiation between mandatory and highly desirable recommendations. In this case, we considered both types of recommendation as positive in our classification scheme. To use a more difficult example, one HTA method document did not make a statement on the number of reviewers involved in data extraction but stated that a third investigator may check a random sample of extracted data for additional quality assurance. This would imply that data extraction is conducted by two reviewers independently, but since this method was not stated explicitly, it was classified as "method not mentioned". While some judgements were required, we have described notable cases in the results section and do not believe that different decisions in these cases would affect our overall results or conclusions.

Lastly, we note that some of the included sources referenced more comprehensive guidance such as the Cochrane Handbook. We have not formally extracted information on cross-referencing between documents, however.

Conclusions

Many current methodological guidance documents for systematic reviewers lack comprehensiveness and clarity regarding the development and piloting of data extraction forms and the data extraction process. In the future, developers of learning resources should consider providing more information and guidance on this important part of the systematic review process. Our review and list of items may be a helpful starting point. HTA agencies may consider describing their published methods on data extraction procedures in more detail to increase transparency.

Availability of data and materials

The datasets used and analysed for the current study are available from the corresponding author on reasonable request.

Abbreviations

CMR: Cochrane Methodology Register
CRD: Centre for Reviews and Dissemination
EUnetHTA: European Network for Health Technology Assessment
HTA: Health Technology Assessment
HTAi: Health Technology Assessment international
HTAsiaLink: The collaborative research network of Health Technology Assessment agencies in the Asia-Pacific region
INAHTA: International Network of Agencies for Health Technology Assessment
IOM: Institute of Medicine
JBI: Joanna Briggs Institute
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RedETSA: Red de Evaluación de Tecnologías en Salud de las Américas (Health Technology Assessment Network of the Americas)
SRC: Scientific Resource Center's Methods Library
SRO: Systematic Review Organisations

Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312:71–2.

Guyatt G, Rennie D, Meade MO, Cook DJ, editors. Users’ guides to the medical literature: a manual for evidence-based clinical practice. 3rd ed. New York: McGraw-Hill Education Ltd; 2015.

Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96:118–21.

Montori VM, Swiontkowski MF, Cook DJ. Methodologic issues in systematic reviews and meta-analyses. Clin Orthop Relat Res. 2003;413:43–54.

Mathes T, Klaßen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17:152.

Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007;298:430–7.

Glasziou P, Meats E, Heneghan C, Shepperd S. What is missing from descriptions of treatment in trials and reviews? BMJ. 2008;336:1472–4.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.

Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19:203.

Van der Mierden S, Tsaioun K, Bleich A, Leenaars CHC. Software tools for literature screening in systematic reviews in biomedical research. ALTEX. 2019;36:508–17.

Robson RC, Pham B, Hwee J, Thomas SM, Rios P, Page MJ, et al. Few studies exist examining methods for selecting studies, abstracting data, and appraising quality in a systematic review. J Clin Epidemiol. 2019;106:121–35.

Elamin MB, Flynn DN, Bassler D, Briel M, Alonso-Coello P, Karanicolas PJ, et al. Choice of data extraction tools for systematic reviews depends on resources and review complexity. J Clin Epidemiol. 2009;62:506–10.

Ciani O, Buyse M, Garside R, Pavey T, Stein K, Sterne JAC, et al. Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study. BMJ. 2013;346:f457.

Haslam A, Hey SP, Gill J, Prasad V. A systematic review of trial-level meta-analyses measuring the strength of association between surrogate end-points and overall survival in oncology. Eur J Cancer. 2019;106:196–211.

Pfadenhauer LM, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, et al. Making sense of complexity in context and implementation: the Context and Implementation of Complex Interventions (CICI) framework. Implement Sci. 2017;12:21.

Aromataris E, Munn Z, editors. Joanna Briggs Institute reviewer's manual: The Joanna Briggs Institute; 2017. https://reviewersmanual.joannabriggs.org/ . Accessed 04 June 2020.

Centre for Reviews and Dissemination. CRD’s guidance for undertaking reviews in health care. York: York Publishing Services Ltd; 2009.

Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.0: Cochrane; 2019. www.training.cochrane.org/handbook . Accessed 04 June 2020.

Institute of Medicine. Finding what works in health care: standards for systematic reviews. Washington, DC: The National Academies Press; 2011.

Bettany-Saltikov J. How to do a systematic literature review in nursing: a step-by-step guide. Berkshire: McGraw-Hill Education; 2012.

Booth A, Papaioannou D, Sutton A. Systematic approaches to a successful literature review. London: Sage Publications Ltd; 2012.

Cooper HM. Synthesizing research: a guide for literature reviews. Thousand Oaks: Sage Publications Inc; 1998.

Egger M, Smith GD, Altman DG. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publishing Group; 2001.

Foster MJ, Jewell ST. Assembling the pieces of a systematic review: a guide for librarians. Lanham: Rowman & Littlefield; 2017.

Holly C, Salmond S, Saimbert M. Comprehensive systematic review for advanced nursing practice. New York: Springer Publishing Company; 2012.

Mulrow C, Cook D. Systematic reviews: synthesis of best evidence for health care decisions. Philadelphia: ACP Press; 1998.

Petticrew M, Roberts H. Systematic Reviews in the Social Sciences: A Practical Guide. Malden: Blackwell Publishing; 2008.

Pope C, Mays N, Popay J. Synthesizing Qualitative and Quantitative Health Evidence. Maidenhead: McGraw Hill; 2007.

Sharma R, Gordon M, Dharamsi S, Gibbs T. Systematic reviews in medical education: A practical approach: AMEE Guide 94. Dundee: Association for Medical Education in Europe; 2015.

Fröschl B, Bornschein B, Brunner-Ziegler S, Conrads-Frank A, Eisenmann A, Gartlehner G, et al. Methodenhandbuch für health technology assessment: Gesundheit Österreich GmbH; 2012. https://jasmin.goeg.at/121/ . Accessed 19 Feb 2019.

Gartlehner G. (Internes) Manual Abläufe und Methoden: Ludwig Boltzmann Institut für Health Technology Assessment (LBI-HTA); 2007. http://eprints.aihta.at/713/ . Accessed 19 Feb 2019.

Health Information and Quality Authority (HIQA). Guidelines for the retrieval and interpretation of economic evaluations of health technologies in Ireland: HIQA; 2014. https://www.hiqa.ie/reports-and-publications/health-technology-assessments/guidelines-interpretation-economic . Accessed 19 Feb 2019.

Institute for Clinical and Economic Review (ICER). A guide to ICER’s methods for health technology assessment: ICER; 2018. https://icer-review.org/methodology/icers-methods/icer-hta-guide_082018/ . Accessed 19 Feb 2019.

International Network of Agencies for Health Technology Assessment (INAHTA). A checklist for health technology assessment reports: INAHTA; 2007. http://www.inahta.org/hta-tools-resources/briefs/ . Accessed 19 Feb 2019.

Malaysian Health Technology Assessment Section (MaHTAS). Manual on health technology assessment. 2015. https://www.moh.gov.my/moh/resources/HTA_MANUAL_MAHTAS.pdf?mid=636 .

Furlan AD, Malmivaara A, Chou R, Maher CG, Deyo RA, Schoene M, et al. 2015 Updated Method Guideline for Systematic Reviews in the Cochrane Back and Neck Group. Spine. 2015;40:1660–73.

Li T, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Ann Intern Med. 2015;162:287–94.

Munn Z, Tufanaru C, Aromataris E. JBI’s systematic reviews: data extraction and synthesis. Am J Nurs. 2014;114:49–54.

Pullin AS, Stewart GB. Guidelines for systematic review in conservation and environmental management. Conserv Biol. 2006;20:1647–56.

Stock WA, Goméz Benito J, Balluerka LN. Research synthesis. Coding and conjectures. Eval Health Prof. 1996;19:104–17.

Moher D, Schulz KF, Simera I, Altman DG. Guidance for developers of health research reporting guidelines. PLoS Med. 2010;7:e1000217.

Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365.

Gøtzsche PC, Ioannidis JPA. Content area experts as authors: helpful or harmful for systematic reviews and meta-analyses? BMJ. 2012;345:e7031.

Agoritsas T, Neumann I, Mendoza C, Guyatt GH. Guideline conflict of interest management and methodology heavily impacts on the strength of recommendations: comparison between two iterations of the American College of Chest Physicians Antithrombotic Guidelines. J Clin Epidemiol. 2017;81:141–3.

Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5:210.

Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Informatics Assoc. 2016;23:193–201.

Beller E, Clark J, Tsafnat G, Adams C, Diehl H, Lund H, et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018;7:77.

O’Connor AM, Glasziou P, Taylor M, Thomas J, Spijker R, Wolfe MS. A focus on cross-purpose tools, automated recognition of study design in multiple disciplines, and evaluation of automation tools: a summary of significant discussions at the fourth meeting of the International Collaboration for Automation of Systematic R. Syst Rev. 2020;9:100.

Clark J, Glasziou P, Del Mar C, Bannach-Brown A, Stehlik P, Scott AM. A full systematic review was completed in 2 weeks using automation tools: a case study. J Clin Epidemiol. 2020;121:81–90.

O’Connor AM, Tsafnat G, Thomas J, Glasziou P, Gilbert SB, Hutton B. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev. 2019;8:143.

Cooper C, Booth A, Britten N, Garside R. A comparison of results of empirical studies of supplementary search techniques and recommendations in review methodology handbooks: a methodological review. Syst Rev. 2017;6:234.

Cooper C, Booth A, Varley-Campbell J, Britten N, Garside R. Defining the process to literature searching in systematic reviews: a literature review of guidance and supporting studies. BMC Med Res Methodol. 2018;18:85.

Acknowledgments

We thank information specialist Simone Hass for peer reviewing the search strategy and conducting searches.

Funding

No funding was received. Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute for Research in Operative Medicine (IFOM), Faculty of Health - School of Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, 51109, Cologne, Germany

Roland Brian Büchter, Alina Weise & Dawid Pieper

Contributions

Study design: RBB, DP. Data extraction: RBB, AW. Data analysis and interpretation: RBB, DP, AW. Writing the first draft of the manuscript: RBB. Revisions of the manuscript for important intellectual content: RBB, DP, AW. Final approval of the manuscript: RBB, DP, AW. Agree to be accountable for all aspects of the work: RBB, DP, AW. Guarantor: RBB.

Corresponding author

Correspondence to Roland Brian Büchter .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

List of HTA websites searched.

Additional file 2.

Information on database searches

Additional file 3.

List of items and rationale

Additional file 4.

List of included documents

Additional file 5.

Recommendations for non-interventional reviews

Additional file 6.

Primary analysis

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Büchter, R.B., Weise, A. & Pieper, D. Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance. BMC Med Res Methodol 20, 259 (2020). https://doi.org/10.1186/s12874-020-01143-3

Received: 11 June 2020

Accepted: 07 October 2020

Published: 19 October 2020

DOI: https://doi.org/10.1186/s12874-020-01143-3

Keywords: Systematic review methods, Evidence synthesis

  • Open access
  • Published: 18 June 2024

Patient safety in orthodontic care: a scoping literature review with proposal for terminology and future research agenda

  • Nikolaos Ferlias 1, 3,
  • Ambrosina Michelotti 2 &
  • Peter Stoustrup 1

BMC Oral Health volume 24, Article number: 702 (2024)

Abstract

Knowledge about patient safety in orthodontics is scarce. Lack of standardisation and a common terminology hinders research and limits our understanding of the discipline. This study aims to 1) summarise current knowledge about patient safety incidents (PSI) in orthodontic care by conducting a systematic literature search, 2) propose a new standardisation of PSI terminology and 3) propose a future research agenda on patient safety in the field of orthodontics.

A systematic literature search was performed in the main online sources of PubMed, Web of Science, Scopus and OpenGrey from their inception to 1 July 2023. Inclusion criteria were based on the World Health Organization's (WHO) research cycle on patient safety. Studies providing information about the cycle's steps related to orthodontics were included. Study selection and data extraction were performed by two of the authors.

A total of 3,923 articles were retrieved. After review of titles and abstracts, 41 articles were selected for full-text review and 25 articles were eligible for inclusion. Seven provided information on the WHO’s research cycle step 1 (“measuring harm”), twenty-one on “understanding causes” (step 2) and twelve on “identifying solutions” (step 3). No study provided information on Steps 4 and 5 (“evaluating impact” or “translating evidence into safer care”).

Current evidence on patient safety in orthodontics is scarce due to a lack of standardised reporting and probably also under-reporting of PSIs. Current literature on orthodontic patient safety deals primarily with "measuring harms" and "understanding causes of patient safety", whereas less attention has been devoted to initiatives "identifying solutions", "evaluating impact" and "translating evidence into safer care". The present project presents a proposal for a new categorisation, terminology and future research agenda that may serve as a framework to support future research and clinical initiatives to improve patient safety in orthodontic care.

Registration

PROSPERO (CRD42022371982).

Introduction

For decades, patient safety has been recognised as a healthcare discipline. However, the awareness-raising publication of “To Err Is Human” by the Institute of Medicine Committee on Quality of Health Care in the US drew considerable attention to this important aspect of healthcare [ 1 , 2 ]. In this publication, experts estimated that in the US in any given year as many as 98,000 people die from medical errors that occur in hospitals [ 1 ]. The definition of patient safety by the World Health Organization (WHO) from 2009 is: “the freedom for a patient from unnecessary harm or potential harm related to healthcare” [ 2 ]. Similarly, in their report, Kohn et al. recognised safety as “freedom from accidental injury” [ 1 ]. In this context, a patient safety incident (PSI) is an event or circumstance that could have resulted or did result in unnecessary harm to a patient [ 2 ].

Patient safety is a crucial aspect of healthcare that seeks to minimise preventable harm, accidents, complications and adverse events (AEs). AEs are defined as injuries resulting from poor management practices that could have been prevented but are not attributed to an underlying disease process [2, 3]. The WHO classifies certain AEs as "never events", which are serious incidents that should not occur given the presence of strong systemic safety measures [4]. Never events can have a profound impact on patients, and their prevention is a key objective of healthcare organisations. In this context, patient safety aims to limit the impact of AEs and promote the avoidance of preventable harm.

Patient safety is a priority from the patient's perspective, and for care providers it falls in line with the Hippocratic Oath ("primum non nocere"), which is an important element of modern healthcare. Patient safety initiatives analyse characteristics and features of healthcare systems that may lead to the occurrence of AEs. These features are latent risks that may be of any nature, from a soft-tissue laceration or a loose wire to inhalation of an orthodontic appliance [5]. Throughout most healthcare treatment courses, multiple latent risks exist, and this makes patient safety multifactorial and complex. When an AE occurs, patient safety does not aim to punish but rather to investigate how and why the protective barriers failed [6, 7].

Improving the quality of care is a road that passes through patient safety, which also has psychosocial and financial benefits. Dealing with the consequences of an adverse event has an economic cost to the practitioner, the patient and society. By improving patient safety, dental practitioners increase their quality of care, which is associated with safer and better treatment outcomes [8, 9, 10]. In addition, it affords increased legal security by minimising the risk of legal claims [6].

Knowledge about patient safety in dental care, and orthodontics in particular, is scarce. The absence of patient safety guidelines in orthodontics is a major concern. This issue is further complicated by the lack of standardised terminology in the field, which challenges the development of consistent safety protocols. Additionally, there is a noticeable lack of research and publications in this area, which hinders progress in developing effective, evidence-based strategies to ensure patient safety in orthodontic care [11]. Therefore, an urgent need exists for studies in the field of orthodontics in particular [2, 3, 12]. Among other factors, the lack of a common language among orthodontic caregivers ultimately hinders research and limits our understanding of the discipline [13, 14]. The aims of this study were to 1) summarise current knowledge about PSIs in orthodontic care by performing a systematic literature search; 2) propose a new standardisation of PSI terminology; and 3) propose a research agenda on patient safety in the field of orthodontics that may serve to further develop and provide direction for future research on the subject.

Materials and methods

Protocol and registration

Prior to the initiation of the project, the study protocol was registered with PROSPERO (reg. no. CRD42022371982). No ethical approval was deemed necessary.

Search strategy

A systematic literature search was performed in the main online sources of MEDLINE (through PubMed), Web of Science, Scopus as well as the System for Information on Grey Literature in Europe (Open-Grey) from their inception to 1 July 2023. No language limitation was set in the search, and all types of eligible human studies were included.

The inclusion criteria for articles were based on the WHO research cycle on patient safety [ 15 , 16 ]. The various steps of the cycle aim to measure harm and identify causes while identifying solutions to improve patient safety. The ultimate goal is to translate evidence into safer care (Fig.  1 ). Only studies that provided relevant information in at least one of the following categories were eligible for inclusion in this systematic review:

Measuring harm: Studies characterising and/or reporting on the occurrence of AEs or orthodontic-related patient harm.

Understanding causes: Reports focusing on understanding causes leading to patient harm and AEs from orthodontic care.

Identifying solutions: Studies identifying solutions that are effective in reducing the occurrence of AEs and patient harm.

Evaluating impact: Studies evaluating the effectiveness of solutions in terms of impact, affordability and acceptability.

Figure 1. The World Health Organization's research cycle on patient safety, consisting of five steps with the main goal of measuring harm and its causes while identifying solutions and their impact. Ultimately, this evidence should lead to safer care with a set of actions and preventive measures.

Only full-text articles were included. In addition, studies dealing with patient safety from a general dental-care perspective were included only if they were directly relevant to orthodontic care and the WHO research cycle. For example, although studies on oral surgery were generally excluded, wrong-tooth-extraction studies and articles investigating the safety of light curing on patients were included owing to their relevance to orthodontics.

The following MeSH terms were used for the systematic search:

(((orthodontic*) OR (dental)) AND (patient safety)) AND ((((((((((((((((((((((((((harm) OR (risk*)) OR (malpractice)) OR (adverse event*)) OR (adverse effect*)) OR (never event*)) OR (iatrogenic)) OR (damage)) OR (incident*)) OR (accident*)) OR (delay* diagnos*)) OR (misdiagnosis)) OR (complication*)) OR (allerg*)) OR (infection)) OR (failure)) OR (error*)) OR (white spot lesion*)) OR (root resorption)) OR (relapse)) OR (decalcification)) OR (caries)) OR (periodontal disease)) OR (nerve damage)) OR (injury)) OR (temporomandibular joint dysfunction)).
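
As an aside, a search like this can also be run programmatically against PubMed through the NCBI E-utilities; the sketch below (using Biopython, which is not mentioned by the authors) sends a shortened stand-in for the strategy above and reports only the hit count. The query string and the e-mail address are placeholders.

```python
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # NCBI asks for a contact address; placeholder here

# Shortened stand-in for the full Boolean strategy quoted above.
query = '(orthodontic* OR dental) AND "patient safety" AND (harm OR risk* OR "adverse event")'

handle = Entrez.esearch(db="pubmed", term=query, retmax=0)  # retmax=0: count only
record = Entrez.read(handle)
handle.close()

print("PubMed records found:", record["Count"])
```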

Data extraction

After removal of duplicates, all results returned from the systematic literature search were initially screened by title to establish their relevance. The second filtering step assessed relevance for inclusion based on the content of the abstract. Finally, the third filtering step was applied to the main text, and the remaining studies were included in the review. All screening was performed by one of the authors (NF) and later re-checked by another author (PS). Any disputes in study selection were addressed and resolved through discussion between the reviewing authors. For all included studies, the main outcome/result was recorded, i.e. whether the study investigated prevalence ("measuring harm", step 1) or assessed contributing factors ("understanding causes", step 2). For all studies providing information on the cycle's step 3 ("identifying solutions"), all recommended solutions to prevent harm were also noted. Due to the nature of the data in the included studies, no risk of bias assessment was possible. For the same reason, no quantitative synthesis or meta-analysis was performed. Based on these findings, the intention to conduct a systematic review was revised to a scoping literature review instead [17].
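
As a small illustration of the de-duplication and staged screening described above (a generic sketch, not the authors' actual workflow; column names and the title-normalisation rule are assumptions):

```python
import pandas as pd

def deduplicate(records):
    """Drop records whose normalised title (lowercase, alphanumerics and spaces only) repeats."""
    norm = (records["title"].str.lower()
            .str.replace(r"[^a-z0-9 ]", "", regex=True)
            .str.strip())
    return records.loc[~norm.duplicated()].copy()

records = pd.DataFrame({
    "source": ["PubMed", "Scopus", "Web of Science"],
    "title": ["Wrong tooth extraction: root cause analysis",
              "Wrong Tooth Extraction: Root Cause Analysis.",
              "Orthodontic facebows: safety issues and current management"],
})

unique = deduplicate(records)                    # the Scopus duplicate is dropped
unique["title_screen"] = ["include", "include"]  # decision logged per screening stage
unique["abstract_screen"] = ["include", "exclude"]
print(unique)
```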

Study selection

A total of 3,923 studies were identified from the systematic search and imported into Excel (Microsoft®, USA) (PubMed n = 2,049, Web of Science n = 663, Scopus n = 1,203 and OpenGrey n = 8). Among the 3,923 articles, 237 were deemed relevant according to the inclusion criteria after screening of their titles. Filtering by abstract left 41 articles after removal of duplicates. In one case, the full text of an article was unavailable, and it was therefore excluded [18]. Three relevant articles found in the reference lists were also added [4, 14, 19]. Finally, 25 studies were included, as they were found to provide information within at least one of the categories of the WHO's research cycle on patient safety related to the orthodontic field (flowchart presented in Fig. 2).
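
For the record, the counts reported above can be tallied in a few lines to check that the numbers at each stage are internally consistent (figures are taken directly from the text; the layout is illustrative):

```python
# Records identified per database, as reported in the text above.
identified = {"PubMed": 2049, "Web of Science": 663, "Scopus": 1203, "OpenGrey": 8}

funnel = {
    "records identified": sum(identified.values()),   # expected: 3,923
    "relevant after title screening": 237,
    "selected for full-text review": 41,
    "added from reference lists": 3,
    "included in the review": 25,
}

assert funnel["records identified"] == 3923
for stage, n in funnel.items():
    print(f"{stage}: {n}")
```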

Figure 2. PRISMA flowchart diagram of the systematic literature search and inclusion procedure.

Study characteristics

Study characteristics are shown in Table  1 . Nine of the included papers were retrospective studies of AEs studying: eye wear protection and ocular trauma in orthodontic practice [ 19 ], clinical evaluation of a locking orthodontic facebow [ 20 ], adverse reactions to dental materials [ 3 ], case reports of latex allergy [ 21 ], wrong tooth extraction claims [ 4 ], dental and orthodontic PSIs in a UK register [ 7 ] and a Finnish register [ 8 ], adverse reactions to dental devices reported at the US Food and Drug Administration [ 9 ] and investigation of monomer release from orthodontic adhesives [ 22 ].

The remaining sixteen studies reported risk assessments of orthodontic procedures or materials. These included safety assessment of dental radiography [ 23 ], bonding of brackets under general anaesthesia [ 24 ], orthodontic facebows [ 10 ], mini-implants [ 12 , 25 , 26 ], soft-tissue lasers in orthodontics [ 13 ], effect of orthodontic treatment on patients’ diet [ 14 ], eye safety of curing lights [ 27 ], safety of metal fixed appliance during magnetic resonance imaging (MRI) [ 28 ], pulp safety of various types of curing lights [ 29 ], wrong tooth extraction in orthodontics [ 30 , 31 , 32 ], orthodontic treatment by identifying orthodontic never events [ 33 ] and complications after orthognathic surgery [ 34 ]. These studies identified risks in orthodontic procedures or materials and proposed solutions to manage and minimise these risks.

Study results

Measuring harm

Seven of the included studies provided information in the first category of the WHO's research cycle on patient safety, "measuring harm" [4, 7, 8, 9, 19, 22, 34]. Sims et al. conducted a postal survey on eye protection in the UK and found that ocular injuries were reported by 37.7% of all respondents, involving orthodontists, assistants and patients [19]. Peleg et al. conducted a root-cause analysis of wrong-tooth extraction in 54 insurance claims in Israel and reported that in two thirds of all claims an identification error was the cause of the incorrect tooth extraction [4]. A cross-sectional study on PSIs in the UK found that orthodontic PSIs accounted for 8.9% of all reported dental PSIs in the country [7]. Hebballi et al. investigated the frequency and types of AEs associated with dental devices as reported to the US Food and Drug Administration's Manufacturer and User Facility Device Experience (MAUDE) database [9]. They reported that orthodontic appliances and accessories accounted for 1% of all AEs involving dental devices. In a similar investigation in hospital and private settings in Finland, Hiivala et al. reported that orthodontic PSIs accounted for 3.6% of all dental PSIs [8]. Finally, a multi-centre retrospective review of orthognathic surgeries assessing complications and risk factors studied a population of 674 patients [34]. It reported that adverse events were rare (4.3%), with superficial incisional infection being the most common, and identified the setting, the type of surgery and the patients' ethnicity as risk factors for some types of complications.

Understanding causes of harm & identifying solutions

Twenty-one of the included studies identified the underlying causes of AEs that caused patient harm (the WHO cycle's category 2, "understanding causes") [3, 4, 7, 10, 12, 13, 14, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34]. In addition, twelve studies identified possible solutions that may be effective in reducing the occurrence of AEs (category 3, "identifying solutions") [4, 10, 12, 13, 19, 20, 21, 23, 24, 25, 31, 32]. These solutions included: health and safety instructions for eye-protection goggles to prevent ocular trauma [19]; use of non-latex materials [21]; clear instructions with a brief description of the tooth to be extracted, addressed to the clinician, using two different identification methods and a computerised checklist to prevent wrong-site extraction [4, 31, 32]; use of facebows with a locking mechanism and self-releasing head strap to prevent injuries from headgear [10, 20]; suggestions to improve safety in dental radiography [23]; use of rubber dam during bonding of brackets under general anaesthesia [24]; recommendations to overcome failures and risks during placement, loading and removal of mini-implants [12, 25]; and, finally, instructions for safe use of soft-tissue lasers in orthodontics, recommending that the clinician obtain appropriate training and certification, that proper eye wear be used by all involved parties, that informed consent be obtained and that proper post-operative instructions be provided [13].

None of the included studies provided information on how to evaluate the impact of such solutions or on how to translate evidence into safer care in terms of affordability and acceptability. Data synthesis and meta-analysis were not possible due to the heterogeneity of the studies and the nature of the data.

Discussion

Patient safety incidents in orthodontics

To our knowledge, this is the first systematic investigation of patient safety in orthodontics. The lack of evidence in the field is reflected in our results. The twenty-five studies included in this review were only peripherally related to orthodontics while providing some information relevant to the WHO's research cycle. This cycle describes a process to identify solutions for enhancing patient safety and reducing patient harm. It consists of five steps representing the natural progression of patient-safety initiatives. It seems that dentistry in general, and orthodontics in particular, have yet to take even the initial steps of the cycle (steps 1 and 2), which are to measure harm and understand its causes [16]. This is evident from the results, as the included studies were either reviews of risks associated with specific orthodontic procedures (such as mini-implant insertion, soft-tissue lasers or facebow use) or retrospective reviews of AEs peripherally related to orthodontics (incidence of ocular trauma, adverse reactions to materials, etc.).

The results of this review document that current evidence relating to orthodontics is scarce. Without a basic understanding of PSIs and harms, we cannot begin to understand the causes and identify solutions that will subsequently translate into safer care for our patients [16]. A major limitation is a trend towards under-reporting of PSIs in our field. In fact, a review of the National Patient Safety Agency (NPSA) database in the UK revealed that orthodontics is among the lowest-reporting specialties, along with dental surgery and paediatric dentistry [35]. A contributing factor may be the lesser severity of some PSIs in orthodontics, which may be minor injuries such as soft-tissue lacerations from loose wires [16]. One way to overcome under-reporting may be diligent keeping of patient records and clinical notes, which is an essential tool in clinical audits and will also underpin the reporting of more AEs [36]. Also, the lack of standardisation in the terminology and reporting of AEs makes it challenging, if not impossible, to summarise and categorise all PSIs in orthodontics, let alone analyse the data in depth.

Additionally, we hypothesise that a reporting bias may exist between dental specialities. Dental implants are more expensive, and dentists and/or patients may therefore report them more often when asking for replacements [9]. This leads, for example, to many more reported PSIs for implants than for burs. Finally, another contributing factor to the lack of evidence on patient safety is the overlap found between some areas within dentistry. This makes it more challenging to precisely measure AEs in only one field. A clear example is the AE of wrong-tooth extraction for orthodontic reasons, which may fall into both the orthodontic and the surgical category.

Standardisation and terminology

The lack of standardised terminology and reporting of PSIs in orthodontics seems to hinder any effort to summarise and categorise PSIs, which could be a reasonable first research step to enhance our knowledge in this field. For future work, we therefore suggest that PSIs related to orthodontics may be summarised into two main categories: local and systemic. The categorisation, with subcategories and examples, is shown in Table 2. Terminology according to the WHO is proposed in Table 3.

Local PSIs refer to any harm to dental tissues (root resorption, white spot lesions, pulp necrosis, caries) and soft tissues. This may be damage to both periodontal and surrounding soft tissues that could have been avoided (gingival recessions, soft-tissue lacerations, local allergic reaction/contact dermatitis). In addition, local PSIs include treatment injuries with a negative effect on orofacial function, for example development of a lip catch as a result of orthodontic treatment. Finally, any harm related to unwanted tooth movement, for example due to an active retainer, is also included in this category.

Systemic PSIs refer to harm at a systemic level. This may be excessive pain and discomfort as a result of orthodontic treatment due to a defective appliance, or hypersensitivity due to excessive interproximal reduction. In addition, systemic PSIs include potential emotional harm to patients, such as development of general discomfort, odontophobia or mistrust towards the clinician or the healthcare system, or deterioration of oral health-related quality of life (OHRQoL). Systemic PSIs may also result from delayed treatment initiation due to delayed or inadequate diagnosis. Finally, harm caused by poor cross-infection control, inhalation of orthodontic parts and extraction of a wrong tooth is also considered systemic.
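
To make the proposed two-level categorisation concrete, the sketch below encodes it as a small data structure; the subcategory labels are paraphrased from the description above and are purely illustrative, not an official taxonomy.

```python
from dataclasses import dataclass
from enum import Enum

class PSICategory(Enum):
    LOCAL = "local"        # harm to dental/periodontal tissues, orofacial function, unwanted tooth movement
    SYSTEMIC = "systemic"  # pain/discomfort, emotional harm, delayed diagnosis, cross-infection, inhalation, wrong tooth

@dataclass
class PatientSafetyIncident:
    description: str
    category: PSICategory
    subcategory: str  # free text until a standard sub-taxonomy is agreed

incident = PatientSafetyIncident(
    description="Soft-tissue laceration caused by a loose archwire",
    category=PSICategory.LOCAL,
    subcategory="soft-tissue harm",
)
print(incident)
```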

Future research agenda

A proposal for a future research agenda in orthodontic patient safety is shown in Table 4. The agenda is intended as inspiration to promote future research and development in patient safety in orthodontics. It should not be considered absolute, as topics other than those listed may be of interest for future patient safety initiatives. Two main categories of studies are presented in Table 4: retrospective and prospective studies dealing with patient safety (26).

Retrospective studies are reactive in nature and focus on the incidence, characteristics and severity of PSIs using an acknowledged methodology such as journal file audit and root cause analysis (RCA) (26,27). They investigate PSIs that have already occurred with the intention of generating knowledge to promote learning and guidance for future patient safety initiatives. RCA allows us to focus on individual PSIs and investigate, through a comprehensive analysis, all the contributing factors that lead to the occurrence of an AE.

Conversely, prospective studies assess potential risks associated with a treatment, appliance or material. The methodology used in these studies is failure mode and effects analysis (FMEA) (27,28). This approach analyses a method, treatment, material or procedure by first creating a risk map and then implementing measures to reduce the likelihood or impact of a PSI (27–30).
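
As a generic illustration of the FMEA approach (the failure modes and scores below are hypothetical and not taken from the included studies), each failure mode can be rated for severity, occurrence and detectability, and the conventional risk priority number, the product of the three scores, can then be used to rank where preventive measures are needed first.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (always detected) to 10 (practically undetectable)

    @property
    def rpn(self):
        """Conventional FMEA risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

# Hypothetical risk map for illustration only.
risk_map = [
    FailureMode("Inhalation of an appliance component", severity=9, occurrence=2, detection=4),
    FailureMode("Soft-tissue laceration from a loose wire", severity=3, occurrence=6, detection=3),
    FailureMode("Wrong tooth extracted following an unclear referral", severity=8, occurrence=2, detection=5),
]

for fm in sorted(risk_map, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.name}: RPN = {fm.rpn}")
```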

Both intrinsic and extrinsic motivation are key factors in the establishment of safer future orthodontic care. Intrinsic motivation is shaped by professional ethics, norms and patient-reported outcomes and expectations [ 1 , 37 , 38 ]. The articles included in our review, however, mainly focused on the extrinsic motivation, which refers to the environment, policies and strategies that we may develop with the ultimate goal of improving patient safety in orthodontics.

In orthodontic patient safety research, a need exists to increase our focus on this aspect and on clinical routines and administrative, organisational and legal contexts. One strategy that may help us move in this direction is to establish excellent records and clinical notes through periodic audits [30]. This will help clinicians and/or patients report more AEs in the future. Honest exchange of such information between health professionals is a necessary first step and a cornerstone of safer care and further research. To achieve this, it is important to establish a non-blame culture with psychological safety and a feeling of partnership, enthusiasm and commitment to improving patient safety in orthodontics [36].

Research on patient safety is more advanced in other parts of healthcare than in orthodontics. Even other fields of dentistry have taken steps in this direction with the creation of checklists, e.g. in endodontics, orofacial function and oral surgery [39, 40, 41, 42, 43]. Checklists seem to have a positive effect on patient safety [44, 45, 46]. Most of these checklists are adaptations of the WHO's surgical safety checklist, which is now used in a wide range of surgical specialties in medicine [47]. Adjusting it to fit orthodontic needs and implementing it in daily practice may be an important step towards improving safety in orthodontics [48]. In the past decade, the WHO has published several guidelines and educational curricula to enhance the level of patient safety in healthcare in general [49, 50]. These publications may provide a starting point for spreading local patient safety initiatives and introducing educational and organisational measures to further patient safety.

Some orthodontic societies seem to have taken steps towards patient safety; however, societies in all countries need to follow and implement policies for safer care. At its core, patient safety is the purpose of audit and clinical governance. Among other things, research is a vital element in this process. Nevertheless, a limitation could be that clinical governance differs from one country to another.

Traditionally, patient safety was focused on rare types of incidents with a significant degree of harm, referred to as "never events" in the literature [51]. However, in recent years, more efforts have been devoted to understanding the frequency and causes of PSIs that we assume occur more frequently than is reported today [51]. The perceived threshold determining what is considered a PSI may often be vague, and the border is not absolute, particularly as we come to understand patient safety better. It is important to emphasise that common side effects (e.g. root resorption) are not considered PSIs, as these side effects may also occur when a patient has undergone an optimally performed course of treatment, unless, of course, the side effects were avoidable and appropriate measures had not been adopted [52]. The extent of such side effects can vary and probably depends on a wide range of factors (force magnitude, treatment duration) [53]. Excessive root resorption, however, may be considered a PSI if the risk factors were not assessed before initiating treatment and if precautionary measures were not taken in advance. A step towards safer orthodontics may be to incorporate such "risk maps" routinely in systematic reviews. For example, when a systematic review compares A to B, reporting just which of the two is more efficient or faster may be insufficient. The burden and the risk of harm to the patient should also be reported. This reporting may include anything that may be considered a PSI, from excessive root resorption to increased exposure to radiation, cytotoxicity, effects on patients' OHRQoL, late diagnosis, overtreatment, gingival recessions or bone dehiscence. A cultural change in the way we approach these "side effects", together with further patient-centred research, will improve patient safety in our field. In addition, in today's rapidly evolving technological landscape, where new advancements outpace research capabilities, emphasising the safety of orthodontic materials is crucial, and treatment decisions need to be patient-centred and based on the patient's perspective [54].

Strengths and limitations

The strengths of this systematic review include an extensive literature search, a predefined protocol, a priori registration with PROSPERO and the adoption of a strict methodology at all study stages [55]. Also, the fact that there was no date or language limitation in the search provided us with data that likely reflect the current understanding and knowledge about PSIs in orthodontics. In addition, the proposed categorisation of PSIs in orthodontics and the future-agenda proposals may spark interest and lead to further research in the field of orthodontic patient safety.

Certain limitations need further consideration, mainly the inability to assess the precise prevalence of orthodontic PSIs and categorise them accordingly. This inability is due to the poor current evidence, the lack of standardisation and terminology, and the fact that many PSIs are probably under-reported. It can also be due to the fact that patient safety is a topic of increasing complexity, especially with the new risks arising directly from the use of new technologies [51]. Also, there is an inherent risk of bias due to the nature of the included studies, which were mostly retrospective [56]. Furthermore, the final selection of the included studies was consensus-based rather than based on individual assessment of the suitability of the articles during the review process. Finally, despite thorough searching, some studies may have been overlooked, possibly ones indexed in databases not encompassed in the search.

Conclusions

Current evidence on patient safety in orthodontics is scarce due to a lack of standardisation and potential under-reporting of PSIs. The current literature on orthodontic patient safety deals mostly with "measuring harms" and "understanding causes of patient safety", whereas less attention has been devoted to initiatives "identifying solutions", "evaluating impact" and "translating evidence into safer care". This project presents proposals for a new categorisation, terminology and future research agenda that may serve as a framework to support future research and clinical initiatives to improve patient safety in orthodontic care.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files.

Kohn LT, Corrigan JM, Donaldson MS. To Err Is Human. Regul Toxicol Pharmacol. 2000;52:1–287.

World Health Organization (WHO). Conceptual Framework for the International Classification for Patient Safety Final Technical Report. International Classification [Internet]. 2009;(January):3–101. Available from: http://www.who.int/patientsafety/taxonomy/ICPS_Statement_of_Purpose.pdf

Scott A, Egner W, Gawkrodger DJ, Hatton P V., Sherriff M, Van Noort R, et al. The national survey of adverse reactions to dental materials in the UK: A preliminary study by the UK Adverse Reactions Reporting Project. Vol. 196, British Dental Journal. Nature Publishing Group; 2004. p. 471–7.

Peleg O, Givot DMDN, Halamish-shani T, Taicher S. Wrong tooth extraction: root cause analysis. Br Dent J. 2011;210(4):163–163.

Yamalik N, Perea PB. Patient safety and dentistry: What do we need to know? Fundamentals of patient safety, the safety culture and implementation of patient safety measures in dental practice. Int Dent J. 2012;62(4):189–96.

Dehghanian D, Heydarpoor P, Attaran N, Khoshnevisan M. Clinical governance in general dental practice. Journal of International Oral Health. 2019;11(3):107–11.

Thusu S, Panesar S, Bedi R. Patient safety in dentistry - State of play as revealed by a national database of errors. Br Dent J. 2012;213(3):E3.

Hiivala N, Mussalo-Rauhamaa H, Tefke HL, Murtomaa H. An analysis of dental patient safety incidents in a patient complaint and healthcare supervisory database in Finland. Acta Odontol Scand. 2016;74(2):81–9. https://doi.org/10.3109/00016357.2015.1042040

Hebballi NB, Ramoni R, Kalenderian E, Delattre VF, Stewart DCL, Kent K, et al. The dangers of dental devices as reported in the food and drug administration manufacturer and user facility device experience database. J Am Dent Assoc. 2015;146(2):102–10.

Samuels RHA, Brezniak N. Orthodontic facebows: Safety issues and current management. J Orthod. 2002;29(2):101–7.

Bailey E, Tickle M, Campbell S, O’Malley L. Systematic review of patient safety interventions in dentistry. BMC Oral Health. 2015;15(1):152.

Kravitz ND, Kusnoto B. Risks and complications of orthodontic miniscrews. Am J Orthod Dentofac Orthop. 2007;131(4):S43-51.

Kravitz ND, Kusnoto B. Soft-tissue lasers in orthodontics: An overview. Am J Orthod Dentofacial Orthop. 2008;133(4 SUPPL):S110-4.

Johal A, Abed Al Jawad F, Marcenes W, Croft N. Does orthodontic treatment harm children’s diets? J Dent. 2013;41(11):949–54.

World Health Organization. WHO patient safety research : better knowledge for safer care. 2009;12 p.

Tokede O, Walji M, Ramoni R, Rindal D, Worley D, Hebballi N, et al. Quantifying Dental Office–Originating Adverse Events: The Dental Practice Study Methods. J Patient Saf. 2017;Publish Ah(00):1–8.

Vaid N. Scoping studies: Should there be more in orthodontic literature? APOS Trends in Orthodontics. 2019;9(3):124–5.

Rak D. X-ray examinations in orthodontic diagnostics as a source of ionizing radiation. Bilten Udruzenja ortodonata Jugoslavije. Bulletin Orthod Soc Yugosl. 1989;22:37–48.

Sims AP, Roberts-Harry TJ, Roberts-Harry DP. The incidence and prevention of ocular injuries in orthodontic practice. Br J Orthod. 1993;20(4):339–43.

Samuels R, O’Neill J, Bhavra G, Hills D, Thomas P, Hug H, et al. A clinical evaluation of a locking orthodontic facebow. American J Orthod Dentofacial Orthop. 2000;117(3):344–50.

Raggio DP, Camargo LB, Naspitz GMCC, Bonifacio CC, Politano GT, Mendes FM, et al. Latex allergy in dentistry: Clinical cases report. J Clin Exp Dent. 2010;2(1):e55–9.

Bationo R, Jordana F, Boileau MJ, Colat-Parros J. Release of monomers from orthodontic adhesives. Am J Orthod Dentofac Orthop. 2016;150(3):491–8.

Abbott P. Are dental radiographs safe? Aust Dent J. 2000;45(3):208–13.

Chaushu S, Zeltser R, Becker A. Safe orthodontic bonding for children with disabilities during general anaesthesia. Eur J Orthod. 2000;22(3):225–8.

Suzuki M, Deguchi T, Watanabe H, Seiryu M, Iikubo M, Sasano T, et al. Evaluation of optimal length and insertion torque for miniscrews. Am J Orthod Dentofac Orthop. 2013;144(2):251–9.

Kuroda S, Tanaka E. Risks and complications of miniscrew anchorage in clinical orthodontics. Japan Dent Sci Rev. 2014;50:79–85.

McCusker N, Lee SM, Robinson S, Patel N, Sandy JR, Ireland AJ. Light curing in orthodontics; Should we be concerned? Dent Mater. 2013;29(6):e85-90.

Görgülü S, Ayyildiz S, Kamburoǧlu K, Gökçe S, Ozen T. Effect of orthodontic brackets and different wires on radiofrequency heating and magnetic field interactions during 3-T MRI. Dentomaxillofacial Radiol. 2014;43(2):20130356.

Mouhat M, Mercer J, Stangvaltaite L, Örtengren U. Light-curing units used in dentistry: factors associated with heat development—potential risk for patients. Clin Oral Investig. 2017;21(5):1687–96.

Anwar H, Waring D. Improving patient safety through a clinical audit spiral: prevention of wrong tooth extraction in orthodontics. Br Dent J. 2017;223(1):48–52.

Cullingham P, Saksena A, Pemberton MN. Patient safety: Reducing the risk of wrong tooth extraction. Br Dent J. 2017;222(10):759–63.

Jacob O, Gough E, Thomas H. Preventing wrong tooth extraction. Acta Stomatol Croat. 2021;55(3):316–24.

Jerrold L, Danoff-Rudick J. Never events in clinical orthodontic practice. Am J Orthod Dentofac Orthop. 2022;161(4):480–9.

Knoedler S, Baecher H, Hoch CC, Obed D, Matar DY, Rendenbach C, et al. Early Outcomes and Risk Factors in Orthognathic Surgery for Mandibular and Maxillary Hypo- and Hyperplasia: A 13-Year Analysis of a Multi-Institutional Database. J Clin Med. 2023;12(4):1444.

Bagley CHM, Panesar SS, Patel B, Cleary K, Pickles J. Safer cut: Revelations of surgical harm through a national database [Internet]. Vol. 71, British Journal of Hospital Medicine. MA Healthcare London; 2010. p. 484–5. Available from: https://www.magonlinelibrary.com/doi/10.12968/hmed.2010.71.9.78155

Yamalik N. Quality systems in dentistry Part 2. Quality assurance and improvement (QA/I) tools that have implications for dentistry. Int Dent J [Internet]. 2007;57(6):459–67. Available from: https://pubmed.ncbi.nlm.nih.gov/18265780/

Hua F. Dental patient-reported outcomes update 2022. J Evid Based Dent Pract. Mosby. 2023;23:1–6.

Tao Z, Zhao T, Ngan P, Qin D, Hua F, He H. The use of dental patient-reported outcomes among randomized controlled trials in orthodontics: a methodological study. J Evid Based Dent Pract. 2023;23(1): 101795.

Díaz-Flores-García V, Perea-Pérez B, Labajo-González E, Santiago-Sáez A, Cisneros-Cabello R. Proposal of a “Checklist” for endodontic treatment. J Clin Exp Dent. 2014;6(2):104–9.

Wright S, Ucer TC, Crofts G. The adaption and implementation of the WHO surgical safety checklist for dental procedures. Br Dent J. 2018;225(8):727–9.

Nenad MW, Halupa C, Spolarich AE, Gurenlian JAR. A Dental Radiography Checklist as a Tool for Quality Improvement. J Dent Hyg. 2016;90(6):386–93.

Beddis HP, Davies SJ, Budenberg A, Horner K, Pemberton MN. Temporomandibular disorders, trismus and malignancy: Development of a checklist to improve patient safety. Br Dent J. 2014;217(7):351–5.

Schmitt CM, Buchbender M, Musazada S, Bergauer B, Neukam FW. Evaluation of Staff Satisfaction After Implementation of a Surgical Safety Checklist in the Ambulatory of an Oral and Maxillofacial Surgery Department and its Impact on Patient Safety. J Oral Maxillofac Surg. 2018;76(8):1616–39.

Wilson L, Walker L. The WHO surgical safety checklist: The evidence. J Perioper Pract [Internet]. 2009;19(10):362–4. Available from: https://journals.sagepub.com/doi/epdf/10.1177/175045890901901002

Weiser TG, Haynes AB, Dziekan G, Berry WR, Lipsitz SR, Gawande AA. Effect of A 19-item surgical safety checklist during urgent operations in a global patient population. Ann Surg. 2010;251(5):976–80.

Vats A, Vincent CA, Nagpal K, Davies RW, Darzi A, Moorthy K. Practical challenges of introducing WHO surgical checklist: UK pilot experience. BMJ (Online). 2010;340(7738):133–5.

World Health Organization. Tool and Resources [Internet]. WHO Surgical Safety Checklist. 2009. Available from: https://www.who.int/teams/integrated-health-services/patient-safety/research/safe-surgery/tool-and-resources

Clark S, Hamilton L. WHO surgical checklist: Needs to be customised by specialty. Vol. 340, BMJ (Online). British Medical Journal Publishing Group; 2010. p. 280.

World Health Organization (WHO). Global Patient Safety Action Plan 2021–2030 [Internet]. Vol. 53, World Health Organization. 2020. 1689–1699 p. Available from: https://www.who.int/teams/integrated-health-services/patient-safety/policy/global-patient-safety-action-plan

World Health Organization (WHO). Patient Safety Research course.2022; Available from: https://www.who.int/teams/integrated-health-services/patient-safety/guidance/patient-safety-research-course

Vincent C, Amalberti R. Safer Healthcare: Strategies for the Real World. Cham: Springer; 2016. p. 1–157.

Stoustrup P, Ferlias N. Patientskader i forbindelse med ortodonti [Patient injuries in connection with orthodontics]. Tandlaegebladet. 2022;126:812–22.

Yassir YA, McIntyre GT, Bearn DR. Orthodontic treatment and root resorption: An overview of systematic reviews. Eur J Orthod. 2021;43(4):442–56.

Alansari R, Vaiid N. Why do patients transition between orthodontic appliances? A qualitative analysis of patient decision-making. Orthod Craniofac Res. 2023;00:1–8.

Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions. The Cochrane Collaboration; 2011.

OCEBM Levels of Evidence Working Group. OCEBM Levels of Evidence [Internet]. Centre for Evidence-Based Medicine (CEBM), University of Oxford; 2011. Available from: https://www.cebm.ox.ac.uk/resources/levels-of-evidence/ocebm-levels-of-evidence


Author information

Authors and Affiliations

Section of Orthodontics, Department of Dentistry and Oral Health, Aarhus University, Aarhus, Denmark

Nikolaos Ferlias & Peter Stoustrup

Department of Neurosciences, Reproductive Sciences and Oral Sciences, Section of Orthodontics and Temporomandibular Disorders, University of Naples Federico II, Naples, Italy

Ambrosina Michelotti

Private Practice, Brighton, UK

Nikolaos Ferlias


Contributions

NF: Conceptualization, Search strategy, Data synthesis, interpretation and analysis, Investigation, Methodology, Validation, Writing original draft, Writing review & editing. AM: Investigation, Methodology, Data interpretation and analysis, Supervision, Validation, Writing review & editing. PS: Conceptualization, Search strategy, Data synthesis, interpretation and analysis, Investigation, Methodology, Project administration, Supervision, Validation, Writing original draft, Writing review & editing.

Corresponding author

Correspondence to Nikolaos Ferlias.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Ferlias, N., Michelotti, A. & Stoustrup, P. Patient safety in orthodontic care: a scoping literature review with proposal for terminology and future research agenda. BMC Oral Health 24, 702 (2024). https://doi.org/10.1186/s12903-024-04375-7


Received: 11 March 2024

Accepted: 14 May 2024

Published: 18 June 2024

DOI: https://doi.org/10.1186/s12903-024-04375-7


Keywords

  • Systematic review
  • Patient safety
  • Orthodontics
  • Patient safety incidents
  • Patient harm
  • Adverse events



  • Open access
  • Published: 15 June 2015

Automating data extraction in systematic reviews: a systematic review

  • Siddhartha R. Jonnalagadda 1,
  • Pawan Goyal 2 &
  • Mark D. Huffman 3

Systematic Reviews volume 4, Article number: 78 (2015)


Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper reports a systematic review of published and unpublished methods to automate data extraction for systematic reviews.

We systematically searched PubMed, IEEE Xplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations from included reports.

Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts by various researchers to extract information automatically from the publication text. Of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70 %.

Conclusions

We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited (1–7) number of data elements. Biomedical natural language processing techniques have not been fully utilized to fully or even partially automate the data extraction step of systematic reviews.


Systematic reviews identify, assess, synthesize, and interpret published and unpublished evidence, which improves decision-making for clinicians, patients, policymakers, and other stakeholders [ 1 ]. Systematic reviews also identify research gaps to develop new research ideas. The steps to conduct a systematic review [ 1 – 3 ] are:

1. Define the review question and develop criteria for including studies
2. Search for studies addressing the review question
3. Select studies that meet criteria for inclusion in the review
4. Extract data from included studies
5. Assess the risk of bias in the included studies, by appraising them critically
6. Where appropriate, analyze the included data by undertaking meta-analyses
7. Address reporting biases

Despite their widely acknowledged usefulness [ 4 ], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5–6.5 years for a primary study publication to be included and published in a new systematic review [ 5 ]. Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not incorporated new evidence that might change the systematic review’s primary results [ 6 ].

Natural language processing (NLP), including text mining, involves information extraction, which is the discovery by computer of new, previously unfound information by automatically extracting it from different written resources [ 7 ]. Information extraction primarily constitutes concept extraction, also known as named entity recognition, and relation extraction, also known as association extraction. NLP handles written text at the level of documents, words, grammar, meaning, and context. NLP techniques have been used to automate the extraction of genomic and clinical information from the biomedical literature. Similarly, automation of the data extraction step of the systematic review process through NLP may be one strategy to reduce the time necessary to complete and update a systematic review. The data extraction step is one of the most time-consuming steps of a systematic review. Automating or even semi-automating this step could substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice. Despite these potential gains from NLP, the state of the science of automating data extraction has not been well described.

To date, knowledge of and methods for automating the data extraction phase of systematic reviews remain limited, even though it is one of the most time-consuming steps. To address this gap in knowledge, we sought to perform a systematic review of methods to automate the data extraction component of the systematic review process.

Our methodology was based on the Standards for Systematic Reviews set by the Institute of Medicine [ 8 ]. We conducted our study procedures as detailed below with input from the Cochrane Heart Group US Satellite.

Eligibility criteria

We included a report that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity.

We excluded a report that met any of the following criteria: 1) the methods were not applied to the data extraction step of a systematic review; 2) the report was an editorial, commentary, or other non-original research report; or 3) there was no evaluation component.

Information sources and searches

For collecting the initial set of articles for our review, we developed search strategies with the help of the Cochrane Heart Group US Satellite, which includes systematic reviewers and a medical librarian. We refined these strategies using relevant citations from related papers. We searched three databases: PubMed, IEEE Xplore, and the ACM Digital Library; our searches were limited to January 1, 2000 through January 6, 2015 (see Appendix 1). We restricted our search to these dates because biomedical information extraction algorithms developed prior to 2000 are unlikely to be accurate enough to be used for systematic reviews.

We retrieved articles that dealt with the extraction of various data elements, defined as categories of data that pertained to any information about or deriving from a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators [ 1 ] from included study reports. After we retrieved the initial set of reports from the search results, we then evaluated reports included in the references of these reports. We also sought expert opinion for additional relevant citations.

Study selection

We first de-duplicated the retrieved citations. For calibration and refinement of the inclusion and exclusion criteria, 100 citations were randomly selected and independently reviewed by two authors (SRJ and PG). Disagreements were resolved by consensus with a third author (MDH). In a second round, another set of 100 randomly selected abstracts was independently reviewed by two study authors (SRJ and PG), whereby we achieved a strong level of agreement (kappa = 0.97). Given the high level of agreement, the remaining studies were reviewed by only one author (PG). In this phase, we identified reports as “not relevant” or “potentially relevant”.
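As an aside, the chance-corrected agreement statistic reported here (kappa) can be computed directly from two reviewers' screening decisions. The sketch below uses scikit-learn; the decision lists are made up for illustration.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical screening decisions from two independent reviewers
# (1 = potentially relevant, 0 = not relevant) for the same ten citations.
reviewer_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
reviewer_b = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Inter-rater agreement (Cohen's kappa): {kappa:.2f}")
```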

Two authors (PG and SRJ) independently reviewed the full text of all citations (N = 74) that were identified as “potentially relevant”. We classified included reports into various categories based on the particular data element that they attempted to extract from the original scientific articles. Examples of these data elements include overall evidence and specific interventions, among others (Table 1). We resolved disagreements between the two reviewers through consensus with a third author (MDH).

Data collection process

Two authors (PG and SRJ) independently reviewed the included articles to extract data, such as the particular entity automatically extracted by the study, algorithm or technique used, and evaluation results into a data abstraction spreadsheet. We resolved disagreements through consensus with a third author (MDH).

We reviewed the Cochrane Handbook for Systematic Reviews [ 1 ], the CONsolidated Standards Of Reporting Trials (CONSORT) [ 9 ] statement, the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks to obtain the data elements to be considered. PICO stands for Population, Intervention, Comparison, Outcomes; PECODR stands for Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results; and PIBOSO stands for Population, Intervention, Background, Outcome, Study Design, Other.
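These frameworks can be operationalized as a structured extraction template before any automation is attempted. Below is a minimal sketch of one way to represent such a template in Python; the field names are illustrative and are not the data abstraction spreadsheet used in this review.

```python
# Illustrative extraction template combining PICO-style fields with
# CONSORT-style study descriptors. Field names are assumptions, not the
# authors' actual codebook.
EXTRACTION_TEMPLATE = {
    "study_id": None,      # citation identifier
    "population": None,    # P: participants and condition
    "intervention": None,  # I: intervention details
    "comparison": None,    # C: comparator
    "outcomes": [],        # O: outcome measures and time points
    "study_design": None,  # e.g., randomized trial, cohort
    "sample_size": None,   # number of participants
    "duration": None,      # study duration or follow-up
    "results": None,       # effect estimates as reported
}

def new_record(study_id: str) -> dict:
    """Return a fresh extraction record for one included study."""
    record = dict(EXTRACTION_TEMPLATE)
    record["study_id"] = study_id
    record["outcomes"] = []  # give each record its own outcomes list
    return record

print(new_record("Smith-2014"))
```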

Data synthesis and analysis

Because of the large variation in study methods and measurements, a meta-analysis of methodological features and contextual factors associated with the frequency of data extraction methods was not possible. We therefore present a narrative synthesis of our findings. We did not thoroughly assess risk of bias, including reporting bias, for these reports because the study designs did not match domains evaluated in commonly used instruments such as the Cochrane Risk of Bias tool [ 1 ] or QUADAS-2 instrument used for systematic reviews of randomized trials and diagnostic test accuracy studies, respectively [ 14 ].

Of 1190 unique citations retrieved, we selected 75 reports for full-text screening, and we included 26 articles that met our inclusion criteria (Fig. 1). Agreement on abstract and full-text screening was 0.97 and 1.00, respectively.

Fig. 1 Process of screening the articles to be included for this systematic review

Study characteristics

Table 1 provides a list of items to be considered in the data extraction process based on the Cochrane Handbook (Appendix 2) [ 1 ], CONSORT statement [ 9 ], STARD initiative [ 10 ], and the PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks. We provide the major group for each field and report which standard focused on that field. Finally, we report whether there was a published method to extract that field. Table 1 also identifies the data elements relevant to the systematic review process, categorized by their domain and the standard from which each element was adopted, together with any existing automation methods, where present.

Results of individual studies

Table  2 summarizes the existing information extraction studies. For each study, the table provides the citation to the study (study: column 1), data elements that the study focused on (extracted elements: column 2), dataset used by the study (dataset: column 3), algorithm and methods used for extraction (method: column 4), whether the study extracted only the sentence containing the data elements, full concept or neither of these (sentence/concept/neither: column 5), whether the extraction was done from full-text or abstracts (full text/abstract: column 6) and the main accuracy results reported by the system (results: column 7). The studies are arranged by increasing complexity by ordering studies that classified sentences before those that extracted the concepts and ordering studies that extracted data from abstracts before those that extracted data from full-text reports.

The accuracy of most (N = 18, 69 %) studies was measured using a standard text mining metric known as the F-score, which is the harmonic mean of precision (positive predictive value) and recall (sensitivity). Some studies (N = 5, 19 %) reported only the precision of their method, while others (N = 2, 8 %) reported accuracy values. One study (4 %) reported P5 precision, which indicates the fraction of positive predictions among the top 5 results returned by the system.
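As a concrete illustration of this metric, the F-score can be computed from the counts of true positives, false positives, and false negatives; the counts below are made up.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision (positive predictive value)
    and recall (sensitivity)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up example: an extractor finds 80 true mentions of a data element,
# 20 spurious ones, and misses 30 real mentions.
print(f"F-score: {f_score(tp=80, fp=20, fn=30):.2f}")  # about 0.76
```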

Studies that did not implement a data extraction system

Dawes et al. [ 12 ] identified 20 evidence-based medicine journal synopses with 759 extracts in the corresponding PubMed abstracts. Annotators agreed with the identification of an element 85 % and 87 % of the time for the evidence-based medicine synopses and PubMed abstracts, respectively. After consensus among the annotators, agreement rose to 97 and 98 %, respectively. The authors proposed various lexical patterns and developed rules to discover each PECODR element from the PubMed abstracts and the corresponding evidence-based medicine journal synopses, which might make it possible to partially or fully automate the data extraction process.

Studies that identified sentences but did not extract data elements from abstracts only

Kim et al. [ 13 ] used conditional random fields (CRF) [ 15 ] for the task of classifying sentences in one of the PICO categories. The features were based on lexical, syntactic, structural, and sequential information in the data. The authors found that unigrams, section headings, and sequential information from preceding sentences were useful features for the classification task. They used 1000 medical abstracts from PIBOSO corpus and achieved micro-averaged F-scores of 91 and 67 % over datasets of structured and unstructured abstracts, respectively.
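A rough sketch of CRF-based sentence labeling in this spirit is shown below, using the sklearn-crfsuite package. The feature functions and toy abstract are illustrative only and are not the features or corpus used by Kim et al.

```python
import sklearn_crfsuite

def sentence_features(sentences, i):
    """Illustrative features for the i-th sentence of an abstract:
    bag of words, sentence position, and a cue from the previous sentence."""
    sent = sentences[i]
    feats = {"bias": 1.0, "position": i}
    for token in sent.lower().split():
        feats[f"word:{token}"] = 1.0
    if i > 0:
        feats["prev_first_word"] = sentences[i - 1].split()[0].lower()
    return feats

# Hypothetical training data: each abstract is a sequence of sentences,
# each labeled with a PICO-style category.
abstracts = [["Sixty patients with asthma were recruited.",
              "They received inhaled corticosteroids or placebo.",
              "Symptom scores improved at 12 weeks."]]
labels = [["P", "I", "O"]]

X = [[sentence_features(a, i) for i in range(len(a))] for a in abstracts]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```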

Boudin et al. [ 16 ] utilized a combination of multiple supervised classification techniques for detecting PICO elements in medical abstracts. They used features such as MeSH semantic types, word overlap with the title, and the number of punctuation marks, with random forest (RF), naive Bayes (NB), support vector machine (SVM), and multi-layer perceptron (MLP) classifiers. Using 26,000 abstracts from PubMed, the authors took the first sentence in the structured abstracts and assigned a label automatically to build a large training dataset. They obtained an F-score of 86 % for identifying participants (P), 67 % for interventions (I) and controls (C), and 56 % for outcomes (O).
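Combining several supervised classifiers, as done in this study, can be sketched with scikit-learn's VotingClassifier. The example below uses plain bag-of-words features and toy sentences; the original work used richer features such as MeSH semantic types.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical sentences, each labeled with a single PICO element.
sentences = ["Sixty adults with type 2 diabetes were enrolled.",
             "Participants received metformin or placebo.",
             "The primary outcome was HbA1c at 6 months.",
             "Children with asthma were randomized.",
             "The control group received usual care.",
             "Quality of life was measured with the SF-36."]
labels = ["P", "I", "O", "P", "I", "O"]

# Majority vote over three different classifier families.
ensemble = make_pipeline(
    CountVectorizer(),
    VotingClassifier(estimators=[("rf", RandomForestClassifier(n_estimators=50)),
                                 ("nb", MultinomialNB()),
                                 ("svm", LinearSVC())],
                     voting="hard"))
ensemble.fit(sentences, labels)
print(ensemble.predict(["Adults with hypertension were recruited."]))
```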

Huang et al. [ 17 ] used a naive Bayes classifier for the PICO classification task. The training data were generated automatically from the structured abstracts. For instance, all sentences in the section of the structured abstract that started with the term “PATIENT” were used to identify participants (P). In this way, the authors could generate a dataset of 23,472 sentences. Using 23,472 sentences from the structured abstracts, they obtained an F-score of 91 % for identifying participants (P), 75 % for interventions (I), and 88 % for outcomes (O).
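The trick of generating training labels automatically from the section headings of structured abstracts can be sketched as follows; the heading-to-label mapping and the example abstract are assumptions for illustration, not the authors' actual corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative mapping from structured-abstract headings to PICO-style labels.
HEADING_TO_LABEL = {"PATIENTS": "P", "PARTICIPANTS": "P",
                    "INTERVENTION": "I", "OUTCOMES": "O"}

structured_abstracts = [
    {"PATIENTS": "Forty adults with chronic migraine were enrolled.",
     "INTERVENTION": "Subjects received botulinum toxin injections.",
     "OUTCOMES": "Headache days per month were recorded."},
]

# Turn each labeled section into a weakly labeled training sentence.
sentences, labels = [], []
for abstract in structured_abstracts:
    for heading, text in abstract.items():
        if heading in HEADING_TO_LABEL:
            sentences.append(text)
            labels.append(HEADING_TO_LABEL[heading])

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(sentences, labels)
print(model.predict(["Sixty children with asthma took part."]))
```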

Verbeke et al. [ 18 ] used a statistical relational learning-based approach (kLog) that utilized relational features for classifying sentences. The authors also used the PIBOSO corpus for evaluation and achieved micro-averaged F-score of 84 % on structured abstracts and 67 % on unstructured abstracts, which was a better performance than Kim et al. [ 13 ].

Huang et al. [ 19 ] used 19,854 structured extracts and trained two classifiers: one by taking the first sentences of each section (termed CF by the authors) and the other by taking all the sentences in each section (termed CA by the authors). The authors used the naive Bayes classifier and achieved F-scores of 74, 66, and 73 % for identifying participants (P), interventions (I), and outcomes (O), respectively, by the CF classifier. The CA classifier gave F-scores of 73, 73, and 74 % for identifying participants (P), interventions (I), and outcomes (O), respectively.

Hassanzadeh et al. [ 20 ] used the PIBOSO corpus for the identification of sentences with PIBOSO elements. Using conditional random fields (CRF) with discriminative set of features, they achieved micro-averaged F-score of 91 %.

Robinson [ 21 ] used four machine learning models, 1) support vector machines, 2) naive Bayes, 3) naive Bayes multinomial, and 4) logistic regression, to identify whether or not medical abstracts contained patient-oriented evidence, which included morbidity, mortality, symptom severity, and health-related quality of life. On a dataset of 1356 PubMed abstracts, the authors achieved the highest accuracy using a support vector machine learning model, with an F-measure of 86 %.
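Comparing several off-the-shelf classifiers on the same labeled abstracts, as done here, can be sketched with scikit-learn's cross-validation utilities; the abstracts and labels below are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy abstracts labeled 1 if they report patient-oriented evidence
# (e.g., mortality, quality of life) and 0 otherwise.
abstracts = ["Mortality was reduced at one year.",
             "Serum biomarker levels increased.",
             "Quality of life improved significantly.",
             "The assay showed high in vitro affinity.",
             "Symptom severity decreased after treatment.",
             "Gene expression profiles were clustered."] * 5
labels = [1, 0, 1, 0, 1, 0] * 5

# Cross-validated F1 for each candidate model on the same data.
for name, clf in [("SVM", LinearSVC()),
                  ("Naive Bayes", MultinomialNB()),
                  ("Logistic regression", LogisticRegression(max_iter=1000))]:
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipeline, abstracts, labels, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```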

Chung [ 22 ] utilized a full sentence parser to identify the descriptions of the assignment of treatment arms in clinical trials. The authors used predicate-argument structure along with other linguistic features with a maximum entropy classifier. They utilized 203 abstracts from randomized trials for training and 124 abstracts for testing and achieved an F-score of 76 %.

Hara and Matsumoto [ 23 ] dealt with the problem of extracting “patient population” and “compared treatments” from medical abstracts. Given a sentence from the abstract, the authors first performed base noun-phrase chunking and then categorized the base noun-phrase into one of the five classes: “disease”, “treatment”, “patient”, “study”, and “others” using support vector machine and conditional random field models. After categorization, the authors used regular expression to extract the target words for patient population and comparison. The authors used 200 abstracts including terms such as “neoplasms” and “clinical trial, phase III” and obtained 91 % accuracy for the task of noun phrase classification. For sentence classification, the authors obtained a precision of 80 % for patient population and 82 % for comparisons.

Studies that identified only sentences but did not extract data elements from full-text reports

Zhao et al. [ 24 ] used two classification tasks to extract study data including patient details, including one at the sentence level and another at the keyword level. The authors first used a five-class scheme including 1) patient, 2) result, 3) intervention, 4) study design, and 5) research goal and tried to classify sentences into one of these five classes. They further used six classes for keywords such as sex (e.g., male, female), age (e.g., 54-year-old), race (e.g., Chinese), condition (e.g., asthma), intervention, and study design (e.g., randomized trial). They utilized conditional random fields for the classification task. Using 19,893 medical abstracts and full-text articles from 17 journal websites, they achieved F-scores of 75 % for identifying patients, 61 % for intervention, 91 % for results, 79 % for study design, and 76 % for research goal.

Hsu et al. [ 25 ] attempted to classify whether a sentence contains the “hypothesis”, “statistical method”, “outcomes”, or “generalizability” of the study and then extracted the values. Using 42 full-text papers, the authors obtained F-scores of 86 % for identifying hypothesis, 84 % for statistical method, 90 % for outcomes, and 59 % for generalizability.

Song et al. [ 26 ] used machine learning-based classifiers such as maximum entropy classifier (MaxEnt), support vector machines (SVM), multi-layer perceptron (MLP), naive Bayes (NB), and radial basis function network (RBFN) to classify the sentences into categories such as analysis (statistical facts found by clinical experiment), general (generally accepted scientific facts, process, and methodology), recommendation (recommendations about interventions), and rule (guidelines). They utilized the principle of information gain (IG) as well as genetic algorithm (GA) for feature selection. They used 346 sentences from the clinical guideline document and obtained an F-score of 98 % for classifying sentences.

Marshall et al. [ 27 ] used soft-margin support vector machines in a joint model for risk of bias assessment along with supporting sentences for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, among others. They utilized presence of unigrams in the supporting sentences as features in their model. Working with full text of 2200 clinical trials, the joint model achieved F-scores of 56, 48, 35, and 38 % for identifying sentences corresponding to random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively.

Studies that identified data elements only from abstracts but not from full texts

Demner-Fushman and Lin [ 28 ] used a rule-based approach to identify sentences containing PICO. Using 275 manually annotated abstracts, the authors achieved an accuracy of 80 % for population extraction and 86 % for problem extraction. They also utilized a supervised classifier for outcome extraction and achieved accuracy from 64 to 95 % across various experiments.

Kelly and Yang [ 29 ] used regular expressions and gazetteer to extract the number of participants, participant age, gender, ethnicity, and study characteristics. The authors utilized 386 abstracts from PubMed obtained with the query “soy and cancer” and achieved F-scores of 96 % for identifying the number of participants, 100 % for age of participants, 100 % for gender of participants, 95 % for ethnicity of participants, 91 % for duration of study, and 87 % for health status of participants.
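Rule-based extraction of the number of participants with regular expressions, in the spirit of this approach, might look like the sketch below; the patterns are illustrative and nowhere near exhaustive.

```python
import re

# Illustrative patterns for participant counts; a production system would use
# a much larger pattern set plus a gazetteer of population terms.
PATTERNS = [
    re.compile(r"\bn\s*=\s*(\d{1,6})\b", re.IGNORECASE),
    re.compile(r"\b(\d{1,6})\s+(?:patients|participants|subjects|adults|children)\b",
               re.IGNORECASE),
]

def extract_participant_counts(text: str) -> list[int]:
    """Return all candidate participant counts mentioned in the text."""
    counts = []
    for pattern in PATTERNS:
        counts.extend(int(match.group(1)) for match in pattern.finditer(text))
    return counts

abstract = ("A total of 386 patients were randomized (n = 193 per arm) "
            "to soy supplementation or placebo.")
print(extract_participant_counts(abstract))  # [193, 386]
```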

Hansen et al. [ 30 ] used support vector machines [ 31 ] to extract number of trial participants from abstracts of the randomized control trials. The authors utilized features such as part-of-speech tag of the previous and next words and whether the sentence is grammatically complete (contained a verb). Using 233 abstracts from PubMed, they achieved an F-score of 86 % for identifying participants.

Xu et al. [ 32 ] utilized text classification augmented with hidden Markov models [ 33 ] to identify sentences about subject demographics. These sentences were then parsed to extract information regarding participant descriptors (e.g., men, healthy, elderly), number of trial participants, disease/symptom name, and disease/symptom descriptors. After testing over 250 RCT abstracts, the authors obtained an accuracy of 83 % for participant descriptors, 93 % for number of trial participants, 51 % for diseases/symptoms, and 92 % for descriptors of diseases/symptoms.

Summerscales et al. [ 34 ] used a conditional random field-based approach to identify various named entities such as treatments (drug names or complex phrases) and outcomes. The authors extracted 100 abstracts of randomized trials from the BMJ and achieved F-scores of 49 % for identifying treatment, 82 % for groups, and 54 % for outcomes.

Summerscales et al. [ 35 ] also proposed a method for automatic summarization of results from the clinical trials. The authors first identified the sentences that contained at least one integer (group size, outcome numbers, etc.). They then used the conditional random field classifier to find the entity mentions corresponding to treatment groups or outcomes. The treatment groups, outcomes, etc. were then treated as various “events.” To identify all the relevant information for these events, the authors utilized templates with slots. The slots were then filled using a maximum entropy classifier. They utilized 263 abstracts from the BMJ and achieved F-scores of 76 % for identifying groups, 42 % for outcomes, 80 % for group sizes, and 71 % for outcome numbers.

Studies that identified data elements from full-text reports

Kiritchenko et al. [ 36 ] developed ExaCT, a tool that assists users with locating and extracting key trial characteristics such as eligibility criteria, sample size, drug dosage, and primary outcomes from full-text journal articles. The authors utilized a text classifier in the first stage to recover the relevant sentences. In the next stage, they utilized extraction rules to find the correct solutions. The authors evaluated their system using 50 full-text articles describing randomized trials with 1050 test instances and achieved a P5 precision of 88 % for the sentence classifier. Precision and recall of their extraction rules were found to be 93 and 91 %, respectively.
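The two-stage design described here, first classifying sentences as relevant and then applying extraction rules to them, can be sketched roughly as follows; the classifier, the rule, and the toy sentences are illustrative and are not ExaCT's actual components.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Stage 1: a sentence classifier that flags sentences likely to state
# the sample size (training examples are made up for illustration).
train_sentences = ["We enrolled 120 patients across three sites.",
                   "The study was approved by the ethics committee.",
                   "A total of 58 participants completed follow-up.",
                   "Statistical analysis used mixed-effects models."]
train_labels = ["sample_size", "other", "sample_size", "other"]

stage1 = make_pipeline(TfidfVectorizer(), LinearSVC())
stage1.fit(train_sentences, train_labels)

# Stage 2: a weak extraction rule applied only to flagged sentences.
SAMPLE_SIZE_RULE = re.compile(r"\b(\d{1,6})\s+(?:patients|participants)\b",
                              re.IGNORECASE)

def extract_sample_size(sentences):
    """Return the first sample size found in sentences flagged by stage 1."""
    for sent in sentences:
        if stage1.predict([sent])[0] == "sample_size":
            match = SAMPLE_SIZE_RULE.search(sent)
            if match:
                return int(match.group(1))
    return None

print(extract_sample_size(["Baseline characteristics were balanced.",
                           "We enrolled 212 patients with heart failure."]))
```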

Restificar et al. [ 37 ] utilized latent Dirichlet allocation [ 38 ] to infer the latent topics in the sample documents and then used logistic regression to compute the probability that a given candidate criterion belongs to a particular topic. Using 44,203 full-text reports of randomized trials, the authors achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
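Topic inference with latent Dirichlet allocation followed by a downstream classifier, in the spirit of this approach, can be sketched with scikit-learn; the eligibility criteria and labels below are made up, and the original work used a far larger corpus.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical eligibility-criterion sentences labeled inclusion/exclusion.
criteria = ["Adults aged 18 to 65 with a confirmed diagnosis",
            "Pregnant or breastfeeding women",
            "Able to provide written informed consent",
            "Severe renal impairment or dialysis",
            "Willing to attend all study visits",
            "Prior allergic reaction to the study drug"]
labels = ["inclusion", "exclusion", "inclusion",
          "exclusion", "inclusion", "exclusion"]

# LDA turns each criterion into a topic-probability vector; a logistic
# regression then classifies criteria from those topic features.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=4, random_state=0),
    LogisticRegression(max_iter=1000))
model.fit(criteria, labels)
print(model.predict(["Adults able to provide consent"]))
```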

Lin et al. [ 39 ] used linear-chain conditional random field for extracting various metadata elements such as number of patients, age group of the patients, geographical area, intervention, and time duration of the study. Using 93 full-text articles, the authors achieved a threefold cross validation precision of 43 % for identifying number of patients, 63 % for age group, 44 % for geographical area, 40 % for intervention, and 83 % for time period.

De Bruijn et al. [ 40 ] used a support vector machine classifier to first identify sentences describing information elements such as eligibility criteria and sample size. The authors then used manually crafted weak extraction rules to extract the various information elements. Testing this two-stage architecture on 88 randomized trial reports, they obtained a precision of 69 % for identifying eligibility criteria, 62 % for sample size, 94 % for treatment duration, 67 % for intervention, 100 % for primary outcome estimates, and 67 % for secondary outcomes.

Zhu et al. [ 41 ] also used manually crafted rules to extract various subject demographics such as disease, age, gender, and ethnicity. The authors tested their method on 50 articles and for disease extraction obtained an F-score of 64 and 85 % for exactly matched and partially matched cases, respectively.

Risk of bias across studies

In general, many studies have a high risk of selection bias because the gold standards used in the respective studies were not randomly selected. The risk of performance bias is also likely to be high because the investigators were not blinded. For the systems that used rule-based approaches, it was unclear whether the gold standard was used to train the rules or whether there was a separate training set. The risk of attrition bias is unclear given the design of these non-randomized studies evaluating the performance of NLP methods. Lastly, the risk of reporting bias is unclear because of the lack of protocols in the development, implementation, and evaluation of NLP methods.

Summary of evidence

Extracting the data elements.

Participants — Sixteen studies explored the extraction of the number of participants [ 12 , 13 , 16 – 20 , 23 , 24 , 28 – 30 , 32 , 39 ], their age [ 24 , 29 , 39 , 41 ], sex [ 24 , 39 ], ethnicity [ 41 ], country [ 24 , 39 ], comorbidities [ 21 ], spectrum of presenting symptoms, current treatments, and recruiting centers [ 21 , 24 , 28 , 29 , 32 , 41 ], and date of study [ 39 ]. Among them, only six studies [ 28 – 30 , 32 , 39 , 41 ] extracted data elements as opposed to highlighting the sentence containing the data element. Unfortunately, each of these studies used a different corpus of reports, which makes direct comparisons impossible. For example, Kelly and Yang [ 29 ] achieved high F-scores of 100 % for age of participants, 91 % for duration of study, 95 % for ethnicity of participants, 100 % for gender of subjects, 87 % for health status of participants, and 96 % for number of participants on a dataset of 386 abstracts.

Intervention — Thirteen studies explored the extraction of interventions [ 12 , 13 , 16 – 20 , 22 , 24 , 28 , 34 , 39 , 40 ], intervention groups [ 34 , 35 ], and intervention details (for replication if feasible) [ 36 ]. Of these, only six studies [ 28 , 34 – 36 , 39 , 40 ] extracted intervention elements. Unfortunately again, each of these studies used a different corpus. For example, Kiritchenko et al. [ 36 ] achieved an F-score of 75–86 % for intervention data elements on a dataset of 50 full-text journal articles.

Outcomes and comparisons — Fourteen studies also explored the extraction of outcomes and time points of collection and reporting [ 12 , 13 , 16 – 20 , 24 , 25 , 28 , 34 – 36 , 40 ] and extraction of comparisons [ 12 , 16 , 22 , 23 ]. Of these, only six studies [ 28 , 34 – 36 , 40 ] extracted the actual data elements. For example, De Bruijn et al. [ 40 ] obtained an F-score of 100 % for extracting primary outcome and 67 % for secondary outcome from 88 full-text articles. Summerscales [ 35 ] utilized 263 abstracts from the BMJ and achieved an F-score of 42 % for extracting outcomes.

Results — Two studies [ 36 , 40 ] extracted sample size data element from full text on two different data sets. De Bruijn et al. [ 40 ] obtained an accuracy of 67 %, and Kiritchenko et al. [ 36 ] achieved an F-score of 88 %.

Interpretation — Three studies explored extraction of overall evidence [ 26 , 42 ] and external validity of trial findings [ 25 ]. However, all these studies only highlighted sentences containing the data elements relevant to interpretation.

Objectives — Two studies [ 24 , 25 ] explored the extraction of research questions and hypotheses. However, both these studies only highlighted sentences containing the data elements relevant to interpretation.

Methods — Twelve studies explored the extraction of the study design [ 13 , 18 , 20 , 24 ], study duration [ 12 , 29 , 40 ], randomization method [ 25 ], participant flow [ 36 , 37 , 40 ], and risk of bias assessment [ 27 ]. Of these, only four studies [ 29 , 36 , 37 , 40 ] extracted the corresponding data elements from text using different sets of corpora. For example, Restificar et al. [ 37 ] utilized 44,203 full-text clinical trial articles and achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.

Miscellaneous — One study [ 26 ] explored extraction of key conclusion sentence and achieved a high F-score of 98 %.

Related reviews and studies

Previous reviews on the automation of systematic review processes describe technologies for automating the overall process or other steps. Tsafnat et al. [ 43 ] surveyed the informatics systems that automate some of the tasks of systematic review and report systems for each stage of systematic review. Here, we focus on data extraction. None of the existing reviews [ 43 – 47 ] focus on the data extraction step. For example, Tsafnat et al. [ 43 ] presented a review of techniques to automate various aspects of systematic reviews, and while data extraction has been described as a task in their review, they only highlighted three studies as an acknowledgement of the ongoing work. In comparison, we identified 26 studies and critically examined their contribution in relation to all the data elements that need to be extracted to fully support the data extraction step.

Thomas et al. [ 44 ] described the application of text mining technologies such as automatic term recognition, document clustering, classification, and summarization to support the identification of relevant studies in systematic reviews. The authors also pointed out the potential of these technologies to assist at various stages of the systematic review. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mentioned the need for development of new tools for reporting on and searching for structured data from clinical trials.

Tsafnat et al. [ 46 ] described four main tasks in systematic review: identifying the relevant studies, evaluating risk of bias in selected trials, synthesis of the evidence, and publishing the systematic reviews by generating human-readable text from trial reports. They mentioned text extraction algorithms for evaluating risk of bias and evidence synthesis but remain limited to one particular method for extraction of PICO elements.

Most natural language processing research has focused on reducing the workload for the screening step of systematic reviews (Step 3). Wallace et al. [ 48 , 49 ] and Miwa et al. [ 50 ] proposed an active learning framework to reduce the workload in citation screening for inclusion in the systematic reviews. Jonnalagadda et al. [ 51 ] designed a distributional semantics-based relevance feedback model to semi-automatically screen citations. Cohen et al. [ 52 ] proposed a module for grouping studies that are closely related and an automated system to rank publications according to the likelihood for meeting the inclusion criteria of a systematic review. Choong et al. [ 53 ] proposed an automated method for automatic citation snowballing to recursively pursue relevant literature for helping in evidence retrieval for systematic reviews. Cohen et al. [ 54 ] constructed a voting perceptron-based automated citation classification system to classify each article as to whether it contains high-quality, drug-specific evidence. Adeva et al. [ 55 ] also proposed a classification system for screening articles for systematic review. Shemilt et al. [ 56 ] also discussed the use of text mining to reduce screening workload in systematic reviews.

Research implications

No standard gold standards or datasets

Among the 26 studies included in this systematic review, only three used a common corpus, namely 1000 medical abstracts from the PIBOSO corpus. Unfortunately, even that corpus facilitates only the classification of sentences according to whether they contain one of the data elements corresponding to the PIBOSO categories. No two other studies shared the same gold standard or dataset for evaluation. This limitation made it impossible for us to compare and assess the relative significance of the reported accuracy measures.

Separate systems for each data element

A few data elements that are relatively straightforward to extract automatically, such as the total number of participants (14 studies overall, 5 of which extracted the actual data element), have been targeted by a comparatively high number of studies. This is not the case for other data elements. There are 27 out of 52 potential data elements that have not been explored for automated extraction, even for highlighting the sentences containing them; seven more data elements were explored by just one study. There are 38 out of 52 potential data elements (>70 %) that have not been explored for automated extraction of the actual data elements; three more data elements were explored by just one study. The highest number of data elements extracted by a single study is only seven (14 %). This finding means not only that more studies are needed to explore the remaining 70 % of data elements, but also that there is an urgent need for a unified framework or system to extract all necessary data elements. The current state of informatics research for data extraction is exploratory, and multiple studies need to be conducted using the same gold standard and on the extraction of the same data elements for effective comparison.

Limitations

Our study has limitations. First, there is a possibility that data extraction algorithms were not published in journals or that our search might have missed them. We sought to minimize this limitation by searching in multiple bibliographic databases, including PubMed, IEEExplore, and ACM Digital Library. However, investigators may have also failed to publish algorithms that had lower F-scores than were previously reported, which we would not have captured. Second, we did not publish a protocol a priori, and our initial findings may have influenced our methods. However, we performed key steps, including screening, full-text review, and data extraction in duplicate to minimize potential bias in our systematic review.

Future work

“On demand” access to summarized evidence and best practices has been considered a sound strategy to satisfy clinicians’ information needs and enhance decision-making [ 57 – 65 ]. A systematic review of 26 studies concluded that information-retrieval technology produces a positive impact on physicians in terms of decision enhancement, learning, recall, reassurance, and confirmation [ 62 ]. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mention the need for development of new tools for reporting on and searching for structured data from published literature. Automated information extraction frameworks that extract data elements have the potential to assist systematic reviewers and, eventually, to automate the screening and data extraction steps.

Medical knowledge is currently being created at a rapid pace, with roughly 75 clinical trials published a day [ 66 ]. Evidence-based medicine [ 67 ] requires clinicians to keep up with published scientific studies and use them at the point of care. However, it has been shown that this is practically impossible even within a narrow specialty [ 68 ]. A critical barrier is that finding relevant information, which may be located in several documents, takes an amount of time and cognitive effort that is incompatible with the busy clinical workflow [ 69 , 70 ]. Rapid systematic reviews using automation technologies would provide clinicians with up-to-date and systematic summaries of the latest evidence.

Our systematic review describes previously reported methods to identify sentences containing some of the data elements for systematic reviews and only a few studies that have reported methods to extract these data elements. However, most of the data elements that would need to be considered for systematic reviews have been insufficiently explored to date, which identifies a major scope for future work. We hope that these automated extraction approaches might first act as checks for manual data extraction currently performed in duplicate; then serve to validate manual data extraction done by a single reviewer; then become the primary source for data element extraction that would be validated by a human; and eventually completely automate data extraction to enable living systematic reviews.

Abbreviations

NLP: natural language processing

CONSORT: CONsolidated Standards Of Reporting Trials

STARD: Standards for Reporting of Diagnostic Accuracy

PICO: Population, Intervention, Comparison, Outcomes

PECODR: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results

PIBOSO: Population, Intervention, Background, Outcome, Study Design, Other

CRF: conditional random fields

NB: naive Bayes

RCT: randomized control trial

BMJ: British Medical Journal

References

Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]. The Cochrane Collaboration. 2011. Available at [ http://community.cochrane.org/handbook ]

Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness: CRD’s guidance for carrying out or commissioning reviews, NHS Centre for Reviews and Dissemination. 2001.


Woolf SH. Manual for conducting systematic reviews, Agency for Health Care Policy and Research. 1996.

Field MJ, Lohr KN. Clinical practice guidelines: directions for a new program, Clinical Practice Guidelines. 1990.

Elliott J, Turner T, Clavisi O, Thomas J, Higgins J, Mavergames C, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014;11:e1001603.


Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.


Hearst MA. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics; 1999. p. 3–10.

Morton S, Levit L, Berg A, Eden J. Finding what works in health care: standards for systematic reviews. Washington D.C.: National Academies Press; 2011. Available at [ http://www.nap.edu/catalog/13059/finding-what-works-in-health-care-standards-for-systematic-reviews ]

Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276(8):637–9.


Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem Lab Med. 2003;41(1):68–73. doi: 10.1515/CCLM.2003.012 .

Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3.


Dawes M, Pluye P, Shea L, Grad R, Greenberg A, Nie J-Y. The identification of clinically important elements within medical journal abstracts: Patient–Population–Problem, Exposure–Intervention, Comparison, Outcome, Duration and Results (PECODR). Inform Prim Care. 2007;15(1):9–16.


Kim S, Martinez D, Cavedon L, Yencken L. Automatic classification of sentences to support evidence based medicine. BMC Bioinform. 2011;12 Suppl 2:S5.


Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3(1):25.

Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proceedings of the Eighteenth International Conference on Machine Learning. 2001. p. 282–9.

Boudin F, Nie JY, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inform Decis Mak. 2010;10:29. doi: 10.1186/1472-6947-10-29 .

Huang K-C, Liu C-H, Yang S-S, Liao C-C, Xiao F, Wong J-M, et al, editors. Classification of PICO elements by text features systematically extracted from PubMed abstracts. Granular Computing (GrC), 2011 IEEE International Conference on; 2011: IEEE.

Verbeke M, Van Asch V, Morante R, Frasconi P, Daelemans W, De Raedt L, editors. A statistical relational learning approach to identifying evidence based medicine categories. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012: Association for Computational Linguistics.

Huang K-C, Chiang IJ, Xiao F, Liao C-C, Liu CC-H, Wong J-M. PICO element detection in medical text without metadata: are first sentences enough? J Biomed Inform. 2013;46(5):940–6.

Hassanzadeh H, Groza T, Hunter J. Identifying scientific artefacts in biomedical literature: the evidence based medicine use case. J Biomed Inform. 2014;49:159–70.

Robinson DA. Finding patient-oriented evidence in PubMed abstracts. Athens: University of Georgia; 2012.

Chung GY-C. Towards identifying intervention arms in randomized controlled trials: extracting coordinating constructions. J Biomed Inform. 2009;42(5):790–800.

Hara K, Matsumoto Y. Extracting clinical trial design information from MEDLINE abstracts. N Gener Comput. 2007;25(3):263–75.

Zhao J, Bysani P, Kan MY. Exploiting classification correlations for the extraction of evidence-based practice information. AMIA Annu Symp Proc. 2012;2012:1070–8.


Hsu W, Speier W, Taira R. Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature. AMIA Annu Symp Proc. 2012;2012:350–9.

Song MH, Lee YH, Kang UG. Comparison of machine learning algorithms for classification of the sentences in three clinical practice guidelines. Healthcare Informatics Res. 2013;19(1):16–24.

Marshall IJ, Kuiper J, Wallace BC, editors. Automating risk of bias assessment for clinical trials. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics; 2014: ACM.

Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist. 2007;33(1):63–103.

Kelly C, Yang H. A system for extracting study design parameters from nutritional genomics abstracts. J Integr Bioinform. 2013;10(2):222. doi: 10.2390/biecoll-jib-2013-222 .

Hansen MJ, Rasmussen NO, Chung G. A method of extracting the number of trial participants from abstracts describing randomized controlled trials. J Telemed Telecare. 2008;14(7):354–8. doi: 10.1258/jtt.2008.007007 .

Joachims T. Text categorization with support vector machines: learning with many relevant features, Machine Learning: ECML-98, Tenth European Conference on Machine Learning. 1998. p. 137–42.

Xu R, Garten Y, Supekar KS, Das AK, Altman RB, Garber AM. Extracting subject demographic information from abstracts of randomized clinical trial reports. 2007.

Eddy SR. Hidden Markov models. Curr Opin Struct Biol. 1996;6(3):361–5.

Summerscales RL, Argamon S, Hupert J, Schwartz A. Identifying treatments, groups, and outcomes in medical abstracts. The Sixth Midwest Computational Linguistics Colloquium (MCLC 2009). 2009.

Summerscales R, Argamon S, Bai S, Huperff J, Schwartzff A. Automatic summarization of results from clinical trials, the 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2011. p. 372–7.

Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak. 2010;10:56.

Restificar A, Ananiadou S. Inferring appropriate eligibility criteria in clinical trial protocols without labeled data, Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics. 2012. ACM.

Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3(4–5):993–1022.

Lin S, Ng J-P, Pradhan S, Shah J, Pietrobon R, Kan M-Y, editors. Extracting formulaic and free text clinical research articles metadata using conditional random fields. Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents; 2010: Association for Computational Linguistics.

De Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I, editors. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annual Symposium Proceedings; 2008: American Medical Informatics Association.

Zhu H, Ni Y, Cai P, Qiu Z, Cao F. Automatic extracting of patient-related attributes: disease, age, gender and race. Stud Health Technol Inform. 2011;180:589–93.

Davis-Desmond P, Mollá D, editors. Detection of evidence in clinical research papers. Proceedings of the Fifth Australasian Workshop on Health Informatics and Knowledge Management-Volume 129; 2012: Australian Computer Society, Inc.

Tsafnat G, Glasziou P, Choong M, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014;3(1):74.

Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Res Synthesis Methods. 2011;2(1):1–14.

Slaughter L, Berntsen CF, Brandt L, Mavergames C. Enabling living systematic reviews and clinical guidelines through semantic technologies. D-Lib Magazine. 2015;21(1/2). Available at [ http://www.dlib.org/dlib/january15/slaughter/01slaughter.html ]

Tsafnat G, Dunn A, Glasziou P, Coiera E. The automation of systematic reviews. BMJ. 2013;346:f139.

O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1):5.

Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics. 2010;11(1):55.

Wallace BC, Small K, Brodley CE, Trikalinos TA, editors. Active learning for biomedical citation screening. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining; 2010: ACM.

Miwa M, Thomas J, O’Mara-Eves A, Ananiadou S. Reducing systematic review workload through certainty-based screening. J Biomed Inform. 2014;51:242–53.

Jonnalagadda S, Petitti D. A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des. 2013;6(1–2):5–17. doi: 10.1504/IJCBDD.2013.052198 .

Cohen A, Adams C, Davis J, Yu C, Yu P, Meng W, et al. Evidence-based medicine, the essential role of systematic reviews, and the need for automated text mining tools. Proceedings of the 1st ACM International Health Informatics Symposium. 2010:376–80.

Choong MK, Galgani F, Dunn AG, Tsafnat G. Automatic evidence retrieval for systematic reviews. J Med Inter Res. 2014;16(10):e223.

Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.


García Adeva JJ, Pikatza Atxa JM, Ubeda Carrillo M, Ansuategi ZE. Automatic text classification to support systematic reviews in medicine. Expert Syst Appl. 2014;41(4):1498–508.

Shemilt I, Simon A, Hollands GJ, Marteau TM, Ogilvie D, O’Mara‐Eves A, et al. Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. Res Synthesis Methods. 2014;5(1):31–49.

Cullen RJ. In search of evidence: family practitioners’ use of the Internet for clinical information. J Med Libr Assoc. 2002;90(4):370–9.

Hersh WR, Hickam DH. How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. JAMA. 1998;280(15):1347–52.

Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG, et al. The impact of evidence on physicians’ inpatient treatment decisions. J Gen Intern Med. 2004;19(5 Pt 1):402–9. doi: 10.1111/j.1525-1497.2004.30306.x .

Magrabi F, Coiera EW, Westbrook JI, Gosling AS, Vickland V. General practitioners’ use of online evidence during consultations. Int J Med Inform. 2005;74(1):1–12. doi: 10.1016/j.ijmedinf.2004.10.003 .

McColl A, Smith H, White P, Field J. General practitioner’s perceptions of the route to evidence based medicine: a questionnaire survey. BMJ. 1998;316(7128):361–5.

Pluye P, Grad RM, Dunikowski LG, Stephenson R. Impact of clinical information-retrieval technology on physicians: a literature review of quantitative, qualitative and mixed methods studies. Int J Med Inform. 2005;74(9):745–68. doi: 10.1016/j.ijmedinf.2005.05.004 .

Rothschild JM, Lee TH, Bae T, Bates DW. Clinician use of a palmtop drug reference guide. J Am Med Inform Assoc. 2002;9(3):223–9.

Rousseau N, McColl E, Newton J, Grimshaw J, Eccles M. Practice based, longitudinal, qualitative interview study of computerised evidence based guidelines in primary care. BMJ. 2003;326(7384):314.

Westbrook JI, Coiera EW, Gosling AS. Do online information retrieval systems help experienced clinicians answer clinical questions? J Am Med Inform Assoc. 2005;12(3):315–21. doi: 10.1197/jamia.M1717 .

Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. doi: 10.1371/journal.pmed.1000326 .

Lau J. Evidence-based medicine and meta-analysis: getting more out of the literature. In: Greenes RA, editor. Clinical decision support: the road ahead. 2007. p. 249.

Fraser AG, Dunstan FD. On the impossibility of being expert. BMJ (Clinical Res). 2010;341:c6815.

Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc. 2005;12(2):217–24. doi: 10.1197/jamia.M1608 .

Ely JW, Osheroff JA, Maviglia SM, Rosenbaum ME. Patient-care questions that physicians are unable to answer. J Am Med Inform Assoc. 2007;14(4):407–14. doi: 10.1197/jamia.M2398 .


Author information

Authors and affiliations

Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL, 60611, USA

Siddhartha R. Jonnalagadda

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India

Pawan Goyal

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA

Mark D. Huffman


Corresponding author

Correspondence to Siddhartha R. Jonnalagadda.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SRJ and PG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design were done by SRJ. SRJ, PG, and MDH did the acquisition, analysis, or interpretation of data. SRJ and PG drafted the manuscript. SRJ, PG, and MDH did the critical revision of the manuscript for important intellectual content. SRJ obtained funding. PG and SRJ provided administrative, technical, or material support. SRJ did the study supervision. All authors read and approved the final manuscript.

Funding/Support

This project was partly supported by the National Library of Medicine (grant 5R00LM011389). The Cochrane Heart Group US Satellite at Northwestern University is supported by an intramural grant from the Northwestern University Feinberg School of Medicine.

Role of the sponsors

The funding source had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine.

Additional contributions

Mark Berendsen (Research Librarian, Galter Health Sciences Library, Northwestern University Feinberg School of Medicine) provided insights on the design of this study, including the search strategies, and Dr. Kalpana Raja (Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine) reviewed the manuscript. None of them received compensation for their contributions.

Search strategies

Below, we provide the search strategies used in PubMed, ACM Digital Library, and IEEE Xplore. The searches were conducted on January 6, 2015.

PubMed

(“identification” [Title] OR “extraction” [Title] OR “extracting” [Title] OR “detection” [Title] OR “identifying” [Title] OR “summarization” [Title] OR “learning approach” [Title] OR “automatically” [Title] OR “summarization” [Title] OR “identify sections” [Title] OR “learning algorithms” [Title] OR “Interpreting” [Title] OR “Inferring” [Title] OR “Finding” [Title] OR “classification” [Title]) AND (“medical evidence”[Title] OR “PICO”[Title] OR “PECODR” [Title] OR “intervention arms” [Title] OR “experimental methods” [Title] OR “study design parameters” [Title] OR “Patient oriented Evidence” [Title] OR “eligibility criteria” [Title] OR “clinical trial characteristics” [Title] OR “evidence based medicine” [Title] OR “clinically important elements” [Title] OR “evidence based practice” [Title] “results from clinical trials” [Title] OR “statistical analyses” [Title] OR “research results” [Title] OR “clinical evidence” [Title] OR “Meta Analysis” [Title] OR “Clinical Research” [Title] OR “medical abstracts” [Title] OR “clinical trial literature” [Title] OR ”clinical trial characteristics” [Title] OR “clinical trial protocols” [Title] OR “clinical practice guidelines” [Title]).

IEEE Xplore

We performed this search only in the metadata.

(“identification” OR “extraction” OR “extracting” OR “detection” OR “Identifying” OR “summarization” OR “learning approach” OR “automatically” OR “summarization” OR “identify sections” OR “learning algorithms” OR “Interpreting” OR “Inferring” OR “Finding” OR “classification”) AND (“medical evidence” OR “PICO” OR “intervention arms” OR “experimental methods” OR “eligibility criteria” OR “clinical trial characteristics” OR “evidence based medicine” OR “clinically important elements” OR “results from clinical trials” OR “statistical analyses” OR “clinical evidence” OR “Meta Analysis” OR “clinical research” OR “medical abstracts” OR “clinical trial literature” OR “clinical trial protocols”).

ACM Digital Library

((Title: “identification” or Title: “extraction” or Title: “extracting” or Title: “detection” or Title: “Identifying” or Title: “summarization” or Title: “learning approach” or Title: “automatically” or Title: “summarization “or Title: “identify sections” or Title: “learning algorithms” or Title: “scientific artefacts” or Title: “Interpreting” or Title: “Inferring” or Title: “Finding” or Title: “classification” or “statistical techniques”) and (Title: “medical evidence” or Abstract: “medical evidence” or Title: “PICO” or Abstract: “PICO” or Title: “intervention arms” or Title: “experimental methods” or Title: “study design parameters” or Title: “Patient oriented Evidence” or Abstract: “Patient oriented Evidence” or Title: “eligibility criteria” or Abstract: “eligibility criteria” or Title: “clinical trial characteristics” or Abstract: “clinical trial characteristics” or Title: “evidence based medicine” or Abstract: “evidence based medicine” or Title: “clinically important elements” or Title: “evidence based practice” or Title: “treatments” or Title: “groups” or Title: “outcomes” or Title: “results from clinical trials” or Title: “statistical analyses” or Abstract: “statistical analyses” or Title: “research results” or Title: “clinical evidence” or Abstract: “clinical evidence” or Title: “Meta Analysis” or Abstract:“Meta Analysis” or Title:“Clinical Research” or Title: “medical abstracts” or Title: “clinical trial literature” or Title: “Clinical Practice” or Title: “clinical trial protocols” or Abstract: “clinical trial protocols” or Title: “clinical questions” or Title: “clinical trial design”)).
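For readers who want to re-run or adapt a strategy like the PubMed search above, the query string can also be submitted programmatically. The sketch below is a minimal illustration using Biopython's Entrez (NCBI E-utilities) wrapper; the contact email, the shortened query, and the result limit are placeholders and are not part of the original study.

```python
# Minimal sketch: submitting a title-field PubMed query via NCBI E-utilities
# (Biopython). The query here is a shortened, illustrative slice of the full
# strategy reported above; substitute the complete string to reproduce it.
from Bio import Entrez

Entrez.email = "your.name@example.edu"  # placeholder; NCBI requires a contact email

query = (
    '("extraction"[Title] OR "summarization"[Title] OR "classification"[Title]) '
    'AND ("PICO"[Title] OR "evidence based medicine"[Title])'
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=200)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} records found")
print(record["IdList"][:10])  # first ten PMIDs returned
```

Saving the returned PMIDs alongside the query string and search date is one simple way to keep the search reproducible, in the spirit of the dated strategies reported above.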

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Jonnalagadda, S.R., Goyal, P. & Huffman, M.D. Automating data extraction in systematic reviews: a systematic review. Syst Rev 4 , 78 (2015). https://doi.org/10.1186/s13643-015-0066-7


Received: 20 March 2015

Accepted: 21 May 2015

Published: 15 June 2015

DOI: https://doi.org/10.1186/s13643-015-0066-7


Keywords

  • Support Vector Machine
  • Data Element
  • Conditional Random Field
  • PubMed Abstract
  • Systematic Review Process




The Impact of Nonpharmacological Interventions on Opioid Use for Chronic Noncancer Pain: A Scoping Review


1. Introduction

2.1. Inclusion and Exclusion Criteria
2.2. Search Strategy
2.3. Data Screening
2.4. Data Extraction
2.5. Data Analysis
3.1. Characteristics of Included Studies

Table columns: First Author and Year; Design; N; NPI Type; Duration; Pain and Opioid Use Measures; Pain Intensity and Opioid Use Results. Each entry below gives the study with its design, N, NPI type, and duration, followed by the pain and opioid use measures and then the pain intensity and opioid use results.
Garcia, 2021 (RCT, N = 179, Device, 56 days). Pain: Defense and Veterans Pain Rating Scale
Opioid use: Self-reported and converted to morphine milligram equivalent (MME)
Pain: Pain intensity reduced by an average of 42.8% for the virtual reality (EaseVRx) group and 25% for the sham virtual reality group.
Opioid use: Did not reach statistical significance for either group.
Jensen, 2020 (RCT, N = 173, Hypnosis, 4 sessions). Pain: Numeric Rating Scale
Opioid use: Self-reported and converted to MME
Pain: No statistically significant between-group differences on the omnibus test for pain intensity. On average, pain intensity decreased from pre- to post-treatment in all groups.
Opioid use: No changes in opioid use were found.
Zheng, 2019 (RCT, N = 108, Acupuncture, 12 wks). Pain: Visual Analogue Scale
Opioid use: Self-reported and converted to MME
Pain: No group differences were found in pain intensity. No changes in pain intensity were found over time.
Opioid use: Opioid use reduced by 20.5% (p < 0.05) and 13.7% (p < 0.01) in the two acupuncture groups and by 4.5% in the education group post-treatment, but without any group differences. For follow-up, the education group had a 47% decrease in opioid use after a course of electroacupuncture.
Garland, 2022 (RCT, N = 250, Mindfulness, 8 wks). Pain: Brief Pain Inventory
Opioid use: Urine toxicologic screening, Self-reported and converted to MME
Pain: MORE showed greater reductions in pain severity (between-group effect: 0.49; 95% CI, 0.17–0.81; p = 0.003) than the control group.
Opioid use: MORE reduced the opioid use more than the control group (between-group effect: 0.15 log mg; 95% CI, 0.03–0.27 log mg; p = 0.009). At 9-month follow-up, 22 of 62 participants (35.5%) in MORE group reduced opioid use by at least 50%, compared to 11 of 69 participants (15.9%) in the control group (p = 0.009). At 9-months, 36 of 80 participants (45.0%) in MORE were no longer misusing opioids compared with 19 of 78 participants (24.4%) in the control group.
Hudak, 2021 * (RCT, N = 62, Mindfulness, 8 wks). Pain: NA
Opioid use: Self-reported and converted to MME
Pain: NA
Opioid use: Participants in MORE showed greater reduction in opioid use over time than the control group.
Wilson, 2023 (RCT, N = 402, Educational Program, 8 wks). Pain: Brief Pain Inventory
Opioid use: Opioid prescription information was collected from the participants' medical records and converted to MME
Pain: 24 (14.5%) of 166 E-Health participants achieved a >2 point decrease in pain intensity compared to 13 (6.8%) of 192 TAU participants (odds ratio, 2.4 [95% CI, 1.2–4.9]; p = 0.02).
Opioid use: 105 (53.6%) of 196 E-Health participants achieved a >15% reduction in opioid use compared with 85 (42.3%) of 201 TAU participants (odds ratio, 1.6 [95% CI, 1.1–2.3]; p = 0.02).
Garland, 2024 * (RCT, N = 230, Mindfulness, 8 wks). Pain: Brief Pain Inventory
Opioid use: Urine drug screens; opioid prescription information was collected from the participants' medical records and converted to MME
Pain: MORE showed significantly greater reduction in pain outcomes than the control group (p = 0.025).
Opioid use: MORE reduced opioid dose significantly compared to control group (B = 0.65, 95% CI = 0.07–1.23, p = 0.029); 20.7% reduction in mean opioid use (18.88 mg, SD = 8.40 mg) for MORE compared to 3.9% reduction (3.19 mg, SD = 4.38 mg) for control group. MORE showed significantly greater reduction in opioid dose than control group (p = 0.025).
DeBar, 2022 (RCT, N = 850, CBT, 12 wks). Pain: Pain Intensity and Interference with Enjoyment of Life, General Activity, and Sleep
Opioid use: Self-reported and converted to MME per 90-day period
Pain: CBT had larger reductions in pain outcomes at 12-month follow-up compared to usual care (difference, −0.434 point [95% CI, 0.690 to −0.178 point]) and post-treatment (difference, −0.565 point [CI, −0.796 to −0.333 point]).
Opioid use: No differences were seen in opioid use at post-treatment (difference, −2.260 points [CI, −5.509 to 0.989 points]) or at 12-month follow-up (difference, −1.969 points [CI, −6.765 to 2.827 points]).
Gardiner, 2019 (RCT, N = 159, Combined, 21 wks). Pain: Brief Pain Inventory
Opioid use: Self-reported
Pain: No differences in pain outcomes at any time point.
Opioid use: At 21 weeks, the IMGV group reported greater reduction in pain medications use (Odds Ratio: 0.42, CI: 0.18–0.98) compared to controls.
Wartko, 2023 (RCT, N = 153, CBT, 18 sessions/1 year). Pain: Pain, Enjoyment of life, and General activity
Opioid use: Self-reported and converted to MME
Pain: No significant differences between intervention and usual care for pain outcomes were found (0.0 [95% CI: −0.5, 0.5], p = 0.985).
Opioid use: No significant differences between intervention and usual care for opioid use were found (adjusted mean difference: −2.3 MME; 95% CI: −10.6, 5.9; p = 0.578).
Groessl, 2017 * (RCT, N = 150, Yoga, 12 wks). Pain: Brief Pain Inventory
Opioid use: Self-report and verified using medical records.
Pain: Differences observed at all three time points (p = 0.001 for 6 weeks, 0.005 for 12 weeks, 0.013 for 6 months), with larger reductions in pain intensity for yoga participants.
Opioid use: Significant reduction from 20% to 11% at 12 weeks (p = 0.007) and 8% after 6 months (p < 0.001).
Roseen, 2022 * (RCT, N = 120, Yoga, 12 wks). Pain: Defense and Veterans Pain Rating Scale
Opioid use: Self-reported
Pain: No significant between-group differences were observed for pain.
Opioid use: No significant between-group differences were observed for opioid use. Post-treatment, fewer yoga than education participants reported pain medication use (55% vs. 67%, OR = 0.56, 95% CI: 0.26–1.24, p = 0.15).
Sandhu, 2023 (RCT, N = 608, Educational Program, 3 days and 12 months maintenance). Pain: Patient-Reported Outcomes Measurement Information System
Opioid use: Self-reported, with a participant report verified in a telephone call from a member of the study team and converted to MME
Pain: No significant between-group differences in pain intensity.
Opioid use: At 12 months, 65 of 225 participants (29%) achieved opioid cessation in the intervention group and 15 of 208 participants (7%) achieved opioid cessation in the usual care group (odds ratio, 5.55 [95% CI, 2.80 to 10.99]).
Does, 2024 (RCT, N = 376, Educational Program, 4 sessions). Pain: Patient-Reported Outcomes Measurement Information System
Opioid use: Pharmacy dispensation data from the medical record and converted to MME for the 6-month period.
Pain: No significant between-group differences in pain intensity.
Opioid use: A small but not significant decrease in opioid use was found in both groups over the study period. At 12 months, intervention group demonstrated greater medication use (OR = 2.72; 95% CI 1.61–4.58).
Naylor, 2010 (RCT, N = 51, Digital Technology, 4 months). Pain: Short form of the McGill Pain Questionnaire, the Pain Symptoms Subscale from the Treatment Outcomes in Pain Survey
Opioid use: Self-reported
Pain: TIVR showed significant improvement at 8-month follow-up for pain scores (p < 0.0001), compared to the control group.
Opioid use: Opioid use reduced in the TIVR group at both follow-ups, 4 and 8 months post CBT. At 8-month follow-up, 21% of the TIVR participants had stopped using opioids. There were significant between-group differences in opioid use at 8-month follow-up (p = 0.004).
Nielssen, 2019 (RCT, N = 50, Educational Program, 8 wks). Pain: Roland–Morris Disability Questionnaire, Wisconsin Brief Pain Questionnaire
Opioid use: Self-reported and converted to MME
Pain: Significantly larger reduction in pain outcomes with the intervention compared to the control group.
Opioid use: Significant reduction in opioid use compared to control group.
Day, 2019 (RCT, N = 69, Combined, 8 wks). Pain: Numeric Rating Scale
Opioid use: Self-reported opioid use in the past week
Pain: Post-treatment, the intent-to-treat group showed significant improvements in pain intensity (p < 0.001), with no significant between-group differences.
Opioid use: For the intent-to-treat group, there was no significant difference (p = 0.549) in opioid use between pre-treatment (48%) and post-treatment (43%). Opioid use decreased significantly (p = 0.012) from pre-treatment (49%) to 3-month follow-up (28%), but opioid use at post-treatment (40%) and 6-month follow-up (33%) was not significantly lower (p = 0.289) than at pre-treatment.
Spangeus, 2023 (RCT, N = 21, Educational Program, 10 wks). Pain: Numeric Pain Scale
Opioid use: Self-reported opioid use
Pain: Significant improvements post-treatment on pain outcomes were found.
Opioid use: A significant reduction in opioid use was found, from 25% at baseline to 14% post-treatment.
Nelli, 2023 (OB, N = 45, Device, 2 wks). Pain: Numeric Scale
Opioid use: Self-reported and converted to MME
Pain: The reduction in pain scores was 67%, 50%, and 45% for the green, blue, and clear glasses groups (p = 0.56). No significant differences in pain score reduction between groups was found.
Opioid use: A greater than 10% reduction in opioid use was achieved by 33%, 11%, and 8% of the green, blue, and clear eyeglasses groups, respectively (p = 0.23).
Moffat, 2023 * (OB, N = 13,968, Combined, 22 months). Pain: NA
Opioid use: Identified using the Australian Pharmaceutical Benefits Scheme item number and converted to MME
Pain: NA. Opioid use: Calculated change in predicted trends with and without the intervention 25,387 (95% CI 24,676, 26,131).
Zeliadt, 2022 * (OB, N = 4869, Combined, 18 months). Pain: NA
Opioid use: Extracted from VA’s pharmacy managerial cost accounting national data extract and converted to MME.
Pain: NA. Opioid use: Opioid use decreased by −12% in one year among veterans who began CIH compared to similar veterans who used conventional care; −4.4% among veterans who used only Whole Health services compared to conventional care, and −8.5% among veterans who used both CIH combined with Whole Health services compared to conventional care.
Huffman, 2019 (OB, N = 1681, Combined, 4 wks). Pain: Numeric Rating Scale
Opioid use: Self-reported
Pain: Pain on discharge, and at 6 months and 12 months was significantly lower compared to on admission (p < 0.05).
Opioid use: There were significantly fewer patients using opioids (p < 0.05) post-treatment. At 6-month follow-up, 76.3% maintained opioid cessation, 14.6% resumed opioid use, 5.8% continued to use opioids, and 3.4% discontinued opioid use. At 12-month follow-up, 14.6% maintained opioid cessation, 5.8% resumed opioids, 3.4% continued to use opioids, and 76.3% discontinued opioid use.
Townsend, 2008 (OB, N = 373, Combined, 3 wks). Pain: Multidimensional Pain Inventory
Opioid use: Verified using medical records and converted to MME
Pain: Significant improvement was found in pain outcomes post-treatment (p < 0.001) and six months post-treatment (p < 0.001).
Opioid use: At discharge, 176 (92.6%) of the opioid group had completed the taper of opioids (χ² = 20.57; df = 1, p < 0.001).
Ward, 2022 * (OB, N = 237, Combined, 10 wks). Pain: Pain Numeric Scale
Opioid use: Number of days with prescription opioids determined from VA pharmacy data
Pain: No significant improvement to pain scores noted.
Opioid use: No significant differences in the percentage of opioid use were found from one year pre- to post-treatment for either EVP-engaged or non-engaged participants.
Van Der Merwe, 2021 * (OB, N = 164, Combined, 10 days). Pain: Brief Pain Inventory
Opioid use: Self-reported
Pain: Significant improvement with treatment (p < 0.001).
Opioid use: Approximately 25% ceased opioid use and 17% had reduced opioid use post-treatment.
Hooten, 2007 (OB, N = 159, Combined, 3 wks). Pain: Multidimensional Pain Inventory
Opioid use: Medical chart review
Pain: Significant improvement with program treatment (p < 0.001).
Opioid use: Compared with admission, opioid use at post-treatment was significantly reduced (p < 0.001).
Davis, 2018 (OB, N = 156, Acupuncture, 12 sessions/60 days). Pain: Patient-Reported Outcomes Measurement Information System
Opioid use: Self-reported
Pain: Significant improvements in pain intensity (p < 0.01).
Opioid use: Approximately 32% of patients using opioids reported reductions in use post-intervention.
Schumann, 2020 (OB, N = 134, Combined, 3 wks). Pain: West Haven Yale Multidisciplinary Pain Inventory
Opioid use: Self-reported and converted to MME
Pain: Significant treatment effects (p < 0.001) with large effect sizes were observed.
Opioid use: Significant reductions (p < 0.01) in opioids were found post-treatment. All participants in the opioid group completed the opioid taper and discontinued use.
Gibson, 2020 * (OB, N = 99, Combined, 3 months). Pain: Brief Pain Inventory
Opioid use: Self-reported
Pain: No significant change in pain severity (p = 0.11, ES = 0.16).
Opioid use: At baseline, 77 participants were prescribed opioids, 6 (7%) discontinued use between baseline and follow-up.
Van Hooff, 2012 (OB, N = 85, Combined, 10 days). Pain: Visual Analogue Scale
Opioid use: Self-reported
Pain: No significant improvement at 1-year follow-up (p = 0.34).
Opioid use: Minimal reduction was found; 25% of patients used opioids (15% weak opioid, 10% strong opioid) at pre-treatment, and 14% of patients used opioids (11% weak opioid, 3% strong opioid) at 2-year follow-up.
Gilliam, 2020 (OB, N = 762, Combined, 15 days). Pain: West Haven Yale Multidimensional Pain Inventory
Opioid use: Medical records, medicine bottles, patient report, and state prescription monitoring programs and converted to MME
Pain: Significant improvements were found for pain outcomes.
Opioid use: Significant improvements were found for opioid use. At discharge, all patients (31.8%, n = 242) taking opioids at pre-treatment had completed the taper and discontinued opioid use.
Trinh, 2023 (OB, N = 74, Device, 30 days). Pain: Brief Pain Inventory, Visual Analogue Scale
Opioid use: Self-reported, compensation claimants
Pain: Significant reduction in pain post H-Wave treatment (p < 0.0001).
Opioid use: Approximately 49% of the patients taking opioids prior to the H-Wave device intervention subsequently reduced or stopped their usage.
Passmore, 2022 (OB, N = 62, Chiropractic, duration NA). Pain: Numeric Rating Scale
Opioid use: Self-reported
Pain: Significant decrease in pain intensity was found.
Opioid use: Significant reduction of opioid use was found (p = 0.012), approximately 59.0% reduction post-treatment.
Buchfuhrer, 2023 (OB, N = 20, Device, 21 days). Pain: Clinician Global Impression of Improvement
Opioid use: Self-reported and converted to MME
Pain: No changes to restless legs syndrome severity found.
Opioid use: Approximately 70% of participants (14/20) successfully reduced opioid use by more than 20%, with a mean opioid reduction of 29.9% (SD = 23.7%, n = 20), from 39.0 to 26.8 MME per day post-TOMAC treatment.
Barrett, 2021 (OB, N = 17, Combined, 8 wks). Pain: Brief Pain Inventory
Opioid use: Self-reported and converted to MME
Pain: No significant changes in pain severity (5.9 vs. 5.93, p = 0.913).
Opioid use: Five participants (38.5%) reported decreasing their opioid use since baseline. Of these five, opioid use reductions were 17%, 25%, 34%, 55%, and 74%. The mean opioid use decreased from 138.17 mg (SD = 83.99) to 101.21 mg (SD = 45.71).
Matyac, 2022 (OB, N = 13, Educational Program, 5 wks). Pain: Pain, Enjoyment, and General Activity
Opioid use: Self-reported and converted to MME
Pain: The program was associated with decreased pain intensity.
Opioid use: Although not significant, the program was associated with reduced opioid use.
Nilsen, 2010 (OB, N = 11, CBT, 8 wks). Pain: Brief Pain Inventory
Opioid use: Codeine (milligram) use and blood sample taken at the first session for genetic polymorphism CTP2D6
Pain: No significant changes (p > 0.05) were found to mid-treatment (d = 0.3), post-treatment (d = 0.4), or to follow-up (d = 0.4).
Opioid use: A significant decrease in codeine use was found from pre- to mid-treatment (t = 11.4, p < 0.001; d = 2.2), pre-to post-treatment (t = 11.8, p < 0.001; d = 2.9), pre-treatment to follow-up (t = 11.7, p < 0.001; d = 2.9) and from mid- to post-treatment (t = 6.1, p < 0.001; d = 1.4).
McCrae, 2020 (SA, N = 113, CBT, 8 wks). Pain: NA
Opioid use: Self-reported
Pain: NA. Opioid use: There were no significant effects for frequency of opioid use between groups (CBT-insomnia, CBT-pain, waitlist control).
Miller-Matero, 2022 (SA, N = 60, Combined, 5 sessions). Pain: Brief Pain Inventory
Opioid use: EHRs verified and converted to MME
Pain: Intervention significantly reduced pain outcomes (p = 0.048).
Opioid use: Though not significant, the intervention showed lower odds of having an opioid prescription 6 months post-intervention (p = 0.09, OR = 0.32).

3.2. Nonpharmacological Interventions Included in Studies

3.2.1. Combination NPI
3.2.2. Educational Programs
3.2.3. Noninvasive Devices or Digital Technology
3.2.4. Cognitive Behavioral Therapy (CBT)
3.2.5. Mindfulness
3.2.6. Acupuncture
3.2.7. Yoga
3.2.8. Hypnosis
3.2.9. Chiropractic
3.3. Reported Effect Sizes
3.4. Measures
3.5. Assessment of Methodological Quality
4. Discussion
Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

PRISMA-ScR checklist (Section, Item, Checklist Item, Reported on Page #):

  • Title (Item 1): Identify the report as a scoping review. Page 1.
  • Structured summary (Item 2): Provide a structured summary that includes (as applicable) background, objectives, eligibility criteria, sources of evidence, charting methods, results, and conclusions that relate to the review questions and objectives. Page 1.
  • Rationale (Item 3): Describe the rationale for the review in the context of what is already known. Explain why the review questions/objectives lend themselves to a scoping review approach. Pages 1–3.
  • Objectives (Item 4): Provide an explicit statement of the questions and objectives being addressed with reference to their key elements (e.g., population or participants, concepts, and context) or other relevant key elements used to conceptualize the review questions and/or objectives. Page 3.
  • Protocol and registration (Item 5): Indicate whether a review protocol exists; state if and where it can be accessed (e.g., a Web address); and if available, provide registration information, including the registration number. Pages 2–3.
  • Eligibility criteria (Item 6): Specify characteristics of the sources of evidence used as eligibility criteria (e.g., years considered, language, and publication status), and provide a rationale. Page 3.
  • Information sources (Item 7): Describe all information sources in the search (e.g., databases with dates of coverage and contact with authors to identify additional sources), as well as the date the most recent search was executed. Page 4.
  • Search (Item 8): Present the full electronic search strategy for at least 1 database, including any limits used, such that it could be repeated. Pages 28–29.
  • Selection of sources of evidence† (Item 9): State the process for selecting sources of evidence (i.e., screening and eligibility) included in the scoping review. Page 3.
  • Data charting process‡ (Item 10): Describe the methods of charting data from the included sources of evidence (e.g., calibrated forms or forms that have been tested by the team before their use, and whether data charting was performed independently or in duplicate) and any processes for obtaining and confirming data from investigators. Page 4.
  • Data items (Item 11): List and define all variables for which data were sought and any assumptions and simplifications made. Page 3.
  • Critical appraisal of individual sources of evidence§ (Item 12): If performed, provide a rationale for conducting a critical appraisal of included sources of evidence; describe the methods used and how this information was used in any data synthesis (if appropriate). Page 3.
  • Synthesis of results (Item 13): Describe the methods of handling and summarizing the data that were charted. Page 4.
  • Selection of sources of evidence (Item 14): Give numbers of sources of evidence screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally using a flow diagram. Page 5.
  • Characteristics of sources of evidence (Item 15): For each source of evidence, present characteristics for which data were charted and provide the citations. Page 5.
  • Critical appraisal within sources of evidence (Item 16): If performed, present data on critical appraisal of included sources of evidence (see item 12). Pages 23–24.
  • Results of individual sources of evidence (Item 17): For each included source of evidence, present the relevant data that were charted that relate to the review questions and objectives. Pages 6–11.
  • Synthesis of results (Item 18): Summarize and/or present the charting results as they relate to the review questions and objectives. Pages 13–18.
  • Summary of evidence (Item 19): Summarize the main results (including an overview of concepts, themes, and types of evidence available), link to the review questions and objectives, and consider the relevance to key groups. Pages 24–26.
  • Limitations (Item 20): Discuss the limitations of the scoping review process. Pages 26–27.
  • Conclusions (Item 21): Provide a general interpretation of the results with respect to the review questions and objectives, as well as potential implications and/or next steps. Page 27.
  • Funding (Item 22): Describe sources of funding for the included sources of evidence, as well as sources of funding for the scoping review. Describe the role of the funders of the scoping review. Page 27.
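Checklist items 10 and 11 above call for a defined charting form and an explicit list of data items. One lightweight way to operationalize that is a typed record per study whose fields mirror the columns of the evidence table. The sketch below is a generic illustration: the field names and the two example rows are taken from the table above, but the dataclass and CSV layout are assumptions for illustration, not the form the review authors actually used.

```python
# Minimal sketch of a data extraction (charting) record whose fields mirror
# the evidence table above. Structure is an illustrative assumption, not the
# review authors' actual form.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class StudyRecord:
    first_author: str
    year: int
    design: str
    n: int
    npi_type: str
    duration: str
    pain_measure: str
    opioid_use_measure: str

records = [
    StudyRecord("Garcia", 2021, "RCT", 179, "Device", "56 days",
                "Defense and Veterans Pain Rating Scale",
                "Self-reported, converted to MME"),
    StudyRecord("Jensen", 2020, "RCT", 173, "Hypnosis", "4 sessions",
                "Numeric Rating Scale",
                "Self-reported, converted to MME"),
]

# Write the charting form out as a CSV that a second reviewer can fill in
# independently and later reconcile against the first reviewer's copy.
with open("extraction_table.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(StudyRecord)])
    writer.writeheader()
    writer.writerows(asdict(r) for r in records)
```

Keeping the form as structured data (rather than free text) makes it easy to check completeness per study and to compare duplicate extractions field by field.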
  • Trinh, A.; Williamson, T.K.; Han, D.; Hazlewood, J.E.; Norwood, S.M.; Gupta, A. Clinical and Quality of Life Benefits for End-Stage Workers’ Compensation Chronic Pain Claimants following H-Wave ® Device Stimulation: A Retrospective Observational Study with Mean 2-Year Follow-Up. J. Clin. Med. 2023 , 12 , 1148. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Sandhu, H.K.; Booth, K.; Furlan, A.D.; Shaw, J.; Carnes, D.; Taylor, S.J.C.; Abraham, C.; Alleyne, S.; Balasubramanian, S.; Betteley, L.; et al. Reducing Opioid Use for Chronic Pain with a Group-Based Intervention: A Randomized Clinical Trial. JAMA 2023 , 329 , 1745–1756. [ Google Scholar ] [ CrossRef ]
  • Barrett, D.; Brintz, C.E.; Zaski, A.M.; Edlund, M.J. Dialectical Pain Management: Feasibility of a Hybrid Third-Wave Cognitive Behavioral Therapy Approach for Adults Receiving Opioids for Chronic Pain. Pain Med. 2021 , 22 , 1080–1094. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Choudry, E.; Rofé, K.L.; Konnyu, K.; Marshall, B.D.L.; Shireman, T.I.; Merlin, J.S.; Trivedi, A.N.; Schmidt, C.; Bhondoekhan, F.; Moyo, P.; et al. Treatment Patterns and Population Characteristics of Nonpharmacological Management of Chronic Pain in the United States’ Medicare Population: A Scoping Review. Innov. Aging 2023 , 7 , igad085. [ Google Scholar ] [ CrossRef ]
  • Matyac, C.A.; McLaughlin, H. Alternatives to opioids for managing chronic pain: A patient education programme in the US. Prim. Health Care 2022 , 33 , 16–21. [ Google Scholar ] [ CrossRef ]
  • Dowell, D.; Haegerich, T.M.; Chou, R. CDC Guideline for Prescribing Opioids for Chronic Pain—United States, 2016. JAMA 2016 , 315 , 1624–1645. [ Google Scholar ] [ CrossRef ]
  • Does, M.B.; Adams, S.R.; Kline-Simon, A.H.; Marino, C.; Charvat-Aguilar, N.; Weisner, C.M.; Rubinstein, A.L.; Ghadiali, M.; Cowan, P.; Young-Wolff, K.C.; et al. A patient activation intervention in primary care for patients with chronic pain on long term opioid therapy: Results from a randomized control trial. BMC Health Serv. Res. 2024 , 24 , 112. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Weeks, J. Common Sense: Use All Proven Pain Methods in a Comprehensive Strategy to Prevent Opioid Abuse. J. Altern. Complement. Med. 2016 , 22 , 677–679. [ Google Scholar ] [ CrossRef ]
  • Duff, J.H.; Tharakan, S.M.; Davis-Castro, C.Y.; Cornell, A.S.; Romero, P.D. Consumption of Prescription Opioids for Pain: A Comparison of Opioid Use in the United States and Other Countries. Available online: https://crsreports.congress.gov/product/pdf/R/R46805 (accessed on 15 May 2024).
  • Moffat, A.K.; Apajee, J.; Blanc, V.T.L.; Westaway, K.; Andrade, A.Q.; Ramsay, E.N.; Blacker, N.; Pratt, N.; Roughead, E.E. Reducing opioid use for chronic non-cancer pain in primary care using an evidence-based, theory-informed, multistrategic, multistakeholder approach: A single-arm time series with segmented regression. BMJ Qual. Saf. 2023 , 32 , 623–631. [ Google Scholar ] [ CrossRef ]
  • Chou, R.; Hartung, D.; Turner, J.; Blazina, I.; Chan, B.; Levander, X.; McDonagh, M.; Selph, S.; Fu, R.; Pappas, M. Opioid Treatments for Chronic Pain ; Agency for Healthcare Research and Quality (US): Rockville, MD, USA, 2020. [ Google Scholar ]
  • Kissin, I. Long-term opioid treatment of chronic nonmalignant pain: Unproven efficacy and neglected safety? J. Pain Res. 2013 , 6 , 513–529. [ Google Scholar ] [ CrossRef ]
  • Krebs, E.E.; Gravely, A.; Nugent, S.; Jensen, A.C.; DeRonne, B.; Goldsmith, E.S.; Kroenke, K.; Bair, M.; Noorbaloochi, S. Effect of Opioid vs. Nonopioid Medications on Pain-Related Function in Patients with Chronic Back Pain or Hip or Knee Osteoarthritis Pain: The SPACE Randomized Clinical Trial. JAMA 2018 , 319 , 872–882. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gibson, C.J.; Grasso, J.; Li, Y.; Purcell, N.; Tighe, J.; Zamora, K.; Nicosia, F.; Seal, K.H. An Integrated Pain Team Model: Impact on Pain-Related Outcomes and Opioid Misuse in Patients with Chronic Pain. Pain Med. 2020 , 21 , 1977–1984. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Sobczak, M.; Salaga, M.; Storr, M.A.; Fichna, J. Physiology, signaling, and pharmacology of opioid receptors and their ligands in the gastrointestinal tract: Current concepts and future perspectives. J. Gastroenterol. 2014 , 49 , 24–45. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Lavand’homme, P.; Steyaert, A. Opioid-free anesthesia opioid side effects: Tolerance and hyperalgesia. Best Pract. Res. Clin. Anaesthesiol. 2017 , 31 , 487–498. [ Google Scholar ] [ CrossRef ]
  • Brack, A.; Rittner, H.L.; Stein, C. Immunosuppressive effects of opioids—Clinical relevance. J. Neuroimmune Pharmacol. 2011 , 6 , 490–502. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Brennan, M.J. The effect of opioid therapy on endocrine function. Am. J. Med. 2013 , 126 (Suppl. S1), S12–S18. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mattia, C.; Di Bussolo, E.; Coluzzi, F. Non-analgesic effects of opioids: The interaction of opioids with bone and joints. Curr. Pharm. Des. 2012 , 18 , 6005–6009. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Shanmugam, V.K.; Couch, K.S.; McNish, S.; Amdur, R.L. Relationship between opioid treatment and rate of healing in chronic wounds. Wound Repair Regen. 2017 , 25 , 120–130. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Martin, J.L.; Koodie, L.; Krishnan, A.G.; Charboneau, R.; Barke, R.A.; Roy, S. Chronic morphine administration delays wound healing by inhibiting immune cell recruitment to the wound site. Am. J. Pathol. 2010 , 176 , 786–799. [ Google Scholar ] [ CrossRef ]
  • Correa, D.; Farney, R.J.; Chung, F.; Prasad, A.; Lam, D.; Wong, J. Chronic opioid use and central sleep apnea: A review of the prevalence, mechanisms, and perioperative considerations. Anesth. Analg. 2015 , 120 , 1273–1285. [ Google Scholar ] [ CrossRef ]
  • Freire, C.; Sennes, L.U.; Polotsky, V.Y. Opioids and obstructive sleep apnea. J. Clin. Sleep Med. 2022 , 18 , 647–652. [ Google Scholar ] [ CrossRef ]
  • Krantz, M.J.; Palmer, R.B.; Haigney, M.C.P. Cardiovascular Complications of Opioid Use: JACC State-of-the-Art Review. J. Am. Coll. Cardiol. 2021 , 77 , 205–223. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Scherrer, J.F.; Salas, J.; Copeland, L.A.; Stock, E.M.; Ahmedani, B.K.; Sullivan, M.D.; Burroughs, T.; Schneider, F.D.; Bucholz, K.K.; Lustman, P.J. Prescription Opioid Duration, Dose, and Increased Risk of Depression in 3 Large Patient Populations. Ann. Fam. Med. 2016 , 14 , 54–62. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Sjogren, P.; Thomsen, A.B.; Olsen, A.K. Impaired neuropsychological performance in chronic nonmalignant pain patients receiving long-term oral opioid therapy. J. Pain Symptom Manag. 2000 , 19 , 100–108. [ Google Scholar ] [ CrossRef ]
  • Rolita, L.; Spegman, A.; Tang, X.; Cronstein, B.N. Greater number of narcotic analgesic prescriptions for osteoarthritis is associated with falls and fractures in elderly adults. J. Am. Geriatr. Soc. 2013 , 61 , 335–340. [ Google Scholar ] [ CrossRef ]
  • Matsuda, M.; Huh, Y.; Ji, R.R. Roles of inflammation, neurogenic inflammation, and neuroinflammation in pain. J. Anesth. 2019 , 33 , 131–139. [ Google Scholar ] [ CrossRef ]
  • Wartko, P.D.; Krakauer, C.; Turner, J.A.; Cook, A.J.; Boudreau, D.M.; Sullivan, M.D. STRategies to Improve Pain and Enjoy life (STRIPE): Results of a pragmatic randomized trial of pain coping skills training and opioid medication taper guidance for patients on long-term opioid therapy. Pain 2023 , 164 , 2852–2864. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Zheng, Z.; Gibson, S.; Helme, R.D.; Wang, Y.; Lu, D.S.; Arnold, C.; Hogg, M.; Somogyi, A.; Da Costa, C.; Xue, C.C.L. Effects of Electroacupuncture on Opioid Consumption in Patients with Chronic Musculoskeletal Pain: A Multicenter Randomized Controlled Trial. Pain Med. 2019 , 20 , 397–410. [ Google Scholar ] [ CrossRef ]
  • Bennett, M.; Closs, S. Methodological issues in nonpharmacological trials for chronic pain. Anaesth. Pain Intensive Care 2011 , 15 , 126–132. [ Google Scholar ]
  • Park, J.; Hughes, A.K. Nonpharmacological Approaches to the Management of Chronic Pain in Community-Dwelling Older Adults: A Review of Empirical Evidence. J. Am. Geriatr. Soc. 2012 , 60 , 555–568. [ Google Scholar ] [ CrossRef ]
  • Cooperman, N.A.; Lu, S.-E.; Hanley, A.W.; Puvananayagam, T.; Dooley-Budsock, P.; Kline, A.; Garland, E.L. Telehealth Mindfulness-Oriented Recovery Enhancement vs. Usual Care in Individuals with Opioid Use Disorder and Pain: A Randomized Clinical Trial. JAMA Psychiatry 2023 , 81 , 338–346. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Hassan, S.; Zheng, Q.; Rizzolo, E.; Tezcanli, E.; Bhardwaj, S.; Cooley, K. Does Integrative Medicine Reduce Prescribed Opioid Use for Chronic Pain? A Systematic Literature Review. Pain Med. 2020 , 21 , 836–859. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018 , 169 , 467–473. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Downs, S.H.; Black, N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J. Epidemiol. Community Health 1998 , 52 , 377–384. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Nguyen, H.V.; McGinty, E.E.; Mital, S.; Alexander, G.C. Recreational and Medical Cannabis Legalization and Opioid Prescriptions and Mortality. JAMA Health Forum 2024 , 5 , e234897. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Garland, E.L.; Nakamura, Y.; Bryan, C.J.; Hanley, A.W.; Parisi, A.; Froeliger, B.; Marchand, W.; Donaldson, G.W. Mindfulness-Oriented Recovery Enhancement for Veterans and Military Personnel on Long-Term Opioid Therapy for Chronic Pain: A Randomized Clinical Trial. JAMA Psychiatry 2024 , 181 , 125–134. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Jensen, M.P.; Mendoza, M.E.; Ehde, D.M.; Patterson, D.R.; Molton, I.R.; Dillworth, T.M.; Gertz, K.J.; Chan, J.; Hakimian, S.; Battalio, S.L.; et al. Effects of hypnosis, cognitive therapy, hypnotic cognitive therapy, and pain education in adults with chronic pain: A randomized clinical trial. Pain 2020 , 161 , 2284–2298. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Garcia, L.M.; Birckhead, B.J.; Krishnamurthy, P.; Sackman, J.; Mackey, I.G.; Louis, R.G.; Maddox, T.; Birckhead, B.J. An 8-Week Self-Administered At-Home Behavioral Skills-Based Virtual Reality Program for Chronic Low Back Pain: Double-Blind, Randomized, Placebo-Controlled Trial Conducted During COVID-19. J. Med. Internet. Res. 2021 , 23 , e26292. [ Google Scholar ] [ CrossRef ]
  • Hudak, J.; Hanley, A.W.; Marchand, W.R.; Nakamura, Y.; Yabko, B.; Garland, E.L. Endogenous theta stimulation during meditation predicts reduced opioid dosing following treatment with Mindfulness-Oriented Recovery Enhancement. Neuropsychopharmacology 2021 , 46 , 836–843. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • DeBar, L.; Mayhew, M.; Benes, L.; Bonifay, A.; Deyo, R.A.; Elder, C.R.; Keefe, F.J.; Leo, M.C.; McMullen, C.; Owen-Smith, A.; et al. A Primary Care-Based Cognitive Behavioral Therapy Intervention for Long-Term Opioid Users with Chronic Pain: A Randomized Pragmatic Trial. Ann. Intern. Med. 2022 , 175 , 46–55. [ Google Scholar ] [ CrossRef ]
  • Garland, E.L.; Hanley, A.W.; Nakamura, Y.; Barrett, J.W.; Baker, A.K.; Reese, S.E.; Riquino, M.R.; Froeliger, B.; Donaldson, G.W. Mindfulness-Oriented Recovery Enhancement vs. Supportive Group Therapy for Co-occurring Opioid Misuse and Chronic Pain in Primary Care: A Randomized Clinical Trial. JAMA Intern. Med. 2022 , 182 , 407–417. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gardiner, P.; Luo, M.; D’Amico, S.; Gergen-Barnett, K.; White, L.F.; Saper, R.; Mitchell, S.; Liebschutz, J.M.; Moitra, E. Effectiveness of integrative medicine group visits in chronic pain and depressive symptoms: A randomized controlled trial. PLoS ONE 2019 , 14 , e0225540. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Nielssen, O.; Karin, E.; Staples, L.; Titov, N.; Gandy, M.; Fogliati, V.J.; Dear, B.F.; Davila, J.; Borsari, B.; Read, J.P. Opioid Use Before and After Completion of an Online Pain Management Program. J. Consult. Clin. Psychol. 2019 , 87 , 904–917. [ Google Scholar ] [ CrossRef ]
  • Wilson, M.; Dolor, R.J.; Lewis, D.; Regan, S.L.; Vonder Meulen, M.B.; Winhusen, T.J. Opioid dose and pain effects of an online pain self-management program to augment usual care in adults with chronic pain: A multisite randomized clinical trial. Pain 2023 , 164 , 877–885. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Spångeus, A.; Willerton, C.; Enthoven, P.; Grahn Kronhed, A.-C. Patient Education Improves Pain and Health-Related Quality of Life in Patients with Established Spinal Osteoporosis in Primary Care-A Pilot Study of Short- and Long-Term Effects. Int. J. Environ. Res. Public Health 2023 , 20 , 4933. [ Google Scholar ] [ CrossRef ]
  • Day, M.A.; Ward, L.C.; Ehde, D.M.; Thorn, B.E.; Burns, J.; Barnier, A.; Mattingley, J.B.; Jensen, M.P. A Pilot Randomized Controlled Trial Comparing Mindfulness Meditation, Cognitive Therapy, and Mindfulness-Based Cognitive Therapy for Chronic Low Back Pain. Pain Med. 2019 , 20 , 2134–2148. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Nelli, A.; Wright, M.C.; Gulur, P. Green Light-Based Analgesia—Novel Nonpharmacological Approach to Fibromyalgia Pain: A Pilot Study. Pain Physician 2023 , 26 , 403–410. [ Google Scholar ] [ PubMed ]
  • Roseen, E.J.; Pinheiro, A.; Lemaster, C.M.; Plumb, D.; Wang, S.; Elwy, A.R.; Streeter, C.C.; Lynch, S.; Groessl, E.; Sherman, K.J.; et al. Yoga Versus Education for Veterans with Chronic Low Back Pain: A Randomized Controlled Trial. J. Gen. Intern. Med. 2023 , 38 , 2113–2122. [ Google Scholar ] [ CrossRef ]
  • Groessl, E.J.; Liu, L.; Chang, D.G.; Wetherell, J.L.; Bormann, J.E.; Atkinson, J.H.; Baxi, S.; Schmalzl, L. Yoga for Military Veterans with Chronic Low Back Pain: A Randomized Clinical Trial. Am. J. Prev. Med. 2017 , 53 , 599–608. [ Google Scholar ] [ CrossRef ]
  • Naylor, M.R.; Naud, S.; Keefe, F.J.; Helzer, J.E. Therapeutic Interactive Voice Response (TIVR) to reduce analgesic medication use for chronic pain management. J. Pain 2010 , 11 , 1410–1419. [ Google Scholar ] [ CrossRef ]
  • Davis, R.T.; Badger, G.; Valentine, K.; Cavert, A.; Coeytaux, R.R. Acupuncture for Chronic Pain in the Vermont Medicaid Population: A Prospective, Pragmatic Intervention Trial. Glob. Adv. Health Med. 2018 , 7 , 2164956118769557. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Schumann, M.E.; Lapid, M.I.; Cunningham, J.L.; Schluenz, L.; Gilliam, W.P. Treatment Effectiveness and Medication Use Reduction for Older Adults in Interdisciplinary Pain Rehabilitation. Mayo Clin. Proc. Innov. Qual. Outcomes 2020 , 4 , 276–286. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Huffman, K.L.; Mandell, D.; Lehmann, J.K.; Jimenez, X.F.; Lapin, B.R. Clinical and Demographic Predictors of Interdisciplinary Chronic Pain Rehabilitation Program Treatment Response. J. Pain 2019 , 20 , 1470–1485. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Van Der Merwe, J.; Brook, S.; Fear, C.; Benjamin, M.J.; Libby, G.; Williams, A.C.C.; Baranowski, A.P. Military veterans with and without post-traumatic stress disorder: Results from a chronic pain management programme. Scand. J. Pain 2021 , 21 , 560–568. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Van Hooff, M.L.; Ter Avest, W.; Horsting, P.P.; O’Dowd, J.; de Kleuver, M.; van Lankveld, W.; van Limbeek, J. A short, intensive cognitive behavioral pain management program reduces health-care use in patients with chronic low back pain: Two-year follow-up results of a prospective cohort. Eur. Spine J. 2012 , 21 , 1257–1264. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Passmore, S.; Malone, Q.; Manansala, C.; Ferbers, S.; Toth, E.A.; Olin, G.M. A retrospective analysis of pain changes and opioid use patterns temporally associated with a course of chiropractic care at a publicly funded inner-city facility. J. Can. Chiropr. Assoc. 2022 , 66 , 107–117. [ Google Scholar ] [ PubMed ]
  • Townsend, C.O.; Kerkvliet, J.L.; Bruce, B.K.; Rome, J.D.; Hooten, M.W.; Luedtke, C.A.; Hodgson, J.E. A longitudinal study of the efficacy of a comprehensive pain rehabilitation program with opioid withdrawal: Comparison of treatment outcomes based on opioid use status at admission. Pain 2008 , 140 , 177–189. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Hooten, W.M.; Townsend, C.O.; Sletten, C.D.; Bruce, B.K.; Rome, J.D. Treatment outcomes after multidisciplinary pain rehabilitation with analgesic medication withdrawal for patients with fibromyalgia. Pain Med. 2007 , 8 , 8–16. [ Google Scholar ] [ CrossRef ]
  • Nilsen, H.K.; Stiles, T.C.; Landrø, N.I.; Fors, E.A.; Kaasa, S.; Borchgrevink, P.C. Patients with problematic opioid use can be weaned from codeine without pain escalation. Acta. Anaesthesiol. Scand. 2010 , 54 , 571–579. [ Google Scholar ] [ CrossRef ]
  • Buchfuhrer, M.J.; Roy, A.; Rodriguez, S.; Charlesworth, J.D. Adjunctive tonic motor activation enables opioid reduction for refractory restless legs syndrome: A prospective, open-label, single-arm clinical trial. BMC Neurol. 2023 , 23 , 415. [ Google Scholar ] [ CrossRef ]
  • Zeliadt, S.B.; Douglas, J.H.; Gelman, H.; Coggeshall, S.; Taylor, S.L.; Kligler, B.; Bokhour, B.G. Effectiveness of a whole health model of care emphasizing complementary and integrative health on reducing opioid use among patients with chronic pain. BMC Health Serv. Res. 2022 , 22 , 1053. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Gilliam, W.P.; Schumann, M.E.; Craner, J.R.; Cunningham, J.L.; Morrison, E.J.; Seibel, S.; Sawchuk, C.; Sperry, J.A. Examining the effectiveness of pain rehabilitation on chronic pain and post-traumatic symptoms. J. Behav. Med. 2020 , 43 , 956–967. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Ward, R.; Rauch, S.A.M.; Axon, R.N.; Saenger, M.S. Evaluation of a non-pharmacological interdisciplinary pain rehabilitation and functional restoration program for chronic pain in veterans. Health Serv. Res. 2023 , 58 , 365–374. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Miller-Matero, L.R.; Chohan, S.; Gavrilova, L.; Hecht, L.M.; Autio, K.; Tobin, E.; Gavrilova, L. Utilizing Primary Care to Engage Patients on Opioids in a Psychological Intervention for Chronic Pain. Subst. Use Misuse 2022 , 57 , 1492–1496. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • McCrae, C.S.; Curtis, A.F.; Miller, M.B.; Nair, N.; Rathinakumar, H.; Davenport, M.; Berry, J.R.; McGovney, K.; Staud, R.; Berry, R.; et al. Effect of cognitive behavioural therapy on sleep and opioid medication use in adults with fibromyalgia and insomnia. J. Sleep Res. 2020 , 29 , e13020. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Mun, C.J.; Hook, J.; Winsick, N.; Nair, L.; Chen, A.C.-C.; Parsons, T.D.; Roos, C. Digital Interventions for Improving Pain Among Individuals With and Without Opioid Use Disorder and Reducing Medical and Non-medical Opioid Use: A Scoping Review of the Current Science. Curr. Addict. Rep. 2024 , 11 , 299–315. [ Google Scholar ] [ CrossRef ]
  • Avery, N.; McNeilage, A.G.; Stanaway, F.; Ashton-James, C.E.; Blyth, F.M.; Martin, R.; Gholamrezaei, A.; Glare, P. Efficacy of interventions to reduce long term opioid treatment for chronic non-cancer pain: Systematic review and meta-analysis. BMJ 2022 , 377 , e066375. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Cai, Q.; Grigoroglou, C.; Allen, T.; Chen, T.-C.; Chen, L.-C.; Kontopantelis, E. Interventions to reduce opioid use for patients with chronic non-cancer pain in primary care settings: A systematic review and meta-analysis. medRxiv 2024 . [ Google Scholar ] [ CrossRef ]
  • Wang, F.; Lee, E.-K.O.; Wu, T.; Benson, H.; Fricchione, G.; Wang, W.; Yeung, A.S. The Effects of Tai Chi on Depression, Anxiety, and Psychological Well-Being: A Systematic Review and Meta-Analysis. Int. J. Behav. Med. 2014 , 21 , 605–617. [ Google Scholar ] [ CrossRef ]
  • Cui, J.; Liu, F.; Liu, X.; Li, R.; Chen, X.; Zeng, H. The Impact of Qigong and Tai Chi Exercise on Drug Addiction: A Systematic Review and Meta-Analysis. Front. Psychiatry 2022 , 13 , 826187. [ Google Scholar ] [ CrossRef ]
  • Zhang, Y.; Lu, S. Effects of traditional Chinese exercises on mental health in individuals with drug rehabilitee: A systematic review and meta-analysis. Front. Public Health 2022 , 10 , 944636. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Park, J.; Krause-Parello, C.A.; Barnes, C.M. A Narrative Review of Movement-Based Mind-Body Interventions: Effects of Yoga, Tai Chi, and Qigong for Back Pain Patients. Holist. Nurs. Pract. 2020 , 34 , 3–23. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Liu, J.; Yeung, A.; Xiao, T.; Tian, X.; Kong, Z.; Zou, L.; Wang, X. Chen-Style Tai Chi for Individuals (Aged 50 Years Old or Above) with Chronic Non-Specific Low Back Pain: A Randomized Controlled Trial. Int. J. Environ. Res. Public Health 2019 , 16 , 517. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Chen, P.Y.; Song, C.Y.; Yen, H.Y.; Lin, P.C.; Chen, S.R.; Lu, L.H.; Tien, C.L.; Wang, X.M.; Lin, C.H. Impacts of tai chi exercise on functional fitness in community-dwelling older adults with mild degenerative knee osteoarthritis: A randomized controlled clinical trial. BMC Geriatr. 2021 , 21 , 449. [ Google Scholar ] [ CrossRef ]
  • Wang, C.; Schmid, C.H.; Iversen, M.D.; Harvey, W.F.; Fielding, R.A.; Driban, J.B.; Price, L.L.; Wong, J.B.; Reid, K.F.; Rones, R.; et al. Comparative Effectiveness of Tai Chi Versus Physical Therapy for Knee Osteoarthritis: A Randomized Trial. Ann. Intern. Med. 2016 , 165 , 77–86. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Jones, K.D.; Sherman, C.A.; Mist, S.D.; Carson, J.W.; Bennett, R.M.; Li, F. A randomized controlled trial of 8-form Tai chi improves symptoms and functional mobility in fibromyalgia patients. Clin. Rheumatol. 2012 , 31 , 1205–1214. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Lauche, R.; Stumpe, C.; Fehr, J.; Cramer, H.; Cheng, Y.W.; Wayne, P.M.; Rampp, T.; Langhorst, J.; Dobos, G. The Effects of Tai Chi and Neck Exercises in the Treatment of Chronic Nonspecific Neck Pain: A Randomized Controlled Trial. J. Pain. 2016 , 17 , 1013–1027. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Raudenbush, S.W.; Bryk, A.S.; Hutchison, D. Hierarchical linear models: Applications and data analysis methods. Educ. Res. 2003 , 45 , 327–328. [ Google Scholar ]
  • Ziegel, E.R.; Littell, R.; Milliken, G.; Stroup, W.; Wolfinger, R. SAS ® System for Mixed Models. Technometrics 1997 , 39 , 344. [ Google Scholar ] [ CrossRef ]
  • Donahue, M.L.; Dunne, E.M.; Gathright, E.C.; DeCosta, J.; Balletto, B.L.; Jamison, R.N.; Carey, M.P.; Scott-Sheldon, L.A.J.; Magaletta, P.R.; Morasco, B.J.; et al. Complementary and integrative health approaches to manage chronic pain in U.S. military populations: Results from a systematic review and meta-analysis, 1985–2019. Psychol. Serv. 2021 , 18 , 295–309. [ Google Scholar ] [ CrossRef ]


NPI components by study. Columns (studies), in order: Gardiner, 2019; Day, 2019; Moffat, 2023; Zeliadt, 2022; Huffman, 2019; Townsend, 2008; Ward, 2022; Van Der Merwe, 2021; Hooten, 2007; Schumann, 2020; Gibson, 2020; Van Hooff, 2012; Gilliam, 2020; Barrett, 2021; Miller-Matero, 2022. An X indicates that the study's intervention included that component.
MindfulnessXX X XX
Relaxation
Techniques
X X X
CBT X XX X XXXXXX
EducationX XXXXXXX X
Biofeedback X X X
Yoga X
Audit and
Feedback
X
Taper
Protocol
X X X X X
Physical or
Occupational Therapy or Movement
XXX XXXXX
Guided
Imagery
X
Group
Visits
XX X X
Hypnosis X
Acupuncture X
ACT X XX
Psychotherapy X
Stress
Management
X X
Chiropractic X X
Tai Chi/Qigong X
Meditation X X X
Massage X X X
Whole Health Coaching X
Hydrotherapy X
Breathing
Practices
X
Device or Digital TechnologyX
Reduced Pain and Opioid Use? (in column order): N, N, N, N, Y, Y, N, Y, Y, Y, N, N, Y, N, N
Integrated Approach? (in column order): Y, Y, N, N, Y, Y, Y, Y, Y, Y, N, Y, Y, Y, N
Table columns: First Author and Year; Additional Measures; Additional Results.
Garcia, 2021. Additional measures: Pain Interference with Activity, Sleep, Mood, and Stress (DVPRS-II, PROMIS), Pain Catastrophizing Scale (PCS), Pain Efficacy (PSEQ-2), Chronic Pain Acceptance (CPAQ-8), Patient's Global Impression of Change, Satisfaction with VR Device Use, Cybersickness, Over-the-Counter Analgesic Medication Use. Additional results: EaseVRx intervention decreased pain-related interference with activity, mood, and stress, and nonopioid medication use. Pain catastrophizing, pain self-efficacy, and pain acceptance did not reach statistical significance for either group.
Jensen,
2020
Pain Interference (BPI), Depressive (PHQ-8), Global Impression of Change (IMMPACT), Satisfaction (PGATS)All 4 treatment groups showed improvements on pain-related interference and depressive symptoms, with some return to pre-treatment levels at 12-month follow-up.
Zheng,
2019
Medication Quantification Scale III was used to quantify nonopioid medications, Unpleasantness was measured with a 0–20 Numerical Rating Scale, Depression (BDI), Quality of Life (SF-36), Disability (RMDQ), Perception of Electroacupuncture Treatment QuestionnaireThere were no significant differences found across the treatment groups on mental health, feelings of unpleasantness, nonopioid medication doses, disability, and opioid-related adverse events.
Garland,
2022
Pain Interference (BPI), Emotional distress (DASS), Opioid Misuse and Cravings (DMI, COMM)MORE group experienced greater reductions in pain-related functional interference and lower emotional distress and opioid cravings than the supportive psychotherapy group.
Hudak,
2021 *
Self-referential Processing (NADA-state, PBBS)MORE group demonstrated significantly increased alpha and theta power and increased frontal midline theta coherence compared to the control group—neural changes with altered self-referential processing were noted.
Wilson,
2023
Opioid Misuse (COMM), Global Health (PROMIS), Pain Knowledge (The Pain Knowledge Questionnaire), Pain Self-Efficacy (PSEQ), Pain Coping (CSQ-R)No significant effect found from baseline to 10-month posttest for COMM and Global Health. Improvements were found in pain knowledge, pain self-efficacy, and pain coping.
Garland,
2024 *
Emotional Stress (DASS), Post-Traumatic Stress Disorder Checklist—Military Version, Pain Catastrophizing subscale of the Coping Strategies Questionnaire, the Snaith–Hamilton Anhedonia and Pleasure Scale, the positive affect subscale of the Positive and Negative Affect Schedule, the Cognitive Reappraisal of Pain Scale, and Nonreactivity Subscale of the Five Facet Mindfulness Questionnaire, Opioid Cravings (COMM)MORE group reduced opioid use while maintaining pain control and preventing mood disturbances. MORE group reduced opioid cravings, opioid cue reactivity, anhedonia, pain catastrophizing, and opioid attentional bias and increased positive affect more than the control group.
DeBar,
2022
Roland–Morris Disability Questionnaire (RMDQ)CBT intervention sustained larger reductions in pain related disability.
Gardiner,
2019
Depression (PHQ-9), Patient Activation Measure, Health-related Quality of Life (short form 12 Health Survey version 2: SF-12), Opioid Misuse (COMM)Significant differences between the intervention and control group for activation and opioid misuse. No differences in depression at any time point. At 21 weeks, the intervention group had higher quality of life compared with the control group
Wartko,
2023
Pain Self-Efficacy (PSEQ), Depression (PHQ-8), Generalized Anxiety (GAD-7), Patient Global Impression of Change, Prescription Opioid Difficulties Scale, Prescription Opioid Misuse IndexNo significant differences between intervention and usual care were found for any of the secondary outcomes.
Groessl,
2017 *
Roland–Morris Disability Questionnaire (RMDQ) Improvements in disability scores did not differ between the two groups at 12 weeks, but yoga showed greater reductions in disability scores than delayed treatment group at 6 months.
Roseen,
2022 *
Post-Traumatic Stress Symptoms (PCL-C), Roland–Morris Disability Questionnaire (RMDQ)No significant differences between intervention and education were found for secondary outcomes.
Sandhu,
2023
Patient-Reported Outcomes Measurement Information System (PROMIS-PI-SF-8a), Short Opioid Withdrawal Scale (SHOWS), Health-related Quality of Life (SF-12v2 health survey and EuroQol 5-dimension 5-level), Sleep Quality (Pittsburgh Sleep Quality Index), Emotional Wellbeing (HADS), Pain Self-Efficacy (PSEQ)At 4-month follow-up, the education intervention showed significant improvements in mental health, pain self-efficacy, and health-related quality of life, but did not show improvements at any other data collection time point. No statistically significant between-group differences in opioid withdrawal symptoms, sleep quality, or pain interference were found.
Does,
2024
Depression (PHQ-9), Quality of Life, Health, and Functional Status (PROMIS), Patient Activation Measure (PAM-13)The intervention demonstrated less moderate/severe depression symptoms and higher overall health and function status. The intervention had no effect on activation scores at 12 months.
Naylor,
2010
Function/Disability from the Treatment Outcomes in Pain Survey, Depression (BDI), Pain Coping (CSQ).TIVR intervention group demonstrated improved coping, depression symptoms, function, and disability, compared to the standard follow-up group.
Nielssen,
2019
Depression (PHQ9), Anxiety (GAD-7)Reduction in opioid consumption was strongly associated with decreases in anxiety and depression symptoms.
Day,
2019
Physical Function, Depression, and Pain Interference (PROMIS)MBCT group improved significantly more than MM group on pain interference, physical function, and depression symptoms. MBCT and CT group did not differ significantly on any of the measures.
Spangeus,
2023
Health-related Quality of Life (EQ-5D-3L, RAND-36, Qualeffo-41), Static and Dynamic Balance Tests, Fall Risk and Physical Activity (FES-I), Theoretical Knowledge (open-ended questions)Significant improvements were found for quality of life, balance, tandem walking backwards, and theoretical knowledge. These changes were maintained at the 1-year follow-up.
Nelli,
2023
NANA
Moffat,
2023 *
NANA
Zeliadt,
2022 *
NANA
Huffman,
2019
Pain-related Functional Impairment (PDI), Depression and Anxiety (DASS)Intervention showed significant pre-post treatment improvements in functional impairment, depression, and anxiety symptoms.
Townsend,
2008
Health Status (SF-36), Pain Catastrophizing Scale (PCS), Depression (CES-D)Significant improvements were found on health status, pain catastrophizing, and depression symptoms following treatment and six-month post-treatment irrespective of opioid status at admission.
Ward,
2022 *
Depression (PHQ9), VA Stratification Tool for Opioid Risk Mitigation (STORM)Reduced depression scores in the post-treatment year were found in the engaged group. EVP showed a 65% lower mortality risk compared to the untreated group.
Van Der Merwe, 2021 *Pain Interference (BPI), Pain Catastrophizing Scale (PCS), Mood (CORE), Post-traumatic Stress Symptoms (Impact of Events Scale: IES-6), Self-Efficacy and Confidence (PSEQ)Pain management program significantly improved pain-related interference, mood, self-efficacy, and confidence, post-traumatic stress symptoms, and pain catastrophizing.
Hooten,
2007
Health Status (SF-36), Pain Coping (CSQ), Depression (CES-D)Health status, coping, and depression scores demonstrated improvement with the intervention.
Davis,
2018
Pain Interference, Fatigue, Physical Function, Sleep Disturbance, Emotional Distress—Anxiety, Emotional Distress—Depression, and Social Isolation Short Forms (PROMIS)Significant improvements were found in pain-related interference, physical function, fatigue, anxiety, depression, sleep disturbance, and social isolation.
Schumann,
2020
Pain Catastrophizing Scale (PCS), Depressive symptoms (CES-D, PHQ-9), Quality of Life (Medical Outcomes Study 36-Item Short Form Survey)Significant treatment effects with large effect sizes were observed for all outcome measures at post-treatment and 6-month follow-up.
Gibson,
2020 *
Pain Catastrophizing Scale (PCS), Current Opioid Misuse Measure (COMM), Patient Treatment Satisfaction Scale (PTSS)Significant decrease in pain-related interference, pain catastrophizing, pain magnification, pain helplessness, and opioid misuse were found.
Van Hoof,
2012
Roland and Morris Disability Questionnaire (RMDQ), SF36 PCS Short Form 36 Physical Component Scale, SF36 MCS Short Form 36 Mental Component Scale, pain disturbance of ADLs (0–100 scale)For the 1 and 2-year follow-up, only pain disturbance of ADLs significantly improved: df (1,84), t = 2.57, p = 0.01.
Gilliam,
2020
PTSD Checklist with a brief Criterion A assessment (PCL-5), Pain Catastrophizing Scale (PCS), Depression (PHQ-9), Physical performance measuresIntervention showed significant improvements in PTSD, depression, physical performance, and pain outcomes.
Trinh,
2023
Depression (PHQ9), Anxiety (GAD-7), Pain Disability QuestionnaireIntervention showed a 24.4% reduction in depression, 31% reduction in anxiety, and significant improvement in function/disability.
Passmore,
2022
NANA
Buchfuhrer,
2023
NANA
Barrett,
2021
Pain Interference (BPI), Pain willingness and activity engagement (CPAQ), Depression (PHQ-9)No significant changes in pain interference, but significant improvements in pain willingness, activity engagement, and depression were found.
Matyac,
2022
Opioid Risk (ORT), Pain Catastrophizing (PCS)The program showed reduction in pain catastrophizing and pain scores. Combining data from opioid risk and data on sleep apnea, the results showed that 31% of participants were at high risk of opioid overdose.
Nilsen,
2010
Health-related Quality of Life (SF-36), Neurocognitive TestsNeuropsychological functioning improved on some tests; others remained unchanged. Opioid use decreased without significant reduction in quality of life.
McCrae,
2020
NANA
Miller-Matero, 2022Pain Interference (BPI), Pain Catastrophizing (PCS), Depressive Symptoms (HADS)Intervention showed decreases in pain catastrophizing and depression symptoms. There were significant improvements in pain-related interferences.
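Tables like the one above are usually filled in from a uniform extraction form applied to every included study. Below is a minimal sketch, using only the Python standard library, of one way to define such a record and write it to a shared CSV; the field names and example values are illustrative rather than a prescribed template.

```python
# Minimal sketch: a uniform extraction record written to a shared CSV.
# Field names and the example row are illustrative; adapt them to the review protocol.
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class ExtractionRecord:
    first_author: str
    year: int
    additional_measures: str
    additional_results: str

rows = [
    ExtractionRecord(
        first_author="Example Author",
        year=2024,
        additional_measures="Depression (PHQ-9); Pain Interference (BPI)",
        additional_results="Improvements in depression and pain interference.",
    ),
]

with open("extraction_table.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(ExtractionRecord)])
    writer.writeheader()
    writer.writerows(asdict(r) for r in rows)
```

Keeping every study in one file with the same columns also simplifies later steps, such as checking two reviewers' extractions against each other.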
Total Quality Index Score for each included study (First Author, Year; score range 0–29 points):
  • Garcia, 2021: 29
  • Jensen, 2020: 29
  • Zheng, 2019: 29
  • Garland, 2022: 27
  • Hudak, 2021: 27
  • Wilson, 2023: 27
  • Garland, 2024: 27
  • DeBar, 2022: 27
  • Gardiner, 2019: 26
  • Wartko, 2023: 26
  • Groessl, 2017: 26
  • Roseen, 2022: 26
  • Sandhu, 2023: 25
  • Does, 2024: 24
  • Naylor, 2010: 24
  • Nielssen, 2019: 22
  • Day, 2019: 24
  • Spangeus, 2023: 23
  • Nelli, 2023: 24
  • Moffat, 2023: 21
  • Zeliadt, 2022: 23
  • Huffman, 2019: 23
  • Townsend, 2008: 23
  • Ward, 2022: 23
  • Van Der Merwe, 2020: 23
  • Hooten, 2007: 22
  • Davis, 2018: 23
  • Schumann, 2020: 23
  • Gibson, 2020: 22
  • Van Hooff, 2012: 17
  • Gilliam, 2020: 21
  • Trinh, 2023: 23
  • Passmore, 2022: 23
  • Buchfuhrer, 2023: 21
  • Barrett, 2023: 23
  • Matyac, 2022: 20
  • Nilsen, 2010: 19
  • McCrae, 2020: 21
  • Miller-Matero, 2022: 23
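When a quality index like this is part of the extraction form, the collected scores can be summarised directly from the table. The sketch below uses only Python's statistics module and copies a handful of the scores shown above rather than the full set.

```python
# Minimal sketch: summarising extracted quality-index scores (0-29 scale).
# Only a few scores from the table above are included, for brevity.
from statistics import mean, median

scores = {
    "Garcia, 2021": 29,
    "Jensen, 2020": 29,
    "Sandhu, 2023": 25,
    "Nilsen, 2010": 19,
    "Van Hooff, 2012": 17,
}

values = list(scores.values())
print(f"n = {len(values)}")
print(f"mean = {mean(values):.1f}, median = {median(values)}")
print(f"range = {min(values)}-{max(values)}")
```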

Source: Coffee, Z.; Cheng, K.; Slebodnik, M.; Mulligan, K.; Yu, C.H.; Vanderah, T.W.; Gordon, J.S. The Impact of Nonpharmacological Interventions on Opioid Use for Chronic Noncancer Pain: A Scoping Review. Int. J. Environ. Res. Public Health 2024, 21, 794. https://doi.org/10.3390/ijerph21060794


Further reading on data extraction

  1. Systematic Reviews: Step 7: Extract Data from Included Studies

    A librarian can advise you on data extraction for your systematic review, including: What the data extraction stage of the review entails; Finding examples in the literature of similar reviews and their completed data tables; How to choose what data to extract from your included articles ; How to create a randomized sample of citations for a ...

  2. Data extraction methods for systematic review (semi)automation: Update

    Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed ...

  3. Summarising good practice guidelines for data extraction for systematic

    Data extraction is the process of a systematic review that occurs between identifying eligible studies and analysing the data, whether that is a qualitative synthesis or a quantitative synthesis involving the pooling of data in a meta-analysis. The aims of data extraction are to obtain information about the included studies in terms of the characteristics of each study and its population and ...

  4. A practical guide to data analysis in general literature reviews

    This article is a practical guide to conducting data analysis in general literature reviews. The general literature review is a synthesis and analysis of published research on a relevant clinical issue, and is a common format for academic theses at the bachelor's and master's levels in nursing, physiotherapy, occupational therapy, public health and other related fields.

  5. Development, testing and use of data extraction forms in systematic

    Methods. We reviewed guidance on the development and pilot testing of data extraction forms and the data extraction process. We reviewed four types of sources: 1) methodological handbooks of systematic review organisations (SRO); 2) textbooks on conducting systematic reviews; 3) method documents from health technology assessment (HTA) agencies and 4) journal articles.

  6. Chapter 5: Collecting data

    Training of data extractors is intended to familiarize them with the review topic and methods, the data collection form or data system, and issues that may arise during data extraction. Results of the pilot testing of the form should prompt discussion among review authors and extractors of ambiguous questions or responses to establish consistency.

  7. Data extraction methods for systematic review (semi)automation: A

    1. Review published methods and tools aimed at automating or semi-automating the process of data extraction in the context of a systematic review of medical research studies. 2. Review this evidence in the scope of a living review, keeping information up to date and relevant to the challenges faced by systematic reviewers at any time.

  8. Guidance on Conducting a Systematic Literature Review

    The entire literature review process, including literature search, data extraction and analysis, and reporting, should be tailored to answer the research question (Kitchenham and Charters 2007). Second, choose a review type suitable for the review purpose.

  9. Systematic Reviews and Meta-Analyses: Data Extraction

    Data Extraction Templates. Data extraction is often performed using a single form to extract data from all included (relevant) studies in a uniform manner. Because the data extraction stage is driven by the scope and goals of a systematic review, there is no gold standard or one-size-fits-all approach to developing a data extraction form. However, there are templates and guidance available ...

  10. PDF Data Extraction for Intervention Systematic Reviews

    the review workflow. Data extraction Data extraction is the process of collecting and gathering data from various sources. It identifies relevant information and extracts it from documents, databases, or other sources. Data extraction focuses on obtaining the necessary data that will be used for subsequent analysis, synthesis or evaluation

  11. Step 7: Data Extraction & Charting

    What the data extraction stage of the review entails; Finding examples in the literature of similar reviews and their completed data tables; How to choose what data to extract from your included articles ; How to create a randomized sample of citations for a pilot test; Export specific data elements from the included studies like title, authors ...

  12. JABSOM Library: Systematic Review Toolbox: Data Extraction

    Extracting data from reviewed studies should be done in accordance with pre-established guidelines, such as the ones from PRISMA. From each included study, the following data may need to be extracted, depending on the review's purpose: title, author, year, journal, research question and specific aims, conceptual framework, hypothesis, research ...

  13. Data extraction and comparison for complex systematic reviews: a step

    Data extraction (DE) is a challenging step in systematic reviews (SRs). Complex SRs can involve multiple interventions and/or outcomes and encompass multiple research questions. Attempts have been made to clarify DE aspects focusing on the subsequent meta-analysis; there are, however, no guidelines for DE in complex SRs. Comparing datasets extracted independently by pairs of reviewers to ...

  14. Five tips for developing useful literature summary tables for writing

    Literature reviews offer a critical synthesis of empirical and theoretical literature to assess the strength of evidence, develop guidelines for practice and policymaking, and identify areas for future research.1 It is often essential and usually the first task in any research endeavour, particularly in masters or doctoral level education. For effective data extraction and rigorous synthesis ...

  15. A Guide to Evidence Synthesis: 10. Data Extraction

    For an overview of RevMan, including how it may be used to extract and analyze data, watch the RevMan Web Quickstart Guide or check out the RevMan Knowledge Base. SRDR (Systematic Review Data Repository) is a Web-based tool for the extraction and management of data for systematic review or meta-analysis. It is also an open and searchable ...

  16. PDF Data Extraction Templates in Systematic Literature Reviews: How

    Systematic literature reviews (SLR) are the foundation informing clinical and cost-effectiveness analyses in healthcare decision-making. Established guidelines have encouraged the use of standardised data extraction templates (DET) to guide extraction, ensure transparency in information collected across the studies and allow qualitative and/or ...

  17. Data Extraction/Coding/Study characteristics/Results

    The next step is for the researchers to read the full text of each article identified for inclusion in the review and extract the pertinent data using a standardized data extraction/coding form. The data ...

  18. A Systematic Literature Review on Big Data Extraction ...

    Data analytics offers myriad advantages to modern-day organizations; for example, organizations are able to derive knowledge and intelligence to make strategic decisions through data analytics [5, 17]. In addition to decision support and numerous other advantages reported in the literature, data analytics provides an avenue for organizations to make accurate forecasts about sales, revenue ...

  19. Validity of data extraction in evidence synthesis practice of adverse

    Objectives: To investigate the validity of data extraction in systematic reviews of adverse events, the effect of data extraction errors on the results, and to develop a classification framework for data extraction errors to support further methodological research. Design: Reproducibility study. Data sources: PubMed was searched for eligible systematic reviews published between 1 January 2015 and ...

  20. Systematic Literature Review of Information Extraction From Textual

    Information extraction (IE) is a challenging task, particularly when dealing with highly heterogeneous data. State-of-the-art data mining technologies struggle to process information from textual data. Therefore, various IE techniques have been developed to enable the use of IE for textual data. However, each technique differs from one another because it is designed for different data types ...

  21. The Future of Research: AI-Driven Automation in Systematic Reviews

    The systematic literature review (SLR) is the gold standard that provides firm scientific evidence to support decision-making. SLRs play a vital role in offering a holistic assessment of efficacy, safety, and cost-effectiveness of a diagnostic aid or therapy by synthesizing data from various clinical studies.

  22. Forensic journalism: A sistematic literature review

    This Systematic Literature Review (SLR) rests on two ideas: that "there is no research in this area that stems directly from the communication curriculum" (Walker, 2014: 36), and an attempt to respond to Sellnow et al.'s (2015) call to link communication and forensic education. The potential impact of bridging the media and criminal ...

  23. Automating data extraction in systematic reviews: a systematic review

    Despite their widely acknowledged usefulness, the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5-6.5 years for a primary study publication to be included and published in a new systematic review. Further, within 2 years of the publication of systematic reviews, 23% are out of date because they have not ...

  24. JBI Evidence Synthesis

    The draft data extraction tool will be piloted on 3 sources and modified as necessary. All completed sources will be reviewed if the data extraction tool is revised. Modifications will be detailed in the scoping review. Any disagreements that arise between the reviewers will be resolved through discussion or with a third reviewer.

  25. systematic review of literature examining the application of a social

    Data extraction and analysis. A systematic search of the literature identified 222 eligible papers for inclusion in the final review. A data extraction table was used to extract information regarding location of the research, type of paper (e.g. review, empirical), service of interest and key findings.

  26. Prevalence of mental, behavioural or neurodevelopmental disorders

    The data extraction form will be piloted on a sample of the included studies and possibly modified. Inclusion of a primary source provided; we intend to contact authors for further information when necessary. Concerning the data extraction—in alignment with the aims of this project—our current data extraction form contains the following items:

  27. Development, testing and use of data extraction forms in systematic

    Data extraction forms link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and use a crucial part of the systematic reviews process. Several studies have shown that data extraction errors are frequent in systematic reviews, especially regarding outcome data.

  28. Patient safety in orthodontic care: a scoping literature review with

    Studies providing information about the cycle's steps related to orthodontics were included. Study selection and data extraction were performed by two of the authors. Results. A total of 3,923 articles were retrieved. After review of titles and abstracts, 41 articles were selected for full-text review and 25 articles were eligible for inclusion.

  29. Automating data extraction in systematic reviews: a systematic review

    Automation of the parts of systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper performs a systematic review of published and unpublished methods to automate ...

  30. IJERPH

    The present scoping review followed recommendations for rigorous reviews with four independent reviewers conducting the literature search, extracting data, and assessing study quality. To identify as many applicable studies as possible and reduce the risk of bias for this review, a thorough and highly sensitive search strategy was employed.
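Several of the resources above discuss comparing datasets extracted independently by pairs of reviewers before reaching consensus. The sketch below assumes pandas and two hypothetical CSV exports, extraction_reviewer_a.csv and extraction_reviewer_b.csv, that share the same columns and a study_id key; it simply flags cell-level disagreements for discussion.

```python
# Minimal sketch: flag disagreements between two reviewers' extraction tables.
# File names, column layout, and the study_id key are assumptions for illustration.
import pandas as pd

a = pd.read_csv("extraction_reviewer_a.csv").set_index("study_id").sort_index()
b = pd.read_csv("extraction_reviewer_b.csv").set_index("study_id").sort_index()

# DataFrame.compare keeps only the cells that differ, labelling reviewer A's
# entries "self" and reviewer B's entries "other".
disagreements = a.compare(b)

if disagreements.empty:
    print("No disagreements: the two extractions match.")
else:
    print(f"{len(disagreements)} studies have at least one discrepancy:")
    print(disagreements)
```

The resulting table of discrepancies can be circulated to both reviewers, with unresolved cells referred to a third reviewer, mirroring the consensus step described in several of the items above.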