Systematic Reviews & Evidence Synthesis Methods


Once you have completed your analysis, you will want to both summarize and synthesize those results. You may have a qualitative synthesis, a quantitative synthesis, or both.

Qualitative Synthesis

In a qualitative synthesis, you describe for readers how the pieces of your work fit together. You will summarize, compare, and contrast the characteristics and findings, exploring the relationships between them. Further, you will discuss the relevance and applicability of the evidence to your research question. You will also analyze the strengths and weaknesses of the body of evidence. Focus on where the gaps are in the evidence and provide recommendations for further research.

Quantitative Synthesis

Whether or not your systematic review includes a full meta-analysis, there is typically some element of data analysis. A quantitative synthesis combines and analyzes the evidence using statistical techniques, which includes comparing methodological similarities and differences across studies and, potentially, their quality.

Summarizing vs. Synthesizing

In a systematic review, you do more than summarize findings from the identified articles: you synthesize the information you include.

While a summary concisely relates the important themes and elements of one or more larger works in condensed form, a synthesis takes information from a variety of works and combines it to create something new.

Synthesis:

"The goal of a systematic synthesis of qualitative research is to integrate or compare the results across studies in order to increase understanding of a particular phenomenon, not to add studies together. Typically the aim is to identify broader themes or new theories – qualitative syntheses usually result in a narrative summary of cross-cutting or emerging themes or constructs, and/or conceptual models."

Denner, J., Marsh, E., & Campe, S. (2017). Approaches to reviewing research in education. In D. Wyse, N. Selwyn, & E. Smith (Eds.), The BERA/SAGE handbook of educational research (Vol. 2, pp. 143–164). https://doi.org/10.4135/9781473983953.n7

  • Approaches to Reviewing Research in Education from Sage Knowledge
  • Data synthesis (Collaboration for Environmental Evidence Guidebook)
  • Interpreting findings and reporting conduct (Collaboration for Environmental Evidence Guidebook)
  • Interpreting results and drawing conclusions (Cochrane Handbook, Chapter 15)
  • Guidance on the conduct of narrative synthesis in systematic reviews (ESRC Methods Programme)

Improving Conduct and Reporting of Narrative Synthesis of Quantitative Data (ICONS-Quant): protocol for a mixed methods study to develop a reporting guideline

  • Mhairi Campbell 1,
  • Srinivasa Vittal Katikireddi 1,
  • Amanda Sowden 2,
  • Joanne E McKenzie 3,
  • Hilary Thomson 1
  • 1 MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK
  • 2 Centre for Reviews and Dissemination, University of York, York, UK
  • 3 School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia
  • Correspondence to Ms Mhairi Campbell; Mhairi.Campbell@glasgow.ac.uk

Introduction Reliable evidence syntheses, based on rigorous systematic reviews, provide essential support for evidence-informed clinical practice and health policy. Systematic reviews should use reproducible and transparent methods to draw conclusions from the available body of evidence. Narrative synthesis of quantitative data (NS) is a method commonly used in systematic reviews where it may not be appropriate, or possible, to meta-analyse estimates of intervention effects. A common criticism of NS is that it is opaque and subject to author interpretation, casting doubt on the trustworthiness of a review’s conclusions. Despite published guidance funded by the UK’s Economic and Social Research Council on the conduct of NS, recent work suggests that this guidance is rarely used and many review authors appear to be unclear about best practice. To improve the way that NS is conducted and reported, we are developing a reporting guideline for NS of quantitative data.

Methods We will assess how NS is implemented and reported in Cochrane systematic reviews and the findings will inform the creation of a Delphi consensus exercise by an expert panel. We will use this Delphi survey to develop a checklist for reporting standards for NS. This will be accompanied by supplementary guidance on the conduct and reporting of NS, as well as an online training resource.

Ethics and dissemination Ethical approval for the Delphi survey was obtained from the University of Glasgow in December 2017 (reference 400170060). Dissemination of the results of this study will be through peer-reviewed publications, and national and international conferences.

  • evidence synthesis
  • health policy

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/

https://doi.org/10.1136/bmjopen-2017-020064


Strengths and limitations of this study

This study will be the first to develop a consensus-based reporting guideline for narrative synthesis of quantitative data (NS) in systematic reviews.

The study follows the recommended methodology for developing reporting standards.

The online Delphi survey of international experts in NS will be an effective method of gaining reliable consensus.

The reporting guideline and the supplementary materials developed to support use of existing guidance will aid the implementation of best practice conduct and reporting of NS.

Introduction 

Well-conducted systematic reviews are important for informing clinical practice and health policy. 1 In some reviews, meta-analysis of effect estimates may not be possible or sensible. For example, data may be insufficient to allow calculation of standardised effect estimates, the effect metrics arising from different study designs may not be amenable to synthesis (eg, those arising from interrupted time series and randomised trials), or high levels of statistical heterogeneity may mean that presenting an average effect is misleading. For reviews of quantitative data where statistical synthesis is not possible, narrative synthesis of quantitative data (NS) is often the alternative method of choice. A major concern about NS is that it lacks transparency and therefore introduces bias into the synthesis. 2 3 This is an important criticism, which raises questions about the validity and utility of reviews using NS, and ultimately increases the risk of adding to research waste. 4 NS involves collating study findings into a coherent textual narrative, with descriptions of differences in characteristics of the studies including context and validity, often using tables and graphs to display results. 5 6 Published guidance for NS funded by the UK’s Economic and Social  Research Council (ESRC) describes techniques for promoting transparency between review level data and conclusions; these include graphical and structured tabulation of the data. 5 However, a recent analysis of systematic reviews of public health interventions suggests that this guidance is rarely used. 7

Relative to developments in meta-analysis or statistical synthesis, and synthesis of qualitative data in the past decade, work to support improved conduct and transparent reporting in NS has been scarce. While a reporting guideline has been developed for systematic reviews and meta-analysis, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), 8 the focus of the synthesis items is on meta-analysis of effect estimates, with no items for alternative approaches to synthesis. The Cochrane Methodological Expectations of Cochrane Intervention Reviews (MECIR) standards for conducting and reporting Cochrane reviews specify one general item referring to non-quantitative synthesis or non-statistical synthesis, and do not have any items specifically for NS. 2 Reporting guidelines have had some impact on improving the reporting for randomised trials and may have similar benefits for improving the reporting of methods and results from NS. 9

There is a growing demand for reviews addressing complex questions, and which incorporate diverse sources of data. Cochrane, a global leader in evidence synthesis of health and public health interventions, has recognised this. 10 Following the prioritisation of relevance and breadth of coverage in the Cochrane strategy, 10 it is likely that the proportion of Cochrane reviews addressing complex questions will increase; this may result in increased use of NS methods. Realising the need for improved implementation and reporting of NS methods, the Cochrane Strategic Methods Fund has funded the ICONS-Quant project: Improving Conduct and Reporting of Narrative Synthesis of Quantitative Data. This paper presents the protocol for the work that will be undertaken.

ICONS-Quant

The ICONS-Quant project aims to improve the implementation of NS methods through enhancing existing guidance on the conduct of NS and developing a reporting guideline. Provision of reporting guidelines alone will not necessarily lead to improved research conduct; provision of explanatory guidance, dissemination, endorsement and support for adherence is also necessary. 11 We will produce materials to support the implementation of best practice in the application of NS methods, and improved reporting. While our focus is on Cochrane reviews, the key outputs of the project will be of use for reviews published elsewhere and will be made freely available. We will:

describe current practice in conduct and reporting of NS in Cochrane reviews;

achieve expert consensus on reporting standards for NS;

provide support for those involved in NS through the provision of enhanced guidance on NS conduct and online training resources.

We intend the ICONS-Quant guideline to be used in combination with the PRISMA guidelines. 8 The PRISMA guidelines provide items relating to the various stages of review conduct, for example, providing a clear abstract, explaining the literature search strategy, and reporting methods to assess risk of bias. The ICONS-Quant reporting guideline will focus on the methods of synthesis, relating most closely to, and expanding on, PRISMA Item 14 ‘synthesis of results’, and outlining the details that need to be reported to promote transparency in NS.

Methods and analysis

The ICONS-Quant project will be conducted over a period of 24 months from May 2017. Here, we outline the development of a reporting guideline for NS and supporting materials for existing guidance. In line with recommendations for best practice in developing reporting guidelines, 11 we will:

identify the need for the ICONS-Quant guideline (Work Programme One);

conduct a Delphi survey and consensus meeting (Work Programme Two);

enhance existing guidance on NS (Work Programme Three);

develop learning materials for implementation of NS (Work Programme Four).

Below we outline the Project Advisory Group (PAG) and the research that will be conducted within each Work Programme. Details of the ICONS-Quant project have been registered with the Enhancing the Quality and Transparency of Health Research Network, which provides a database of reporting guidelines in development (http://www.equator-network.org/library/reporting-guidelines-under-development/#74).

Project Advisory Group

We have established an ICONS-Quant PAG which will provide governance for the project as well as expert advice. The ICONS PAG includes named project collaborators from Cochrane Review Groups (Effective Practice and Organisation of Care, Consumers and Communication, and Tobacco Addiction), a representative with experience of NS from the Campbell Collaboration Methods Group and a user representative from the National Institute for Health and Care Excellence.

Work Programme One: assessment of current reporting and conduct of NS in Cochrane reviews

Previously we investigated current practice in the conduct and reporting of NS in systematic reviews of public health interventions. 7 Work Programme One will extend this exercise to assess use of NS methods and their reporting across all Cochrane Review Groups. We will identify all Cochrane reviews published between April 2016 and April 2017 and screen them to determine the method of synthesis for the primary outcome. Reviews will be included for further examination if the method for reporting the synthesis of the primary outcomes relies on text. We will identify those that use NS or that synthesise studies using text only, whether or not the authors refer to the use of NS or textual methods for synthesis. Reviews will be excluded if they are empty, include only one study, report on diagnostic test accuracy, or are a review of methodology. We will record how the synthesis has been conducted and reported. We will use the existing data extraction template designed for our previous assessment of NS in public health reviews. This template is based on key sources of best practice for NS, 12–15 including the ESRC guidance on the conduct of NS. 5 Questions relate to use of theory; investigation of differences across included studies and reported findings; transparency of links between data and text (including data visualisation tools used); assessment of robustness of the synthesis; and adequacy of description of NS methods. 5 Using a similar format to our review of NS in public health reviews, 16 we will tabulate the extracted data. This will allow description of:

the extent of reporting of NS methods: the amount and type of detail included;

the range of approaches and tools used to narratively synthesise data;

how conceptual and methodological heterogeneity is managed;

review authors’ reflection on robustness of synthesis.

The results of this exercise will be used to inform development of the initial checklist for inclusion in the Delphi survey.

Work Programme Two: Delphi survey

A Delphi consensus survey will be conducted. This is the standard approach to elicit expert opinion for the purposes of developing consensus-based reporting guidelines. 17 18 The results of the assessment exercise in Work Programme One, in conjunction with key texts on NS, 12–15 19 20 findings from the previous assessment of reporting NS in public health reviews 16 and input from the ICONS PAG, will be used to develop the initial items for Round One of the Delphi survey. An expert panel will then be consulted to inform the development of the Delphi survey. The panel will be identified by the project team and members of the ICONS PAG, and will comprise 15–20 authors and methodologists experienced in or familiar with the purpose and conduct of NS. A videoconference with the expert panel will be used to present findings from Work Programme One and a draft of the proposed Delphi survey. Participants’ input will be recorded and used to refine the Delphi survey.

The Delphi online survey will use a questionnaire to achieve consensus on the content and wording of reporting items considered to capture the pertinent details of NS. The online platform will be created by the MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, using a web-based platform recently developed for this purpose. The platform facilitates personalised invitations to participate, password-protected logins and personalised reminders, and enables data collation for quantitative and qualitative analyses. There will be two rounds of the survey, with a third round conducted if necessary to gain consensus among participants.

Participants will include members of the ICONS PAG and others experienced in NS. Suitable participants will be identified by the project team, through recommendations from the ICONS PAG and through the data extraction exercise described in Work Programme One. The data extraction exercise will help articulate identified gaps in reporting of methods and findings of NS where transparency is particularly lacking. The identified gaps will be used when drafting reporting item questions for the Delphi consensus exercise, to improve transparency in NS. We will invite a maximum of 100 individuals to participate and they will be recruited via their workplace email address. We will ask for their professional opinion on the content of a draft reporting guideline. The invitation will outline the aim of the Delphi survey, the process involved and the time commitment, and include a participant information sheet. Individuals who accept the invitation will be asked to take part in each round of the survey. It will be clearly stated that at any stage, a respondent can opt out of the Delphi survey. The survey will ask participants to provide details of their job category; no personal information will be collected. Respondents will be asked to use their email address to log in to the survey. This information will be used only to verify the appropriate use of the survey and will not be used in the analysis. The Delphi survey will involve implied consent: it will be made clear to participants that by responding to the survey, they are consenting to participate in the study. It will be explained to respondents that: their responses will not be linked to their identity (deidentified); only researchers will have access to the data; and the data will be stored on a password-encrypted computer and stored and destroyed in accordance with Medical Research Council guidelines.

The Delphi survey will consist of closed and open-ended questions. Round One of the survey will provide an introduction to the project and instructions for the survey. The participants will be invited to rate each of the proposed guideline items on a 4-point Likert scale (essential, desirable, possible, omit), as used in previous Delphi surveys for developing reporting guidelines. 21 22 For each item, the participants will be invited to provide comments. A reminder email will be sent approximately 2 weeks after the initial invitation. Round One will close approximately 4 weeks after the first invitations are issued. Responses to Round One of the Delphi will be exported verbatim into a Microsoft Excel spreadsheet and collated. Responses to the scale rating will be summarised as counts and percentage frequencies. The free-text content will be collated and summarised. The results from both the quantitative and qualitative data collation will be used to inform the development of Round Two of the Delphi and the content of the final guideline checklist. Redrafting of the Delphi survey items will be conducted in discussion with all study group members within 1 month of closure of the round.
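
As an illustration only (this sketch is not part of the protocol; the data frame, item names, and ratings are invented), the planned collation of scale ratings into counts and percentage frequencies, and a check against the >70% agreement threshold described below, could look like this in R:

```r
# Illustrative sketch: summarising Delphi Round One ratings per item as counts and
# percentage frequencies, then flagging items that exceed 70% agreement
# (here operationalised, as an assumption, as 'essential' plus 'desirable').
set.seed(1)
ratings <- data.frame(
  item   = rep(c("item_01", "item_02"), each = 10),
  rating = factor(sample(c("essential", "desirable", "possible", "omit"), 20, replace = TRUE),
                  levels = c("essential", "desirable", "possible", "omit"))
)

counts <- table(ratings$item, ratings$rating)      # counts per item and rating category
pct    <- prop.table(counts, margin = 1) * 100     # percentage frequencies per item

agree  <- rowSums(pct[, c("essential", "desirable"), drop = FALSE]) > 70
pct; agree
```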

All participants from Round One will be invited to take part in Round Two. In Round Two, the proposed checklist items will be presented in three sections:

Items that reached high consensus in Round One and that are expected to be included in the final checklist. These items will have an a priori agreement of >70% approval, as recommended by Diamond et al. 23 Participants will not be asked to rate these items again but will be asked to comment on whether they agree with the inclusion of each item in the checklist and to provide comments if they disagree or have suggestions to clarify the wording of the items.

Items that have been significantly altered or are additional as a result of Round One. The participants will be invited to rate these items on the 4-point scale and provide comments on each item.

Items that were rated as ‘omit’ in Round One and that are not expected to be included in the final checklist. Participants will not be asked to rate these items; they will be invited to provide their opinion on the removal of these items from the final checklist.

If there is a substantial lack of consensus remaining following Round Two of the Delphi, a third round will be prepared and conducted. Round Three will follow the same format as Round Two, providing the reporting guideline items in three sections: items that are expected to be included in the final guideline; those significantly altered; and items that will be removed from the final checklist.

Consensus meeting

An expert panel of individuals experienced in NS methods will be invited to participate in the consensus meeting to finalise the content of the guideline. It is anticipated that this will be held as a face-to-face meeting at the Cochrane colloquium in 2018 in Edinburgh, UK. If this is not possible, an online consensus meeting will be conducted using webinar software. If necessary, an additional virtual meeting will be held to accommodate different time zones of invitees. At the consensus meeting the reporting guideline items developed from the Delphi survey will be discussed, with priority given to establishing consensus on the content and wording of items for which the level of consensus is less clear.

Work Programme Three: enhancement of existing guidance on NS methods

We will produce materials to support the current guidance, including information on the rationale for, as well as the implementation of, each stage of NS. This will be developed as a supplement to the reporting guideline items. The enhanced guidance will be accessible to novice reviewers and will provide examples of good practice to illustrate how methods of NS may be used. The findings of Work Programmes One and Two (the assessment of current reporting of NS and the Delphi consensus) will be used to inform development of enhanced guidance on NS. 5 Cochrane Review Groups who publish reviews incorporating NS will be identified through the process of Work Programme One. We anticipate that these will include a range of Cochrane Review Groups, and examples will be developed which are relevant to all groups. An overview of methodological tools which can be used to support NS and which have been developed since publication of the ESRC guidance in 2006 will also be incorporated. The PAG will be asked for comments on the draft guidance before it is piloted.

Work Programme Four: development of learning materials on implementation of NS

Training materials based on the guidance developed in Work Programmes Two and Three will be produced to promote improved use of NS methods. We have secured support from Cochrane Training to collaborate in Work Programme Four. We will deliver two to three live participatory webinars (to allow for different time zones) to present the agreed guidance developed in Work Programme Two. One webinar will be recorded and provided on a web page, along with a record of the questions raised in the webinar, and any other frequently asked questions that emerge.

In addition, an online training module on NS will be developed in collaboration with Cochrane Training and a specialist e-learning company. The module will include a mix of didactic and participatory teaching methods involving assessment and interpretation of data and syntheses. We will work with Cochrane colleagues to incorporate the reporting items into the MECIR standards, and offer to update the relevant chapters of the Cochrane Handbook.

Ethics and dissemination

Dissemination of the results of this study will be through peer-reviewed publications, and national and international conferences. In addition, the objectives of Work Programmes Three and Four are to distribute and encourage use of the ICONS-Quant guideline through webinars and an online training module.


Contributors HT conceived the idea of the study. HT, SVK, AS, JEM and MC designed the study methodology. MC prepared the first draft of the protocol manuscript and all authors critically reviewed and approved the final manuscript.

Funding This project was supported by funds provided by the Cochrane Strategic Methods Fund. MC, HT and SVK receive funding from the UK Medical Research Council (MC_UU_12017-13 and MC_UU_12017-15) and the Scottish Government Chief Scientist Office (SPHSU13 and SPHSU15). SVK is supported by an NHS Research Scotland Senior Clinical Fellowship (SCAF/15/02). JEM is supported by a National Health and Medical Research Council (NHMRC) Australian Public Health Fellowship (1072366).

Disclaimer The views expressed in the protocol are those of the authors and not necessarily those of Cochrane or its registered entities, committees or working groups.

Competing interests HT and SVK are Cochrane editors. JEM is a co-convenor of the Cochrane Statistical Methods Group.

Patient consent Not required.

Ethics approval University of Glasgow College of Social Sciences Ethics Committee (reference number 400170060).

Provenance and peer review Not commissioned; externally peer reviewed.


  • Methodology
  • Open access
  • Published: 24 April 2023

Quantitative evidence synthesis: a practical guide on meta-analysis, meta-regression, and publication bias tests for environmental sciences

  • Shinichi Nakagawa   ORCID: orcid.org/0000-0002-7765-5182 1 , 2 ,
  • Yefeng Yang   ORCID: orcid.org/0000-0002-8610-4016 1 ,
  • Erin L. Macartney   ORCID: orcid.org/0000-0003-3866-143X 1 ,
  • Rebecca Spake   ORCID: orcid.org/0000-0003-4671-2225 3 &
  • Malgorzata Lagisz   ORCID: orcid.org/0000-0002-3993-6127 1  

Environmental Evidence, volume 12, Article number: 8 (2023)


Meta-analysis is a quantitative way of synthesizing results from multiple studies to obtain reliable evidence of an intervention or phenomenon. Indeed, an increasing number of meta-analyses are conducted in environmental sciences, and resulting meta-analytic evidence is often used in environmental policies and decision-making. We conducted a survey of recent meta-analyses in environmental sciences and found poor standards of current meta-analytic practice and reporting. For example, only ~ 40% of the 73 reviewed meta-analyses reported heterogeneity (variation among effect sizes beyond sampling error), and publication bias was assessed in fewer than half. Furthermore, although almost all the meta-analyses had multiple effect sizes originating from the same studies, non-independence among effect sizes was considered in only half of the meta-analyses. To improve the implementation of meta-analysis in environmental sciences, we here outline practical guidance for conducting a meta-analysis in environmental sciences. We describe the key concepts of effect size and meta-analysis and detail procedures for fitting multilevel meta-analysis and meta-regression models and performing associated publication bias tests. We demonstrate a clear need for environmental scientists to embrace multilevel meta-analytic models, which explicitly model dependence among effect sizes, rather than the commonly used random-effects models. Further, we discuss how reporting and visual presentations of meta-analytic results can be much improved by following reporting guidelines such as PRISMA-EcoEvo (Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Ecology and Evolutionary Biology). This paper, along with the accompanying online tutorial, serves as a practical guide on conducting a complete set of meta-analytic procedures (i.e., meta-analysis, heterogeneity quantification, meta-regression, publication bias tests and sensitivity analysis) and also as a gateway to more advanced, yet appropriate, methods.

Evidence synthesis is an essential part of science. The method of systematic review provides the most trusted and unbiased way to achieve the synthesis of evidence [ 1 , 2 , 3 ]. Systematic reviews often include a quantitative summary of studies on the topic of interest, referred to as a meta-analysis (for discussion on the definitions of ‘meta-analysis’, see [ 4 ]). The term meta-analysis can also mean a set of statistical techniques for quantitative data synthesis. The methodologies of the meta-analysis were initially developed and applied in medical and social sciences. However, meta-analytic methods are now used in many other fields, including environmental sciences [ 5 , 6 , 7 ]. In environmental sciences, the outcomes of meta-analyses (within systematic reviews) have been used to inform environmental and related policies (see [ 8 ]). Therefore, the reliability of meta-analytic results in environmental sciences is important beyond mere academic interests; indeed, incorrect results could lead to ineffective or sometimes harmful environmental policies [ 8 ].

As in medical and social sciences, environmental scientists frequently use traditional meta-analytic models, namely fixed-effect and random-effects models [ 9 , 10 ]. However, we contend that such models in their original formulation are no longer useful and are often incorrectly used, leading to unreliable estimates and errors. This is mainly because the traditional models assume independence among effect sizes, but almost all primary research papers include more than one effect size, and this non-independence is often not considered (e.g., [ 11 , 12 , 13 ]). Furthermore, previous reviews of published meta-analyses in environmental sciences (hereafter, ‘environmental meta-analyses’) have demonstrated that less than half report or investigate heterogeneity (inconsistency) among effect sizes [ 14 , 15 , 16 ]. Many environmental meta-analyses also do not present any sensitivity analysis, for example, for publication bias (i.e., statistically significant effects being more likely to be published, making collated data unreliable; [ 17 , 18 ]). These issues might have arisen for several reasons, for example, because of no clear conduct guideline for the statistical part of meta-analyses in environmental sciences and rapid developments in meta-analytic methods. Taken together, the field urgently requires a practical guide to implement correct meta-analyses and associated procedures (e.g., heterogeneity analysis, meta-regression, and publication bias tests; cf. [ 19 ]).

To assist environmental scientists in conducting meta-analyses, the aims of this paper are five-fold. First, we provide an overview of the processes involved in a meta-analysis while introducing some key concepts. Second, after introducing the main types of effect size measures, we mathematically describe the two commonly used traditional meta-analytic models, demonstrate their utility, and introduce a practical, multilevel meta-analytic model for environmental sciences that appropriately handles non-independence among effect sizes. Third, we show how to quantify heterogeneity (i.e., consistencies among effect sizes and/or studies) using this model, and then explain such heterogeneity using meta-regression. Fourth, we show how to test for publication bias in a meta-analysis and describe other common types of sensitivity analysis. Fifth, we cover other technical issues relevant to environmental sciences (e.g., scale and phylogenetic dependence) as well as some advanced meta-analytic techniques. In addition, these five aims (sections) are interspersed with two more sections, named ‘Notes’ on: (1) visualisation and interpretation; and (2) reporting and archiving. Some of these sections are accompanied by results from a survey of 73 environmental meta-analyses published between 2019 and 2021; survey results depict current practices and highlight associated problems (for the method of the survey, see Additional file 1 ). Importantly, we provide easy-to-follow implementations of much of what is described below, using the R package, metafor [ 20 ] and other R packages at the webpage ( https://itchyshin.github.io/Meta-analysis_tutorial/ ), which also connects the reader to the wealth of online information on meta-analysis (note that we also provide this tutorial as Additional file 2 ; see also [ 21 ]).

Overview with key concepts

Statistically speaking, we have three general objectives when conducting a meta-analysis [ 12 ]: (1) estimating an overall mean , (2) quantifying consistency ( heterogeneity ) between studies, and (3) explaining the heterogeneity (see Table 1 for the definitions of the terms in italic ). A notable feature of a meta-analysis is that an overall mean is estimated by taking the sampling variance of each effect size into account: a study (effect size) with a low sampling variance (usually based on a larger sample size) is assigned more weight in estimating an overall mean than one with a high sampling variance (usually based on a smaller sample size). However, an overall mean estimate itself is often not informative because one can get the same overall mean estimates in different ways. For example, we may get an overall estimate of zero if all studies have zero effects with no heterogeneity. In contrast, we might also obtain a zero mean across studies that have highly variable effects (e.g., ranging from strongly positive to strongly negative), signifying high heterogeneity. Therefore, quantifying indicators of heterogeneity is an essential part of a meta-analysis, necessary for interpreting the overall mean appropriately. Once we observe non-zero heterogeneity among effect sizes, then, our job is to explain this variation by running meta-regression models, and, at the same time, quantify how much variation is accounted for (often quantified as R 2 ). In addition, it is important to conduct an extra set of analyses, often referred to as publication bias tests , which are a type of sensitivity analysis [ 11 ], to check the robustness of meta-analytic results.
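
As a minimal numerical illustration of the weighting scheme described above (weights proportional to the inverse of each sampling variance; the effect sizes and variances below are invented and this is not code from the paper or its tutorial), the overall mean can be computed as a weighted average in base R:

```r
# Minimal sketch: inverse-variance weighting of three invented effect sizes.
yi <- c(0.40, 0.10, -0.05)   # effect sizes from three hypothetical studies
vi <- c(0.01, 0.04, 0.25)    # their sampling variances
wi <- 1 / vi                 # inverse-variance weights

sum(wi * yi) / sum(wi)       # ~0.33: the most precise study (vi = 0.01) dominates
```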

Choosing an effect size measure

In this section, we introduce different kinds of ‘effect size measures’ or ‘effect measures’. In the literature, the term ‘effect size’ is typically used to refer to the magnitude or strength of an effect of interest or its biological interpretation (e.g., environmental significance). Effect sizes can be quantified using a range of measures (for details, see [ 22 ]). In our survey of environmental meta-analyses (Additional file 1 ), the two most commonly used effect size measures are: the logarithm of response ratio, lnRR ([ 23 ]; also known as the ratio of means; [ 24 ]) and standardized mean difference, SMD (often referred to as Hedges’ g or Cohen’s d [ 25 , 26 ]). These are followed by proportion (%) and Fisher’s z -transformation of correlation, or Zr . These four effect measures nearly fit into the three categories, which are named: (1) single-group measures (a statistical summary from one group; e.g., proportion), (2) comparative measures (comparing between two groups e.g., SMD and lnRR), and (3) association measures (relationships between two variables; e.g., Zr ). Table 2 summarizes effect measures that are common or potentially useful for environmental scientists. It is important to note that any measures with sampling variance can become an ‘effect size’. The main reason why SMD, lnRR, Zr, or proportion are popular effect measures is that they are unitless, while a meta-analysis of mean, or mean difference, can only be conducted when all effect sizes have the same unit (e.g., cm, kg).

Table 2 also includes effect measures that are likely to be unfamiliar to environmental scientists; these are effect sizes that characterise differences in the observed variability between samples (i.e., lnSD, lnCV, lnVR and lnCVR; [ 27 , 28 ]) rather than central tendencies (averages). These dispersion-based effect measures can provide us with extra insights along with average-based effect measures. Although the literature survey showed none of these were used in our sample, these effect sizes have been used in many fields, including agriculture (e.g., [ 29 ]), ecology (e.g., [ 30 ]), evolutionary biology (e.g., [ 31 ]), psychology (e.g., [ 32 ]), education (e.g., [ 33 ]), psychiatry (e.g., [ 34 ]), and neurosciences (e.g., [ 35 ]). Perhaps it is not difficult to think of an environmental intervention that can affect not only the mean but also the variance of measurements taken on a group of individuals or a set of plots. For example, environmental stressors such as pesticides and eutrophication are likely to increase variability in biological systems because stress accentuates individual differences in environmental responses (e.g., [ 36 , 37 ]). Such ideas are yet to be tested meta-analytically (cf. [ 38 , 39 ]).
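
In practice, effect sizes and their sampling variances are usually computed with helper functions rather than by hand. The sketch below uses metafor::escalc(); the column names follow metafor's conventions and the data are invented, so treat it as an assumption-laden example rather than the authors' own code (their online tutorial covers this in full).

```r
# Sketch: computing lnRR (measure = "ROM") and lnCVR (measure = "CVR") with metafor.
# Means, SDs, and sample sizes are invented illustration values.
library(metafor)

dat <- data.frame(
  study_id = c("s1", "s2"),
  m1i = c(10.2, 8.5), sd1i = c(2.1, 1.9), n1i = c(30, 25),  # treatment mean / SD / n
  m2i = c(7.9, 8.1),  sd2i = c(1.8, 2.2), n2i = c(30, 25)   # control mean / SD / n
)

lnRR  <- escalc(measure = "ROM", m1i = m1i, sd1i = sd1i, n1i = n1i,
                m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)
lnCVR <- escalc(measure = "CVR", m1i = m1i, sd1i = sd1i, n1i = n1i,
                m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)

lnRR[, c("yi", "vi")]   # effect sizes (yi) and their sampling variances (vi)
```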

Choosing a meta-analytic model

Fixed-effect and random-effects models.

Two traditional meta-analytic models are called the ‘fixed-effect’ model and the ‘random-effects’ model. The former assumes that all effect sizes (from different studies) come from one population (i.e., they have one true overall mean), while the latter does not have such an assumption (i.e., each study has different overall means or heterogeneity exists among studies; see below for more). The fixed-effect model, which should probably be more correctly referred to as the ‘common-effect’ model, can be written as [ 9 , 10 , 40 ]:

\[ {z}_{j}={\beta }_{0}+{m}_{j}, \qquad {m}_{j}\sim N\left(0,{v}_{j}\right) \qquad (1) \]

where the intercept, \({\beta }_{0}\) is the overall mean, z j (the response/dependent variable) is the effect size from the j th study ( j  = 1, 2,…, N study ; in this model, N study  = the number of studies = the number of effect sizes), m j is the sampling error, related to the j th sampling variance ( v j ), which is normally distributed with the mean of 0 and the ‘study-specific’ sampling variance, v j (see also Fig.  1 A).

Figure 1. Visualisation of the three statistical models of meta-analysis: A a fixed-effect model (1-level), B a random-effects model (2-level), and C a multilevel model (3-level; see the text for what the symbols mean)

The overall mean needs to be estimated, and this is often done as the weighted average with the weights \({w}_{j}=1/{v}_{j}\) (i.e., the inverse-variance approach). An important, but sometimes untenable, assumption of meta-analysis is that sampling variance is known. Indeed, we estimate sampling variance using formulas, as in Table 2 , meaning that \({v}_{j}\) is substituted by a sampling variance estimate (see also section ‘ Scale dependence ’). Of relevance, the use of the inverse-variance approach has been recently criticized, especially for SMD and lnRR [ 41 , 42 ], and we note that the inverse-variance approach using the formulas in Table 2 is one of several different weighting approaches used in meta-analysis (e.g., for adjusted sampling-variance weighting, see [ 43 , 44 ]; for sample-size-based weighting, see [ 41 , 42 , 45 , 46 ]). Importantly, the fixed-effect model assumes that the only source of variation in effect sizes ( z j ) is the effect due to sampling variance (which is inversely proportional to the sample size, n ; Table 2 ).

Similarly, the random-effects model can be expressed as:

\[ {z}_{j}={\beta }_{0}+{u}_{j}+{m}_{j}, \qquad {u}_{j}\sim N\left(0,{\tau }^{2}\right), \quad {m}_{j}\sim N\left(0,{v}_{j}\right) \qquad (2) \]

where u j is the j th study effect, which is normally distributed with the mean of 0 and the between-study variance, \({\tau }^{2}\) (for different estimation methods, see [ 47 , 48 , 49 , 50 ]), and other notations are the same as in Eq.  1 (Fig.  1 B). Here, the overall mean can be estimated as the weighted average with weights \({w}_{j}=1/\left({\tau }^{2}+{v}_{j}\right)\) (note that the different weighting approaches mentioned above are applicable to the random-effects model, and some of them also apply to the multilevel model introduced below). The model assumes each study has its specific mean, \({\beta }_{0}+{u}_{j}\) , and (in)consistencies among studies (effect sizes) are indicated by \({\tau }^{2}\) . When \({\tau }^{2}\) is 0 (or not statistically different from 0), the random-effects model simplifies to the fixed-effect model (cf. Equations  1 and 2 ). Given no studies in environmental sciences are conducted in the same manner or even at exactly the same place and time, we should expect different studies to have different means. Therefore, in almost all cases in the environmental sciences, the random-effects model is a more ‘realistic’ model [ 9 , 10 , 40 ]. Accordingly, most environmental meta-analyses (68.5%; 50 out of 73 studies) in our survey used the random-effects model, while only 2.7% (2 of 73 studies) used the fixed-effect model (Additional file 1 ).
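
Both traditional models can be fitted in a couple of lines with metafor; the sketch below assumes a data frame dat with one row per effect size and columns yi and vi (e.g., as produced by escalc() above), and is an illustration rather than the authors' code.

```r
# Sketch: fitting the two traditional models (assumes dat has columns yi and vi).
library(metafor)

fe <- rma(yi, vi, data = dat, method = "EE")    # fixed-/common-effect model (Eq. 1)
re <- rma(yi, vi, data = dat, method = "REML")  # random-effects model (Eq. 2)

summary(re)   # reports the overall mean, tau^2, Q, and I^2
```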

Multilevel meta-analytic models

Although we have introduced the random-effects model as being more realistic than the fixed-effect model (Eq.  2 ), we argue that the random-effects model is rather limited and impractical for the environmental sciences. This is because random-effects models, like fixed-effect models, assume all effect sizes ( z j ) to be independent. However, when multiple effect sizes are obtained from a study, these effect sizes are dependent (for more details, see the next section on non-independence). Indeed, our survey showed that in almost all datasets used in environmental meta-analyses, this type of non-independence among effect sizes occurred (97.3%; 71 out of 73 studies, with two studies being unclear, so effectively 100%; Additional file 1 ). Therefore, we propose the simplest and most practical meta-analytic model for environmental sciences as [ 13 , 40 ] (see also [ 51 , 52 ]):

\[ {z}_{i}={\beta }_{0}+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i}, \qquad {u}_{j}\sim N\left(0,{\tau }^{2}\right), \quad {e}_{i}\sim N\left(0,{\sigma }^{2}\right), \quad {m}_{i}\sim N\left(0,{v}_{i}\right) \qquad (3) \]

where we explicitly recognize that N effect ( i  = 1, 2,…, N effect ) >  N study ( j  = 1, 2,…, N study ) and, therefore, we now have the study effect (between-study effect), u j[i] (for the j th study and i th effect size) and effect-size level (within-study) effect, e i (for the i th effect size), with the between-study variance, \({\tau }^{2}\) , and within-study variance, \({\sigma }^{2}\) , respectively, and other notations are the same as above. We note that this model (Eq.  3 ) is an extension of the random-effects model (Eq.  2 ), and we refer to it as the multilevel/hierarchical model (used in 7 out of 73 studies: 9.6% [Additional file 1 ]; note that Eq.  3 is also known as a three-level meta-analytic model; Fig.  1 C). Also, environmental scientists who are familiar with (generalised) linear mixed-models may recognize u j (the study effect) as the effect of a random factor which is associated with a variance component, i.e., \({\tau }^{2}\) [ 53 ]; also, e i and m i can be seen as parts of random factors, associated with \({\sigma }^{2}\) and v i (the former is comparable to the residuals, while the latter is sampling variance, specific to a given effect size).

It seems that many researchers are aware of the issue of non-independence so that they often use average effect sizes per study or choose one effect size (at least 28.8%, 21 out of 73 environmental meta-analyses; Additional file 1 ). However, as we discussed elsewhere [ 13 , 40 ], such averaging or selection of one effect size per study dramatically reduces our ability to investigate environmental drivers of variation among effect sizes [ 13 ]. Therefore, we strongly support the use of the multilevel model. Nevertheless, this proposed multilevel model, formulated as Eq.  3 does not usually deal with the issue of non-independence completely, which we elaborate on in the next section.
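
A sketch of fitting Eq. 3 with metafor::rma.mv() is given below. It assumes a data frame dat with one row per effect size and columns yi, vi, and study_id; these names are our assumptions, not the paper's, and the authors' online tutorial should be consulted for their worked examples.

```r
# Sketch: multilevel (three-level) model of Eq. 3 (assumes columns yi, vi, study_id).
library(metafor)

dat$es_id <- seq_len(nrow(dat))   # unique identifier for each effect size

ml <- rma.mv(yi, vi,
             random = ~ 1 | study_id / es_id,  # between-study and within-study random effects
             data   = dat, method = "REML")

summary(ml)   # sigma^2.1 corresponds to tau^2 and sigma^2.2 to sigma^2 in Eq. 3
```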

Non-independence among effect sizes and among sampling errors

When you have multiple effect sizes from a study, there are two broad types and three cases of non-independence (cf. [ 11 , 12 ]): (1) effect sizes are calculated from different cohorts of individuals (or groups of plots) within a study (Fig.  2 A, referred to as ‘shared study identity’), and (2) effects sizes are calculated from the same cohort of individuals (or group of plots; Fig.  2 B, referred to as ‘shared measurements’) or partially from the same individuals and plots, more concretely, sharing individuals and plots from the control group (Fig.  2 C, referred to as ‘shared control group’). The first type of non-independence induces dependence among effect sizes, but not among sampling variances, and the second type leads to non-independence among sampling variances. Many datasets, if not almost all, will have a combination of these three cases (or even are more complex, see the section " Complex non-independence "). Failing to deal with these non-independences will inflate Type 1 error (note that the overall estimate, b 0 is unlikely to be biased, but standard error of b 0 , se( b 0 ), will be underestimated; note that this is also true for all other regression coefficients, e.g., b 1 ; see Table 1 ). The multilevel model (as in Eq.  3 ) only takes care of cases of non-independence that are due to the shared study identity but neither shared measurements nor shared control group.

Figure 2. Visualisation of the three types of non-independence among effect sizes: A due to shared study identities (effect sizes from the same study), B due to shared measurements (effect sizes come from the same group of individuals/plots but are based on different types of measurements), and C due to shared control (effect sizes are calculated using the same control group and multiple treatment groups; see the text for more details)

There are two practical ways to deal with non-independence among sampling variances. The first method is that we explicitly model such dependence using a variance–covariance (VCV) matrix (used in 6 out of 73 studies: 8.2%; Additional file 1 ). Imagine a simple scenario with a dataset of three effect sizes from two studies where two effects sizes from the first study are calculated (partially) using the same cohort of individuals (Fig.  2 B); in such a case, the sampling variance effect, \({m}_{i}\) , as in Eq.  3 , should be written as:

\[ \mathbf{m}\sim N\left(\mathbf{0},\mathbf{M}\right), \qquad \mathbf{M}=\left(\begin{array}{ccc} {v}_{1\left[1\right]} & \rho \sqrt{{v}_{1\left[1\right]}{v}_{1\left[2\right]}} & 0 \\ \rho \sqrt{{v}_{1\left[2\right]}{v}_{1\left[1\right]}} & {v}_{1\left[2\right]} & 0 \\ 0 & 0 & {v}_{2\left[3\right]} \end{array}\right) \qquad (4) \]

where M is the VCV matrix showing the sampling variances, \({v}_{1\left[1\right]}\) (study 1 and effect size 1), \({v}_{1\left[2\right]}\) (study 1 and effect size 2), and \({v}_{2\left[3\right]}\) (study 2 and effect size 3) in its diagonal, and the sampling covariance, \(\rho \sqrt{{v}_{1\left[1\right]}{v}_{1\left[2\right]}}= \rho \sqrt{{v}_{1\left[2\right]}{v}_{1\left[1\right]}}\) , in its off-diagonal elements, where \(\rho \) is a correlation between two sampling variances due to shared samples (individuals/plots). Once this VCV matrix is incorporated into the multilevel model (Eq.  3 ), all the types of non-independence, as in Fig.  2 , are taken care of. Table 3 shows formulas for the sampling variance and covariance of the four common effect sizes (SMD, lnRR, proportion and Zr ). For comparative effect measures (Table 2 ), exact covariances can be calculated under the case of ‘shared control group’ (see [ 54 , 55 ]). But this is not feasible for most circumstances because we usually do not know what \(\rho \) should be. Some have suggested fixing this value at 0.5 (e.g., [ 11 ]) or 0.8 (e.g., [ 56 ]); the latter is a more conservative assumption. Or one can run both and use one for the main analysis and the other for sensitivity analysis (for more, see the ‘Conducting sensitivity analysis and critical appraisal’ section).
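
A sketch of constructing such a matrix and refitting the model is shown below. It uses metafor::vcalc(), which is available in recent metafor versions (clubSandwich::impute_covariance_matrix() is an alternative), assumes \(\rho \) = 0.5, and reuses the hypothetical column names from the earlier sketches.

```r
# Sketch: building the sampling variance-covariance matrix (Eq. 4) with an assumed
# rho = 0.5, then refitting the multilevel model with that matrix in place of vi.
library(metafor)

VCV <- vcalc(vi, cluster = study_id, obs = es_id, rho = 0.5, data = dat)

ml_vcv <- rma.mv(yi, VCV,
                 random = ~ 1 | study_id / es_id,
                 data   = dat, method = "REML")
```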

The second method overcomes this very issue of unknown \(\rho \) by approximating average dependence among sampling variances (and effect sizes) from the data and incorporating such dependence to estimate standard errors (only used in 1 out of 73 studies; Additional file 1 ). This method is known as ‘robust variance estimation’, RVE, and the original estimator was proposed by Hedges and colleagues in 2010 [ 57 ]. Meta-analysis using RVE is relatively new, and this method has been applied to multilevel meta-analytic models only recently [ 58 ]. Note that the random-effects model (Eq.  2 ) and RVE could correctly model both types of non-independence. However, we do not recommend the use of RVE with Eq.  2 because, as we will later show, estimating \({\sigma }^{2}\) as well as \({\tau }^{2}\) will constitute an important part of understanding and gaining more insights from one’s data. We do not yet have a definite recommendation on which method to use to account for non-independence among sampling errors (using the VCV matrix or RVE). This is because no simulation work has been done so far in the context of multilevel meta-analysis [ 13 , 58 ]. For now, one could use both VCV matrices and RVE in the same model [ 58 ] (see also [ 21 ]).
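
RVE can be layered on top of the multilevel fit; the sketch below clusters by study with metafor::robust() and requests the clubSandwich small-sample adjustment (assuming a metafor version in which that option is available).

```r
# Sketch: robust variance estimation (RVE) on the multilevel model, clustered by study.
library(metafor)

ml_rve <- robust(ml_vcv, cluster = dat$study_id, clubSandwich = TRUE)
summary(ml_rve)   # cluster-robust standard errors, tests, and confidence intervals
```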

Quantifying and explaining heterogeneity

Measuring consistencies with heterogeneity.

As mentioned earlier, quantifying heterogeneity among effect sizes is an essential component of any meta-analysis. Yet, our survey showed only 28 out of 73 environmental meta-analyses (38.4%; Additional file 1 ) report at least one index of heterogeneity (e.g., \({\tau }^{2}\) , Q , and I 2 ). Conventionally, the presence of heterogeneity is tested by Cochran’s Q test. However, Q (often noted as Q T or Q total ), and its associated p value, are not particularly informative: the test does not tell us about the extent of heterogeneity (e.g., [ 10 ]), only whether heterogeneity is zero or not (when p  < 0.05). Therefore, for environmental scientists, we recommend two common ways of quantifying heterogeneity from a meta-analytic model: an absolute heterogeneity measure (i.e., the variance components, \({\tau }^{2}\) and \({\sigma }^{2}\) ) and a relative heterogeneity measure (i.e., I 2 ; see also the " Notes on visualisation and interpretation " section for another way of quantifying and visualising heterogeneity at the same time, using prediction intervals; see also [ 59 ]). We have already covered the absolute measure (Eqs.  2 & 3 ), so here we explain I 2 , which ranges from 0 to 1 (for some caveats for I 2 , see [ 60 , 61 ]). The heterogeneity measure, I 2 , for the random-effects model (Eq.  2 ) can be written as:

\[ {I}^{2}=\frac{{\tau }^{2}}{{\tau }^{2}+\overline{v}} \qquad (5) \]

where \(\overline{v}\) is referred to as the typical sampling variance (originally this is called ‘within-study’ variance, as in Eq.  2 , and note that in this formulation, the within-study effect and the effect of sampling error are confounded; see [ 62 , 63 ]; see also [ 64 ]) and the other notations are as above. As you can see from Eq.  5 , we can interpret I 2 as relative variation due to differences between studies (between-study variance) or relative variation not due to sampling variance.

By seeing I 2 as a type of intraclass correlation (also known as repeatability [ 65 ]), we can generalize I 2 to multilevel models. In the case of Eq.  3 ([ 40 , 66 ]; see also [ 52 ]), we have:

\[ {I}_{total}^{2}=\frac{{\tau }^{2}+{\sigma }^{2}}{{\tau }^{2}+{\sigma }^{2}+\overline{v}} \qquad (7) \]

Because we can have two more I 2 , Eq.  7 is written as \({I}_{total}^{2}\) ; these other two are \({I}_{study}^{2}\) and \({I}_{effect}^{2}\) , respectively:

\[ {I}_{study}^{2}=\frac{{\tau }^{2}}{{\tau }^{2}+{\sigma }^{2}+\overline{v}} \qquad (8) \]

\[ {I}_{effect}^{2}=\frac{{\sigma }^{2}}{{\tau }^{2}+{\sigma }^{2}+\overline{v}} \qquad (9) \]

\({I}_{total}^{2}\) represents relative variance due to differences both between and within studies (between- and within-study variance) or relative variation not due to sampling variance, while \({I}_{study}^{2}\) is relative variation due to differences between studies, and \({I}_{effect}^{2}\) is relative variation due to differences within studies (Fig.  3 A). Once heterogeneity is quantified (note almost all data will have non-zero heterogeneity and an earlier meta-meta-analysis suggests in ecology, we have on average, I 2 close to 90% [ 66 ]), it is time to fit a meta-regression model to explain the heterogeneity. Notably, the magnitude of \({I}_{study}^{2}\) (and \({\tau }^{2}\) ) and \({I}_{effect}^{2}\) (and \({\sigma }^{2}\) ) can already inform you which predictor variable (usually referred to as ‘moderator’) is likely to be important, which we explain in the next section.
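
The partitioned I 2 values can be computed directly from a fitted multilevel model using the formulas above. The sketch below uses the Higgins-Thompson ‘typical’ sampling variance and the model object and column names from the earlier sketches; the orchaRd package also provides an i2_ml() helper for the same purpose.

```r
# Sketch: total, between-study, and within-study I^2 from the multilevel model 'ml'.
wi    <- 1 / dat$vi
k     <- length(wi)
v_bar <- (k - 1) * sum(wi) / (sum(wi)^2 - sum(wi^2))   # 'typical' sampling variance

tau2   <- ml$sigma2[1]   # between-study variance
sigma2 <- ml$sigma2[2]   # within-study (effect-size level) variance

c(I2_total  = (tau2 + sigma2) / (tau2 + sigma2 + v_bar),
  I2_study  =  tau2           / (tau2 + sigma2 + v_bar),
  I2_effect =  sigma2         / (tau2 + sigma2 + v_bar))
```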

Figure 3. Visualisation of variation (heterogeneity) partitioned into different variance components: A quantifying different types of I 2 from a multilevel model (3-level; see Fig.  1 C) and B variance explained, R 2 , by moderators. Note that different levels of variances would be explained, depending on which level a moderator belongs to (study level and effect-size level)

Explaining variance with meta-regression

We can extend the multilevel model (Eq.  3 ) to a meta-regression model with one moderator (also known as predictor, independent, explanatory variable, or fixed factor), as below:

\[ {z}_{i}={\beta }_{0}+{\beta }_{1}{x}_{1j\left[i\right]}+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i} \qquad (10) \]

where \({\beta }_{1}\) is a slope of the moderator ( x 1 ), \({x}_{1j\left[i\right]}\) denotes the value of x 1 , corresponding to the j th study (and the i th effect sizes). Equation ( 10 ) (meta-regression) is comparable to the simplest regression with the intercept ( \({\beta }_{0}\) ) and slope ( \({\beta }_{1}\) ). Notably, \({x}_{1j\left[i\right]}\) differs between studies and, therefore, it will mainly explain the variance component, \({\tau }^{2}\) (which relates to \({I}_{study}^{2}\) ). On the other hand, if noted like \({x}_{1i}\) , this moderator would vary within studies or at the level of effect sizes, therefore, explaining \({\sigma }^{2}\) (relating to \({I}_{effect}^{2}\) ). Therefore, when \({\tau }^{2}\) ( \({I}_{study}^{2}\) ), or \({\sigma }^{2}\) ( \({I}_{effect}^{2}\) ), is close to zero, there will be little point fitting a moderator(s) at the level of studies, or effect sizes, respectively.

As in multiple regression, we can have multiple (multi-moderator) meta-regression, which can be written as:

\[ {z}_{i}={\beta }_{0}+\sum_{h=1}^{q}{\beta }_{h}{x}_{h\left[i\right]}+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i} \qquad (11) \]

where \(\sum_{h=1}^{q}{\beta }_{h}{x}_{h\left[i\right]}\) denotes the sum of all the moderator effects, with q being the number of slopes (starting with h  = 1). We note that q is not necessarily the number of moderators. This is because when we have a categorical moderator, which is common, with more than two levels (e.g., method A, B & C), the fixed effect part of the formula is \({\beta }_{0}+{\beta }_{1}{x}_{1}+{\beta }_{2}{x}_{2}\) , where x 1 and x 2 are ‘dummy’ variables, which code whether the i th effect size belongs to, for example, method B or C, with \({\beta }_{1}\) and \({\beta }_{2}\) being contrasts between A and B and between A and C, respectively (for more explanations of dummy variables, see our tutorial page [ https://itchyshin.github.io/Meta-analysis_tutorial/ ]; also see [ 67 , 68 ]). Traditionally, researchers conduct separate meta-analyses per different groups (known as ‘sub-group analysis’), but we prefer a meta-regression approach with a categorical variable, which is statistically more powerful [ 40 ]. Also, importantly, what can be used as a moderator(s) is very flexible, including, for example, individual/plot characteristics (e.g., age, location), environmental factors (e.g., temperature), methodological differences between studies (e.g., randomization), and bibliometric information (e.g., publication year; see more in the section ‘Checking for publication bias and robustness’). Note that moderators should be decided and listed a priori in the meta-analysis plan (i.e., a review protocol or pre-registration).

As with a meta-analysis, the Q -test ( Q m or Q moderator ) is often used to test the significance of the moderator(s). To complement this test, we can also quantify the variance explained by the moderator(s) using R 2 . Using the notation of Eq. ( 11 ), we can define R 2 as:

$${R}_{marginal}^{2}=\frac{{f}^{2}}{{f}^{2}+{\tau }^{2}+{\sigma }^{2}}$$
where R 2 is known as marginal R 2 (sensu [ 69 , 70 ]; cf. [ 71 ]), \({f}^{2}\) is the variance due to the moderator(s), and \({(f}^{2}+{\tau }^{2}+{\sigma }^{2})\) here equals \(({\tau }^{2}+{\sigma }^{2})\) in Eq.  7 , because \({f}^{2}\) ‘absorbs’ variance from \({\tau }^{2}\) and/or \({\sigma }^{2}\) . Figure  3 B visualises these similarities and differences, where we denote the part of \({f}^{2}\) originating from \({\tau }^{2}\) as \({f}_{study}^{2}\) and the part originating from \({\sigma }^{2}\) as \({f}_{effect}^{2}\) . In a multiple meta-regression model, we often want to find a model with the ‘best’ or an adequate set of predictors (i.e., moderators). R 2 can potentially help with such a model selection process. Yet, methods based on information criteria (such as the Akaike information criterion, AIC) may be preferable. Although model selection based on information criteria is beyond the scope of this paper, we refer the reader to relevant articles (e.g., [ 72 , 73 ]), and we show an example of this procedure in our online tutorial ( https://itchyshin.github.io/Meta-analysis_tutorial/ ).
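
One simple way to obtain a marginal R 2 of this form, sketched here under the assumption that the hypothetical model 'mr1' from the earlier sketch is available, is to take the variance of the fitted values as f 2 and the summed variance components as the remaining heterogeneity; this is only one of several possible implementations.

```r
# Sketch of marginal R2 for the hypothetical meta-regression 'mr1' above;
# f2 is approximated by the variance of the fitted (moderator-based) values.
f2 <- var(fitted(mr1))                  # variance explained by the fixed (moderator) part
R2_marginal <- f2 / (f2 + sum(mr1$sigma2))
R2_marginal
```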

Notes on visualisation and interpretation

Visualization and interpretation of results is an essential part of a meta-analysis [ 74 , 75 ]. Traditionally, a forest plot is used to display the values and 95% confidence intervals (CIs) for each effect size, together with the overall effect and its 95% CI (the diamond symbol is often used, as shown in Fig.  4 A). More recently, adding a 95% prediction interval (PI) to the overall estimate has been strongly recommended, because the 95% PI shows the predicted range of values in which an effect size from a new study would be expected to fall, assuming there is no sampling error [ 76 ]. Here, we think that examining the formulas for the 95% CI and PI of the overall mean (from Eq.  3 ) is illuminating:

$$95\%\,CI={\beta }_{0}\pm {t}_{df\left[\alpha =0.05\right]}\,SE\left({\beta }_{0}\right) \quad (14)$$

$$95\%\,PI={\beta }_{0}\pm {t}_{df\left[\alpha =0.05\right]}\sqrt{SE{\left({\beta }_{0}\right)}^{2}+{\tau }^{2}+{\sigma }^{2}} \quad (15)$$
where \({t}_{df\left[\alpha =0.05\right]}\) denotes the t value with df degrees of freedom at the 97.5th percentile (i.e., \(\alpha =0.05\) ), and other notations are as above. In meta-analysis, it has been conventional to use the z value of 1.96 instead of \({t}_{df\left[\alpha =0.05\right]}\) , but simulation studies have shown that using the t value rather than the z value reduces Type I errors under many scenarios and is therefore recommended (e.g., [ 13 , 77 ]). It is also interesting to note that, by plotting 95% PIs, we can visualize heterogeneity, as Eq.  15 includes \({\tau }^{2}\) and \({\sigma }^{2}\) .
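
In practice, both intervals can be obtained from a fitted multilevel model; the sketch below refits the hypothetical intercept-only model with t-based tests (an assumption about how one might set this up, not the authors' prescribed code) and extracts the CI and PI.

```r
# Sketch: 95% CI and 95% PI for the overall mean, using t-based inference
# (test = "t"); 'dat', 'study_id' and 'es_id' are hypothetical as before.
library(metafor)

mod_t <- rma.mv(yi, V = vi,
                random = ~ 1 | study_id / es_id,
                data = dat, test = "t")

predict(mod_t)   # ci.lb / ci.ub give the 95% CI; pi.lb / pi.ub give the 95% PI
```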

Figure 4. Different types of plots useful for a meta-analysis using data from Midolo et al. [ 133 ]: A a typical forest plot with the overall mean shown as a diamond at the bottom (20 effect sizes from 20 studies are used), B a caterpillar plot (100 effect sizes from 24 studies are used), C an orchard plot of categorical moderator with seven levels (all effect sizes are used), and D a bubble plot of a continuous moderator. Note that the first two only show confidence intervals, while the latter two also show prediction intervals (see the text for more details)

A forest plot can quickly become illegible as the number of studies (effect sizes) grows, so other ways of visualizing the distribution of effect sizes have been suggested. Some have suggested presenting a ‘caterpillar’ plot, a compact version of the forest plot, instead (Fig.  4 B; e.g., [ 78 ]). Here we recommend an ‘orchard’ plot, as it can present results across different groups (or the result of a meta-regression with a categorical variable), as shown in Fig.  4 C [ 78 ]. For visualization of a continuous variable, we suggest what is called a ‘bubble’ plot, shown in Fig.  4 D. Visualization not only helps us interpret meta-analytic results, but can also help to identify patterns we may not see in statistical output, such as influential data points and outliers that could threaten the robustness of our results.
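
For orientation, the sketch below shows how the first, second and fourth plot types might be produced with metafor's built-in plotting functions, continuing the hypothetical models above; orchard plots are typically drawn with a dedicated helper package (e.g., orchaRd) rather than base metafor, so we do not reproduce one here.

```r
# Sketch of basic visualisations with metafor, using the hypothetical models
# 'mod' (intercept-only) and 'mr1' (one continuous moderator) from above.
library(metafor)

forest(mod)                     # classic forest plot with the overall estimate
forest(mod, order = "obs")      # ordering by observed values gives a caterpillar-style display
regplot(mr1, mod = "x1")        # bubble plot for the continuous moderator x1
```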

Checking for publication bias and robustness

Detecting and correcting for publication bias

Checking for and adjusting for any publication bias is necessary to ensure the validity of meta-analytic inferences [ 79 ]. However, our survey showed almost half of the environmental meta-analyses (46.6%; 34 out of 73 studies; Additional file 1 ) neither tested for nor corrected for publication bias (cf. [ 14 , 15 , 16 ]). The most popular methods used were: (1) graphical tests using funnel plots (26 studies; 35.6%), (2) regression-based tests such as Egger regression (18 studies; 24.7%), (3) fail-safe number tests (12 studies; 16.4%), and (4) trim-and-fill tests (10 studies; 13.7%). We recently showed that, with the exception of funnel plots, these methods are unsuitable for datasets with non-independent effect sizes [ 80 ] (for an example of funnel plots, see Fig.  5 A). This is because these methods cannot deal with non-independence, just as the fixed-effect and random-effects models cannot. Here, we only introduce a two-step method for multilevel models that can both detect and correct for publication bias [ 80 ] (originally proposed by [ 81 , 82 ]), more specifically for the “small-study effect”, where an effect size from a small study can be much larger in magnitude than the ‘true’ effect [ 83 , 84 ]. This method is a simple extension of Egger’s regression [ 85 ], which can easily be implemented by adapting Eq.  10 :

$$z_{i}={\beta }_{0}+{\beta }_{1}\sqrt{\frac{1}{{\widetilde{n}}_{i}}}+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i} \quad (16)$$

$$z_{i}={\beta }_{0}+{\beta }_{1}\left(\frac{1}{{\widetilde{n}}_{i}}\right)+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i} \quad (17)$$
where \({\widetilde{n}}_{i}\) is known as the effective sample size; for Zr and proportions it is just n i , and for SMD and lnRR it is \({n}_{iC}{n}_{iT}/\left({n}_{iC}+{n}_{iT}\right)\) , as in Table 2 . When \({\beta }_{1}\) is significant, we conclude that a small-study effect exists (in terms of a funnel plot, this is equivalent to significant funnel asymmetry). We then fit Eq.  17 and look at the intercept \({\beta }_{0}\) , which will be a bias-corrected overall estimate [note that \({\beta }_{0}\) in Eq. ( 16 ) provides less accurate estimates when non-zero overall effects exist [ 81 , 82 ]; Fig.  5 B]. An intuitive explanation of why \({\beta }_{0}\) (Eq.  17 ) is the ‘bias-corrected’ estimate is that the intercept represents \(1/\widetilde{{n}_{i}}=0\) (or \(\widetilde{{n}_{i}}=\infty \) ); in other words, \({\beta }_{0}\) is the estimate of the overall effect when we have a very large (infinite) sample size. Of note, fully appropriate bias correction requires a selection-model-based approach, although such an approach is not yet available for multilevel meta-analytic models [ 80 ].
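
A minimal sketch of this two-step procedure, assuming a hypothetical column 'n_eff' in 'dat' holding the effective sample size defined above, is:

```r
# Sketch of the two-step multilevel Egger-type test (Eqs. 16 and 17);
# n_eff is a hypothetical column with the effective sample size.
library(metafor)

dat$sqrt_inv_n <- sqrt(1 / dat$n_eff)
dat$inv_n      <- 1 / dat$n_eff

# Step 1: test for the small-study effect (funnel asymmetry)
egger_step1 <- rma.mv(yi, V = vi, mods = ~ sqrt_inv_n,
                      random = ~ 1 | study_id / es_id, data = dat)

# Step 2: if beta1 is significant, refit with 1/n_eff; the intercept is then
# the bias-corrected overall estimate (i.e., the effect at infinite sample size)
egger_step2 <- rma.mv(yi, V = vi, mods = ~ inv_n,
                      random = ~ 1 | study_id / es_id, data = dat)
```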

Figure 5. Different types of plots for publication bias tests: A a funnel plot using model residuals, showing a funnel (white) that shows the region of statistical non-significance (30 effect sizes from 30 studies are used; note that we used the inverse of standard errors for the y -axis, but for some effect sizes, sample size or ‘effective’ sample size may be more appropriate), B a bubble plot visualising a multilevel meta-regression that tests for the small study effect (note that the slope was non-significant: b  = 0.120, 95% CI = [− 0.095, 0.334]; all effect sizes are used), and C a bubble plot visualising a multilevel meta-regression that tests for the decline effect (the slope was non-significant: b  = 0.003, 95% CI = [− 0.002, 0.008])

Conveniently, this framework can be extended to test for another type of publication bias, known as time-lag bias or the decline effect, where effect sizes tend to get closer to zero over time because larger or statistically significant effects are published more quickly than smaller or non-significant effects [ 86 , 87 ]. Again, a decline effect can be statistically tested by adding year to Eq. ( 3 ):

$$z_{i}={\beta }_{0}+{\beta }_{1}\,c\left(yea{r}_{j\left[i\right]}\right)+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i} \quad (18)$$
where \(c\left(yea{r}_{j\left[i\right]}\right)\) is the mean-centred publication year of a particular study (study j and effect size i ); this centring makes the intercept \({\beta }_{0}\) meaningful, representing the overall effect estimate at the mean value of publication years (see [ 68 ]). When the slope is significantly different from 0, we deem that we have a decline effect (or time-lag bias; Fig.  5 C).
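
Under the same assumptions as before, and with a hypothetical column 'pub_year' holding each study's publication year, the decline-effect test might look like this:

```r
# Sketch of the decline-effect (time-lag bias) test; pub_year is hypothetical.
library(metafor)

dat$year_c <- dat$pub_year - mean(dat$pub_year)   # mean-centring keeps the intercept interpretable

decline <- rma.mv(yi, V = vi, mods = ~ year_c,
                  random = ~ 1 | study_id / es_id, data = dat)

summary(decline)   # a slope different from zero indicates a decline effect
```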

However, there may be confounding moderators, which need to be modelled together. Indeed, Egger’s regression (Eqs.  16 and 17 ) reliably detects funnel asymmetry only when there is little heterogeneity; this means that we need to model \(\sqrt{1/{\widetilde{n}}_{i}}\) together with other moderators that account for heterogeneity. Given this, we should probably use a multiple meta-regression model, as below:

$$z_{i}={\beta }_{0}+{\beta }_{1}\sqrt{\frac{1}{{\widetilde{n}}_{i}}}+{\beta }_{2}\,c\left(yea{r}_{j\left[i\right]}\right)+\sum_{h=3}^{q}{\beta }_{h}{x}_{h\left[i\right]}+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i} \quad (19)$$
where \(\sum_{h=3}^{q}{\beta }_{h}{x}_{h\left[i\right]}\) is the sum of the other moderator effects, apart from the small-study effect and the decline effect, and other notations are as above (for more details see [ 80 ]). We need to consider carefully which moderators should go into Eq.  19 (e.g., fitting all moderators or using an AIC-based model selection method; see [ 72 , 73 ]). Of relevance, when running complex models, some model parameters cannot be estimated well, or are not ‘identifiable’ [ 88 ]. This is especially so for the variance components (the random-effect part) rather than the regression coefficients (the fixed-effect part). Therefore, it is advisable to check whether the model parameters are all identifiable, which can be done using the profile function in metafor (for an example, see our tutorial webpage [ https://itchyshin.github.io/Meta-analysis_tutorial/ ]).
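
Putting the pieces together, a hedged sketch of such an 'all-in-one' publication-bias model, re-using the hypothetical columns created in the previous sketches and two assumed moderators x1 and x2, is shown below together with the identifiability check mentioned above.

```r
# Sketch of the multi-moderator publication-bias model (Eq. 19); sqrt_inv_n
# and year_c were created in earlier sketches, and x1, x2 are hypothetical
# moderators assumed to account for heterogeneity.
library(metafor)

pb_all <- rma.mv(yi, V = vi,
                 mods = ~ sqrt_inv_n + year_c + x1 + x2,
                 random = ~ 1 | study_id / es_id, data = dat)

# Profile likelihoods for the variance components: clear peaks at the
# estimates suggest the parameters are identifiable.
profile(pb_all)
```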

Conducting sensitivity analysis and critical appraisal

Sensitivity analysis explores the robustness of meta-analytic results by running a set of analyses that differ from the original analysis and comparing the results (note that some consider publication bias tests part of sensitivity analysis; [ 11 ]). For example, we might be interested in assessing how robust the results are to the presence of influential studies, to the choice of method for addressing non-independence, or to the weighting of effect sizes. Unfortunately, in our survey, only 37% of environmental meta-analyses (27 out of 73) conducted a sensitivity analysis (Additional file 1 ). There are two general and interrelated ways to conduct sensitivity analyses [ 73 , 89 , 90 ]. The first is to take out influential studies (e.g., outliers) and re-run the meta-analytic and meta-regression models. We can also systematically take each effect size out and run a series of meta-analytic models to see whether any of the resulting overall effect estimates differ from the others; this method is known as ‘leave-one-out’ and is considered less subjective and thus recommended.
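
metafor's built-in leave1out() helper covers single-level models fitted with rma(); for the multilevel models used here, a simple loop over effect sizes (a sketch under the same hypothetical data assumptions) achieves the same idea.

```r
# Sketch of a leave-one-out analysis for the multilevel model; each iteration
# drops one effect size and re-estimates the overall mean.
library(metafor)

loo <- sapply(seq_len(nrow(dat)), function(i) {
  fit <- rma.mv(yi, V = vi,
                random = ~ 1 | study_id / es_id,
                data = dat[-i, ])
  coef(fit)[["intrcpt"]]          # overall estimate without the i-th effect size
})

range(loo)   # how far does the overall mean move across the leave-one-out fits?
```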

The second way of approaching sensitivity analysis is known as subset analysis, where a certain group of effect sizes (studies) is excluded and the models are re-run without that group. For example, one may want to run an analysis without studies that did not randomize samples. Yet, as mentioned earlier, we recommend using meta-regression (Eq.  13 ) with a categorical variable coding randomization status (‘randomized’ or ‘not randomized’) to statistically test for the influence of such moderators. It is important to note that such tests for risk of bias (or study quality) can be considered a way of quantitatively evaluating the importance of study features that were noted at the stage of critical appraisal, which is an essential part of any systematic review (see [ 11 , 91 ]). In other words, we can use meta-regression or subset analysis to conduct critical appraisal quantitatively, using (study-level) moderators that code, for example, blinding, randomization, and selective reporting. Despite the importance of critical appraisal [ 91 ], only 4 of 73 environmental meta-analyses (5.6%) in our survey assessed the risk of bias in each study included in the meta-analysis (i.e., evaluating a primary study in terms of the internal validity of its design and reporting; Additional file 1 ). We emphasize that critically appraising each paper, or checking it for risk of bias, is an extremely important topic, and critical appraisal is not restricted to quantitative synthesis. Therefore, we do not cover it further in this paper (for more, see [ 92 , 93 ]).

Notes on transparent reporting and open archiving

For environmental systematic reviews and maps, there are reporting guidelines called the RepOrting standards for Systematic Evidence Syntheses in environmental research (ROSES) [ 94 ] and a synthesis assessment checklist, the Collaboration for Environmental Evidence Synthesis Appraisal Tool (CEESAT; [ 95 ]). However, these guidelines are somewhat limited in terms of reporting quantitative synthesis because they cover only a few core items. These two guidelines are complemented by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Ecology and Evolutionary Biology (PRISMA-EcoEvo; [ 96 ]; cf. [ 97 , 98 ]), which provides an extended set of reporting items covering what we have described above. Items 20–24 from PRISMA-EcoEvo are most relevant: these items outline what should be reported in the Methods section: (i) sample sizes and study characteristics, (ii) meta-analysis, (iii) heterogeneity, (iv) meta-regression and (v) outcomes of publication bias and sensitivity analysis (see Table 4 ). Our survey, as well as earlier surveys, suggests there is considerable room for improvement in current practice ([ 14 , 15 , 16 ]). Incidentally, the orchard plot is well aligned with Item 20, as this plot type shows both the number of effect sizes and the number of studies for different groups (Fig.  4 C). Further, our survey of environmental meta-analyses highlighted poor standards of data openness (with 24 studies sharing data: 32.9%) and code sharing (7 studies: 29.2%; Additional file 1 ). Environmental scientists must archive their data as well as their analysis code in accordance with the FAIR principles (Findable, Accessible, Interoperable, and Reusable [ 99 ]), using dedicated repositories such as Dryad, FigShare, Open Science Framework (OSF), Zenodo or others (cf. [ 100 , 101 ]), preferably not on publishers’ webpages (as paywalls may block access). However, archiving itself is not enough; data require metadata (detailed descriptions), and the code also needs to be FAIR [ 102 , 103 ].

Other relevant and advanced issues

Scale dependence

The issue of scale dependence is a unique yet widespread problem in environmental sciences (see [ 7 , 104 ]); our literature survey indicated that three quarters of the environmental meta-analyses (56 out of 73 studies) drew inferences that are potentially vulnerable to scale dependence [ 105 ]. For example, in studies that set out to compare group means in biodiversity measures, such as species richness, the comparison can vary as a function of the scale (size) of the sampling unit. When the unit of replication is a plot (not an individual animal or plant), the areal size of a plot (e.g., 100 cm 2 or 1 km 2 ) will affect both the precision and the accuracy of effect size estimates (e.g., lnRR and SMD). In general, a study with larger plots might estimate species richness differences more accurately, but less precisely, than a study with smaller plots and greater replication. Lower replication means that sampling variance estimates are likely to be misestimated, and the study with larger plots will generally receive less weight than the study with smaller plots, due to its higher sampling variance. Inaccurate variance estimates in little-replicated ecological studies are known to cause an accumulating bias in precision-weighted meta-analysis, requiring correction [ 43 ]. To assess the potential for scale dependence, it is recommended that analysts test for possible covariation among plot size, replication, variances, and effect sizes [ 104 ]. If such covariation is detected, analysts should use an effect size measure that is less sensitive to scale dependence (lnRR), and could use plot size as a moderator in meta-regression or, alternatively, consider running an unweighted model ([ 7 ]; note that only 12%, 9 out of 73 studies, accounted for sampling area in some way; Additional file 1 ).
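
As a rough illustration of that screening step, the sketch below assumes hypothetical columns 'plot_area' (size of the sampling unit) and 'n_rep' (replication) in the same data frame; it is one simple way to look for covariation and then fit plot size as a moderator.

```r
# Sketch: screening for scale dependence by eye, then (if covariation is
# found) fitting plot size as a moderator; plot_area and n_rep are hypothetical.
library(metafor)

pairs(dat[, c("plot_area", "n_rep", "vi", "yi")])   # covariation among plot size, replication, variance, effect size

scale_mod <- rma.mv(yi, V = vi, mods = ~ log(plot_area),
                    random = ~ 1 | study_id / es_id, data = dat)
```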

Missing data

In many fields, meta-analytic data almost always include missing values (see [ 106 , 107 , 108 ]). Broadly, we have two types of missing data in meta-analyses [ 109 , 110 ]: (1) missing standard deviations or sample sizes, associated with the means, preventing effect size calculations (Table 2 ), and (2) missing data in moderators. There are several solutions for both types. The best, and the first to try, is contacting the original authors. If this fails, we can potentially ‘impute’ missing data. Single imputation methods using the strong correlation between standard deviations and means (known as the mean–variance relationship) are available, although single imputation can lead to Type I errors [ 106 , 107 ] (see also [ 43 ]) because we do not model the uncertainty of the imputation itself. In contrast, multiple imputation, which creates multiple versions of the imputed dataset, incorporates such uncertainty. Indeed, multiple imputation is a preferred and proven solution for missing data in effect sizes and moderators [ 109 , 110 ]. Yet, correct implementation can be challenging (see [ 110 ]). What we require now is an automated pipeline merging meta-analysis and multiple imputation, which accounts for imputation uncertainty, although this may be challenging for complex meta-analytic models. Fortunately, for lnRR, there is a series of new methods that can perform better than the conventional method and that can deal with missing SDs [ 44 ]; note that these methods do not handle missing moderators. Therefore, where applicable, we recommend these new methods until an easy-to-implement multiple imputation workflow arrives.

Complex non-independence

Above, we have only dealt with models that include study identity as a clustering/grouping (random) factor. However, many datasets are more complex, with potentially more clustering variables in addition to study identity. It is certainly possible that an environmental meta-analysis contains data from multiple species. Such a situation creates an interesting dependence among effect sizes from different species, known as phylogenetic relatedness, where closely related species are more likely to have similar effect sizes than distantly related ones (e.g., mice vs. rats compared with mice vs. sparrows). Our multilevel model framework is flexible and can accommodate phylogenetic relatedness. A phylogenetic multilevel meta-analytic model can be written as [ 40 , 111 , 112 ]:

$$z_{i}={\beta }_{0}+{a}_{k\left[i\right]}+{s}_{k\left[i\right]}+{u}_{j\left[i\right]}+{e}_{i}+{m}_{i}$$
where \({a}_{k\left[i\right]}\) is the phylogenetic (species) effect for the k th species (effect size i ; N effect ( i  = 1, 2,…, N effect ) >  N study ( j  = 1, 2,…, N study ) >  N species ( k  = 1, 2,…, N species )), normally distributed with variance \({\omega }^{2}{\text{A}}\) , where \({\omega }^{2}\) is the phylogenetic variance and A is a correlation matrix coding how closely related each pair of species is; \({s}_{k\left[i\right]}\) is the non-phylogenetic (species) effect for the k th species (effect size i ), normally distributed with variance \({\gamma }^{2}\) (the non-phylogenetic variance); and other notations are as above. It is important to realize that A explicitly models relatedness among species, and we do need to provide this correlation matrix, using a distance relationship usually derived from a molecular-based phylogenetic tree (for more details, see [ 40 , 111 , 112 ]). Some may think that the non-phylogenetic term ( \({s}_{k\left[i\right]}\) ) is unnecessary or redundant because \({s}_{k\left[i\right]}\) and the phylogenetic term ( \({a}_{k\left[i\right]}\) ) both model variance at the species level. However, a simulation study recently demonstrated that omitting the non-phylogenetic term ( \({s}_{k\left[i\right]}\) ) will often inflate the phylogenetic variance \({\omega }^{2}\) , leading to the incorrect conclusion that there is a strong phylogenetic signal (as shown in [ 112 ]). The non-phylogenetic variance ( \({\gamma }^{2}\) ) arises from, for example, ecological similarities among species (herbivores vs. carnivores, or arboreal vs. ground-living) that are not due to phylogeny [ 40 ].
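
A hedged sketch of how such a model can be fitted with metafor is given below; the phylogenetic tree ('tree', class 'phylo' from the ape package) and the 'species' column in 'dat' are hypothetical, and the tree's tip labels are assumed to match the species names.

```r
# Sketch of a phylogenetic multilevel meta-analysis; 'tree' and the 'species'
# column are hypothetical, and tip labels must match the species names.
library(metafor)
library(ape)

A <- vcv(tree, corr = TRUE)          # phylogenetic correlation matrix among species
dat$species_phylo <- dat$species     # duplicated column so both species terms can be fitted

phylo_mod <- rma.mv(yi, V = vi,
                    random = list(~ 1 | study_id / es_id,
                                  ~ 1 | species_phylo,   # a_k: phylogenetic effect (variance omega^2)
                                  ~ 1 | species),        # s_k: non-phylogenetic species effect (gamma^2)
                    R = list(species_phylo = A),         # supplies the correlation matrix A
                    data = dat)
```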

Like phylogenetic relatedness, effect sizes arising from geographically closer locations are likely to be more correlated [ 113 ]. Statistically, spatial correlation can also be modelled in a manner analogous to phylogenetic relatedness (i.e., rather than a phylogenetic correlation matrix, A , we fit a spatial correlation matrix). For example, Maire and colleagues [ 114 ] used a meta-analytic model with spatial autocorrelation to investigate temporal trends of fish communities in the network of rivers in France. A similar argument can be made for temporal correlation, but in many cases temporal correlation can be dealt with, albeit less accurately, as a special case of ‘shared measurements’, as in Fig.  2 . An important idea to take away is that one can model different, if not all, types of non-independence as random factor(s) in a multilevel model.

Advanced techniques

Here we touch upon five advanced meta-analytic techniques with potential utility for environmental sciences, providing relevant references so that interested readers can obtain more information on these advanced topics. The first is the meta-analysis of magnitudes, or absolute values of effect sizes, where researchers are interested in deviations from 0 rather than in the directionality of the effect [ 115 ]. For example, Cohen and colleagues [ 116 ] investigated absolute values of phenological responses, as they were concerned with the magnitude of changes in phenology rather than their direction.

The second method is the meta-analysis of interactions, where the focus is on synthesizing the interaction effect of (usually) a 2 × 2 factorial design (e.g., the effect of two simultaneous environmental stressors [ 54 , 117 , 118 ]; see also [ 119 ]). Recently, Siviter and colleagues [ 120 ] showed that agrochemicals interact synergistically (i.e., non-additively) to increase the mortality of bees; that is, two agrochemicals together caused more mortality than the sum of the mortalities caused by each chemical alone.

Third, network meta-analysis has been used heavily in the medical sciences; it usually compares different treatments in relation to a placebo and ranks these treatments in terms of effectiveness [ 121 ]. The very first ‘environmental’ network meta-analysis, as far as we know, investigated the provision of ecosystem services among different land-cover types [ 122 ].

Fourth, a multivariate meta-analysis is one in which two or more different types of effect sizes are modelled together, with estimation of the pair-wise correlations between them. The benefit of such an approach is known as the ‘borrowing of strength’, where the error of the fixed effects (moderators; e.g., b 0 and b 1 ) can be reduced when the different types of effect sizes are correlated (i.e., se ( b 0 ) and se ( b 1 ) can be smaller [ 123 ]). For example, it is possible for lnRR (differences in means) and lnVR (differences in SDs) to be modelled together (cf. [ 124 ]).

Fifth, as with network meta-analysis, there has been a surge in the use of ‘individual participant data’, or ‘IPD meta-analysis’, in the medical sciences [ 125 , 126 ]. The idea of IPD meta-analysis is simple: rather than using the summary statistics reported in papers (sample means and variances), we directly use the raw data from all studies. We can either model the raw data in one complex multilevel (hierarchical) model (the one-step method) or calculate statistics for each study and use a meta-analysis (the two-step method; note that both methods will usually give the same results). Study-level random effects can be incorporated to allow the response variable of interest to vary among studies, and the overall effects correspond to fixed, population-level estimates. The use of IPD or ‘full-data analyses’ has also surged in ecology, aided by open-science policies that encourage the archival of raw data alongside articles, and by initiatives that synthesise raw data (e.g., PREDICTS [ 127 ], BioTime [ 128 ]). In health disciplines, such meta-analyses are considered the ‘gold standard’ [ 129 ], owing to their potential for resolving issues regarding study-specific designs and confounding variation, and it is unclear whether and how they might resolve issues such as scale dependence in environmental meta-analyses [ 104 , 130 ].

Conclusions

In this article, we have attempted to describe the most practical ways to conduct quantitative synthesis, including meta-analysis, meta-regression, and publication bias tests. In addition, through a survey of 73 recent environmental meta-analyses, we have shown that there is much to be improved in meta-analytic practice and reporting. Such improvements are urgently required, especially given the potential influence that environmental meta-analyses can have on policy and decision-making [ 8 ]. Meta-analysts have often called for better reporting of primary research (e.g. [ 131 , 132 ]), and now is the time to raise the standards of reporting in meta-analyses as well. We hope our contribution will help to catalyse a turning point towards better practice in quantitative synthesis in environmental sciences. We remind the reader that most of what is described here is implemented in the R environment on our tutorial webpage, and researchers can readily use the proposed models and techniques ( https://itchyshin.github.io/Meta-analysis_tutorial/ ). Finally, meta-analytic techniques are always developing and improving. It is certainly possible that, in the future, our proposed models and related methods will become dated, just as the traditional fixed-effect and random-effects models already are. Therefore, we must endeavour to remain open-minded about new ways of conducting quantitative research synthesis in environmental sciences.

Availability of data and materials

All data and material are provided as additional files.

Higgins JP, Thomas JE, Chandler JE, Cumpston ME, Li TE, Page MJ, Welch VA. Cochrane handbook for systematic reviews of interventions. 2nd ed. Chichester: Wiley; 2019.

Cooper HM, Hedges LV, Valentine JC. The handbook of research synthesis and meta-analysis . 3rd ed. New York: Russell Sage Foundation; 2019.

Schmid CH, Stijnen TE, White IE. Handbook of meta-analysis. 1st ed. Boca Raton: CRC; 2021.

Vetter D, Rucker G, Storch I. Meta-analysis: a need for well-defined usage in ecology and conservation biology. Ecosphere. 2013;4(6):1.

Koricheva J, Gurevitch J, Mengersen K, editors. Handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2017.

Gurevitch J, Koricheva J, Nakagawa S, Stewart G. Meta-analysis and the science of research synthesis. Nature. 2018;555(7695):175–82.

Spake R, Doncaster CP. Use of meta-analysis in forest biodiversity research: key challenges and considerations. Forest Ecol Manag. 2017;400:429–37.

Bilotta GS, Milner AM, Boyd I. On the use of systematic reviews to inform environmental policies. Environ Sci Policy. 2014;42:67–77.

Hedges LV, Vevea JL. Fixed- and random-effects models in meta-analysis. Psychol Methods. 1998;3(4):486–504.

Borenstein M, Hedges LV, Higgins JPT, Rothstein H. Introduction to meta-analysis. 2nd ed. Chichester: Wiley; 2021.

Noble DWA, Lagisz M, Odea RE, Nakagawa S. Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses. Mol Ecol. 2017;26(9):2410–25.

Nakagawa S, Noble DWA, Senior AM, Lagisz M. Meta-evaluation of meta-analysis: ten appraisal questions for biologists. Bmc Biol. 2017;15:1.

Nakagawa S, Senior AM, Viechtbauer W, Noble DWA. An assessment of statistical methods for nonindependent data in ecological meta-analyses: comment. Ecology. 2022;103(1): e03490.

Romanelli JP, Meli P, Naves RP, Alves MC, Rodrigues RR. Reliability of evidence-review methods in restoration ecology. Conserv Biol. 2021;35(1):142–54.

Koricheva J, Gurevitch J. Uses and misuses of meta-analysis in plant ecology. J Ecol. 2014;102(4):828–44.

O’Leary BC, Kvist K, Bayliss HR, Derroire G, Healey JR, Hughes K, Kleinschroth F, Sciberras M, Woodcock P, Pullin AS. The reliability of evidence review methodology in environmental science and conservation. Environ Sci Policy. 2016;64:75–82.

Rosenthal R. The “file drawer problem” and tolerance for null results. Psychol Bull. 1979;86(3):638–41.

Nakagawa S, Lagisz M, Jennions MD, Koricheva J, Noble DWA, Parker TH, Sánchez-Tójar A, Yang Y, O’Dea RE. Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods Ecol Evol. 2022;13(1):4–21.

Cheung MWL. A guide to conducting a meta-analysis with non-independent effect sizes. Neuropsychol Rev. 2019;29(4):387–96.

Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.

Yang Y, Macleod M, Pan J, Lagisz M, Nakagawa S. Advanced methods and implementations for the meta-analyses of animal models: current practices and future recommendations. Neurosci Biobehav Rev. 2022. https://doi.org/10.1016/j.neubiorev.2022.105016:105016 .

Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev. 2007;82(4):591–605.

Hedges LV, Gurevitch J, Curtis PS. The meta-analysis of response ratios in experimental ecology. Ecology. 1999;80(4):1150–6.

Friedrich JO, Adhikari NKJ, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: A simulation study. BMC Med Res Methodol. 2008;8:5.

Hedges L, Olkin I. Statistical methods for meta-analysis. New York: Academic Press; 1985.

Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Lawrence Erlbaum; 1988.

Senior AM, Viechtbauer W, Nakagawa S. Revisiting and expanding the meta-analysis of variation: the log coefficient of variation ratio. Res Synth Methods. 2020;11(4):553–67.

Nakagawa S, Poulin R, Mengersen K, Reinhold K, Engqvist L, Lagisz M, Senior AM. Meta-analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol Evol. 2015;6(2):143–52.

Knapp S, van der Heijden MGA. A global meta-analysis of yield stability in organic and conservation agriculture. Nat Commun. 2018;9:3632.

Porturas LD, Anneberg TJ, Cure AE, Wang SP, Althoff DM, Segraves KA. A meta-analysis of whole genome duplication and the effects on flowering traits in plants. Am J Bot. 2019;106(3):469–76.

Janicke T, Morrow EH. Operational sex ratio predicts the opportunity and direction of sexual selection across animals. Ecol Lett. 2018;21(3):384–91.

Chamberlain R, Brunswick N, Siev J, McManus IC. Meta-analytic findings reveal lower means but higher variances in visuospatial ability in dyslexia. Brit J Psychol. 2018;109(4):897–916.

O’Dea RE, Lagisz M, Jennions MD, Nakagawa S. Gender differences in individual variation in academic grades fail to fit expected patterns for STEM. Nat Commun. 2018;9:3777.

Brugger SP, Angelescu I, Abi-Dargham A, Mizrahi R, Shahrezaei V, Howes OD. Heterogeneity of striatal dopamine function in schizophrenia: meta-analysis of variance. Biol Psychiat. 2020;87(3):215–24.

Usui T, Macleod MR, McCann SK, Senior AM, Nakagawa S. Meta-analysis of variation suggests that embracing variability improves both replicability and generalizability in preclinical research. Plos Biol. 2021;19(5): e3001009.

Hoffmann AA, Merila J. Heritable variation and evolution under favourable and unfavourable conditions. Trends Ecol Evol. 1999;14(3):96–101.

Wood CW, Brodie ED 3rd. Environmental effects on the structure of the G-matrix. Evolution. 2015;69(11):2927–40.

Hillebrand H, Donohue I, Harpole WS, Hodapp D, Kucera M, Lewandowska AM, Merder J, Montoya JM, Freund JA. Thresholds for ecological responses to global change do not emerge from empirical data. Nat Ecol Evol. 2020;4(11):1502.

Yang YF, Hillebrand H, Lagisz M, Cleasby I, Nakagawa S. Low statistical power and overestimated anthropogenic impacts, exacerbated by publication bias, dominate field studies in global change biology. Global Change Biol. 2022;28(3):969–89.

Nakagawa S, Santos ESA. Methodological issues and advances in biological meta-analysis. Evol Ecol. 2012;26(5):1253–74.

Bakbergenuly I, Hoaglin DC, Kulinskaya E. Estimation in meta-analyses of response ratios. BMC Med Res Methodol. 2020;20(1):1.

Bakbergenuly I, Hoaglin DC, Kulinskaya E. Estimation in meta-analyses of mean difference and standardized mean difference. Stat Med. 2020;39(2):171–91.

Doncaster CP, Spake R. Correction for bias in meta-analysis of little-replicated studies. Methods Ecol Evol. 2018;9(3):634–44.

Nakagawa S, Noble DW, Lagisz M, Spake R, Viechtbauer W, Senior AM. A robust and readily implementable method for the meta-analysis of response ratios with and without missing standard deviations. Ecol Lett. 2023;26(2):232–44

Hamman EA, Pappalardo P, Bence JR, Peacor SD, Osenberg CW. Bias in meta-analyses using Hedges’ d. Ecosphere. 2018;9(9): e02419.

Bakbergenuly I, Hoaglin DC, Kulinskaya E. On the Q statistic with constant weights for standardized mean difference. Brit J Math Stat Psy. 2022;75(3):444–65.

DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials. 2007;28(2):105–14.

Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuss O, Higgins JPT, Langan D, Salanti G. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res Synth Methods. 2016;7(1):55–79.

Langan D, Higgins JPT, Simmonds M. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Res Synth Methods. 2017;8(2):181–98.

Panityakul T, Bumrungsup C, Knapp G. On estimating residual heterogeneity in random-effects meta-regression: a comparative study. J Stat Theory Appl. 2013;12(3):253–65.

Bishop J, Nakagawa S. Quantifying crop pollinator dependence and its heterogeneity using multi-level meta-analysis. J Appl Ecol. 2021;58(5):1030–42.

Cheung MWL. Modeling dependent effect sizes with three-level meta-analyses: a structural equation modeling approach. Psychol Methods. 2014;19(2):211–29.

Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White JSS. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol. 2009;24(3):127–35.

Lajeunesse MJ. On the meta-analysis of response ratios for studies with correlated and multi-group designs. Ecology. 2011;92(11):2049–55.

Gleser LJ, Olkin I. Stochastically dependent effect sizes. In: Cooper H, Hedges LV, Valentine JC, editors. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation; 2009.

Tipton E, Pustejovsky JE. Small-sample adjustments for tests of moderators and model fit using robust variance estimation in meta-regression. J Educ Behav Stat. 2015;40(6):604–34.

Hedges LV, Tipton E, Johnson MC. Robust variance estimation in meta-regression with dependent effect size estimates (vol 1, pg 39, 2010). Res Synth Methods. 2010;1(2):164–5.

Pustejovsky JE, Tipton E. Meta-analysis with robust variance estimation: expanding the range of working models. Prev Sci. 2021. https://doi.org/10.1007/s11121-021-01246-3 .

Cairns M, Prendergast LA. On ratio measures of heterogeneity for meta-analyses. Res Synth Methods. 2022;13(1):28–47.

Borenstein M, Higgins JPT, Hedges LV, Rothstein HR. Basics of meta-analysis: I2 is not an absolute measure of heterogeneity. Res Synth Methods. 2017;8(1):5–18.

Hoaglin DC. Practical challenges of I-2 as a measure of heterogeneity. Res Synth Methods. 2017;8(3):254–254.

Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539–58.

Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. Brit Med J. 2003;327(7414):557–60.

Xiong CJ, Miller JP, Morris JC. Measuring study-specific heterogeneity in meta-analysis: application to an antecedent biomarker study of Alzheimer’s disease. Stat Biopharm Res. 2010;2(3):300–9.

Nakagawa S, Schielzeth H. Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biol Rev. 2010;85(4):935–56.

Senior AM, Grueber CE, Kamiya T, Lagisz M, O’Dwyer K, Santos ESA, Nakagawa S. Heterogeneity in ecological and evolutionary meta-analyses: its magnitude and implications. Ecology. 2016;97(12):3293–9.

Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press; 2007.

Schielzeth H. Simple means to improve the interpretability of regression coefficients. Methods Ecol Evol. 2010;1(2):103–13.

Nakagawa S, Schielzeth H. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol Evol. 2013;4(2):133–42.

Nakagawa S, Johnson PCD, Schielzeth H. The coefficient of determination R-2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J R Soc Interface. 2017;14(134):20170213.

Aloe AM, Becker BJ, Pigott TD. An alternative to R-2 for assessing linear models of effect size. Res Synth Methods. 2010;1(3–4):272–83.

Cinar O, Umbanhowar J, Hoeksema JD, Viechtbauer W. Using information-theoretic approaches for model selection in meta-analysis. Res Synth Methods. 2021. https://doi.org/10.1002/jrsm.1489 .

Viechtbauer W. Model checking in meta-analysis. In: Schmid CH, Stijnen T, White IR, editors. Handbook of meta-analysis. Boca Raton: CRC; 2021.

Anzures-Cabrera J, Higgins JPT. Graphical displays for meta-analysis: An overview with suggestions for practice. Res Synth Methods. 2010;1(1):66–80.

Kossmeier M, Tran US, Voracek M. Charting the landscape of graphical displays for meta-analysis and systematic reviews: a comprehensive review, taxonomy, and feature analysis. Bmc Med Res Methodol. 2020;20(1):1.

IntHout J, Ioannidis JPA, Rovers MM, Goeman JJ. Plea for routinely presenting prediction intervals in meta-analysis. BMJ Open. 2016;6(7): e010247.

Moeyaert M, Ugille M, Beretvas SN, Ferron J, Bunuan R, Van den Noortgate W. Methods for dealing with multiple outcomes in meta-analysis a comparison between averaging effect sizes, robust variance estimation and multilevel meta-analysis. Int J Soc Res Methodol. 2017;20:559.

Nakagawa S, Lagisz M, O’Dea RE, Rutkowska J, Yang YF, Noble DWA, Senior AM. The orchard plot: cultivating a forest plot for use in ecology, evolution, and beyond. Res Synth Methods. 2021;12(1):4–12.

Rothstein H, Sutton AJ, Borenstein M. Publication bias in meta-analysis : prevention, assessment and adjustments. Hoboken: Wiley; 2005.

Nakagawa S, Lagisz M, Jennions MD, Koricheva J, Noble DWA, Parker TH, Sanchez-Tojar A, Yang YF, O’Dea RE. Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods Ecol Evol. 2022;13(1):4–21.

Stanley TD, Doucouliagos H. Meta-regression analysis in economics and business. New York: Routledge; 2012.

Stanley TD, Doucouliagos H. Meta-regression approximations to reduce publication selection bias. Res Synth Methods. 2014;5(1):60–78.

Sterne JAC, Becker BJ, Egger M. The funnel plot. In: Rothstein H, Sutton AJ, Borenstein M, editors. Publication bias in meta-analysis: prevention, assessment and adjustments. Chichester: Wiley; 2005. p. 75–98.

Sterne JAC, Sutton AJ, Ioannidis JPA, Terrin N, Jones DR, Lau J, Carpenter J, Rucker G, Harbord RM, Schmid CH, et al. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. Br Med J. 2011;343:4002.

Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. Brit Med J. 1997;315(7109):629–34.

Jennions MD, Moller AP. Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. P Roy Soc B-Biol Sci. 2002;269(1486):43–8.

Koricheva J, Kulinskaya E. Temporal instability of evidence base: a threat to policy making? Trends Ecol Evol. 2019;34(10):895–902.

Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmuller U, Timmer J. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics. 2009;25(15):1923–9.

Matsushima Y, Noma H, Yamada T, Furukawa TA. Influence diagnostics and outlier detection for meta-analysis of diagnostic test accuracy. Res Synth Methods. 2020;11(2):237–47.

Viechtbauer W, Cheung MWL. Outlier and influence diagnostics for meta-analysis. Res Synth Methods. 2010;1(2):112–25.

Haddaway NR, Macura B. The role of reporting standards in producing robust literature reviews comment. Nat Clim Change. 2018;8(6):444–7.

Frampton G, Whaley P, Bennett M, Bilotta G, Dorne JLCM, Eales J, James K, Kohl C, Land M, Livoreil B, et al. Principles and framework for assessing the risk of bias for studies included in comparative quantitative environmental systematic reviews. Environ Evid. 2022;11(1):12.

Stanhope J, Weinstein P. Critical appraisal in ecology: what tools are available, and what is being used in systematic reviews? Res Synth Methods. 2022. https://doi.org/10.1002/jrsm.1609 .

Haddaway NR, Macura B, Whaley P, Pullin AS. ROSES RepOrting standards for systematic evidence syntheses: pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps. Environ Evid. 2018;7(1):1.

Woodcock P, Pullin AS, Kaiser MJ. Evaluating and improving the reliability of evidence syntheses in conservation and environmental science: a methodology. Biol Conserv. 2014;176:54–62.

O’Dea RE, Lagisz M, Jennions MD, Koricheva J, Noble DWA, Parker TH, Gurevitch J, Page MJ, Stewart G, Moher D, et al. Preferred reporting items for systematic reviews and meta-analyses in ecology and evolutionary biology: a PRISMA extension. Biol Rev. 2021;96(5):1695–722.

Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Plos Med. 2009;6(7):e1000097.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Plos Med. 2021;18(3): e1003583.

Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, Santos LBD, Bourne PE, et al. Comment: the FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3: 160018.

Culina A, Baglioni M, Crowther TW, Visser ME, Woutersen-Windhouwer S, Manghi P. Navigating the unfolding open data landscape in ecology and evolution. Nat Ecol Evol. 2018;2(3):420–6.

Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, Cain KE, Kokko H, Jennions MD, Kruuk LE. Troubleshooting public data archiving: suggestions to increase participation. Plos Biol. 2014;12(1): e1001779.

Roche DG, Kruuk LEB, Lanfear R, Binning SA. Public data archiving in ecology and evolution: how well are we doing? Plos Biol. 2015;13(11): e1002295.

Culina A, van den Berg I, Evans S, Sanchez-Tojar A. Low availability of code in ecology: a call for urgent action. Plos Biol. 2020;18(7): e3000763.

Spake R, Mori AS, Beckmann M, Martin PA, Christie AP, Duguid MC, Doncaster CP. Implications of scale dependence for cross-study syntheses of biodiversity differences. Ecol Lett. 2021;24(2):374–90.

Osenberg CW, Sarnelle O, Cooper SD. Effect size in ecological experiments: the application of biological models in meta-analysis. Am Nat. 1997;150(6):798–812.

Noble DWA, Nakagawa S. Planned missing data designs and methods: options for strengthening inference, increasing research efficiency and improving animal welfare in ecological and evolutionary research. Evol Appl. 2021;14(8):1958–68.

Nakagawa S, Freckleton RP. Missing inaction: the dangers of ignoring missing data. Trends Ecol Evol. 2008;23(11):592–6.

Mavridis D, Chaimani A, Efthimiou O, Leucht S, Salanti G. Addressing missing outcome data in meta-analysis. Evid-Based Ment Health. 2014;17(3):85.

Ellington EH, Bastille-Rousseau G, Austin C, Landolt KN, Pond BA, Rees EE, Robar N, Murray DL. Using multiple imputation to estimate missing data in meta-regression. Methods Ecol Evol. 2015;6(2):153–63.

Kambach S, Bruelheide H, Gerstner K, Gurevitch J, Beckmann M, Seppelt R. Consequences of multiple imputation of missing standard deviations and sample sizes in meta-analysis. Ecol Evol. 2020;10(20):11699–712.

Hadfield JD, Nakagawa S. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J Evol Biol. 2010;23(3):494–508.

Cinar O, Nakagawa S, Viechtbauer W. Phylogenetic multilevel meta-analysis: a simulation study on the importance of modelling the phylogeny. Methods Ecol Evol. 2021. https://doi.org/10.1111/2041-210X.13760 .

Ives AR, Zhu J. Statistics for correlated data: phylogenies, space, and time. Ecol Appl. 2006;16(1):20–32.

Maire A, Thierry E, Viechtbauer W, Daufresne M. Poleward shift in large-river fish communities detected with a novel meta-analysis framework. Freshwater Biol. 2019;64(6):1143–56.

Morrissey MB. Meta-analysis of magnitudes, differences and variation in evolutionary parameters. J Evol Biol. 2016;29(10):1882–904.

Cohen JM, Lajeunesse MJ, Rohr JR. A global synthesis of animal phenological responses to climate change. Nat Clim Change. 2018;8(3):224.

Gurevitch J, Morrison JA, Hedges LV. The interaction between competition and predation: a meta-analysis of field experiments. Am Nat. 2000;155(4):435–53.

Macartney EL, Lagisz M, Nakagawa S. The relative benefits of environmental enrichment on learning and memory are greater when stressed: a meta-analysis of interactions in rodents. Neurosci Biobehav R. 2022. https://doi.org/10.1016/j.neubiorev.2022.104554 .

Spake R, Bowler DE, Callaghan CT, Blowes SA, Doncaster CP, Antão LH, Nakagawa S, McElreath R, Chase JM. Understanding ‘it depends’ in ecology: a guide to hypothesising, visualising and interpreting statistical interactions. Biol Rev. 2023. https://doi.org/10.1111/brv.12939 .

Siviter H, Bailes EJ, Martin CD, Oliver TR, Koricheva J, Leadbeater E, Brown MJF. Agrochemicals interact synergistically to increase bee mortality. Nature. 2021;596(7872):389.

Salanti G, Schmid CH. Research synthesis methods special issue on network meta-analysis: introduction from the editors. Res Synth Methods. 2012;3(2):69–70.

Gomez-Creutzberg C, Lagisz M, Nakagawa S, Brockerhoff EG, Tylianakis JM. Consistent trade-offs in ecosystem services between land covers with different production intensities. Biol Rev. 2021;96(5):1989–2008.

Jackson D, White IR, Price M, Copas J, Riley RD. Borrowing of strength and study weights in multivariate and network meta-analysis. Stat Methods Med Res. 2017;26(6):2853–68.

Sanchez-Tojar A, Moran NP, O’Dea RE, Reinhold K, Nakagawa S. Illustrating the importance of meta-analysing variances alongside means in ecology and evolution. J Evol Biol. 2020;33(9):1216–23.

Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. BMJ. 2010;340:c221.

Riley RD, Tierney JF, Stewart LA. Individual participant data meta-analysis : a handbook for healthcare research. 1st ed. Hoboken: Wiley; 2021.

Hudson LN, Newbold T, Contu S, Hill SLL, Lysenko I, De Palma A, Phillips HRP, Alhusseini TI, Bedford FE, Bennett DJ, et al. The database of the PREDICTS (projecting responses of ecological diversity in changing terrestrial systems) project. Ecol Evol. 2017;7(1):145–88.

Dornelas M, Antao LH, Moyes F, Bates AE, Magurran AE, Adam D, Akhmetzhanova AA, Appeltans W, Arcos JM, Arnold H, et al. BioTIME: a database of biodiversity time series for the anthropocene. Glob Ecol Biogeogr. 2018;27(7):760–86.

Mengersen K, Gurevitch J, Schmid CH. Meta-analysis of primary data. In: Koricheva J, Gurevitch J, Mengersen K, editors. Handbook of meta-analysis in ecology and evolution. Princeton: Princeton University Press; 2013. p. 300–12.

Spake R, O’Dea RE, Nakagawa S, Doncaster CP, Ryo M, Callaghan CT, Bullock JM. Improving quantitative synthesis to achieve generality in ecology. Nat Ecol Evol. 2022;6(12):1818–28.

Gerstner K, Moreno-Mateos D, Gurevitch J, Beckmann M, Kambach S, Jones HP, Seppelt R. Will your paper be used in a meta-analysis? Make the reach of your research broader and longer lasting. Methods Ecol Evol. 2017;8(6):777–84.

Haddaway NR. A call for better reporting of conservation research data for use in meta-analyses. Conserv Biol. 2015;29(4):1242–5.

Midolo G, De Frenne P, Holzel N, Wellstein C. Global patterns of intraspecific leaf trait responses to elevation. Global Change Biol. 2019;25(7):2485–98.

White IR, Schmid CH, Stijnen T. Choice of effect measure and issues in extracting outcome data. In: Schmid CH, Stijnen T, White IR, editors. Handbook of meta-analysis. Boca Raton: CRC; 2021.

Lajeunesse MJ. Bias and correction for the log response ratio in ecological meta-analysis. Ecology. 2015;96(8):2056–63.

Acknowledgements

SN, ELM, and ML were supported by the ARC (Australian Research Council) Discovery grant (DP200100367), and SN, YY, and ML by the ARC Discovery grant (DP210100812). YY was also supported by the National Natural Science Foundation of China (32102597). A part of this research was conducted while visiting the Okinawa Institute of Science and Technology (OIST) through the Theoretical Sciences Visiting Program (TSVP) to SN.

Funding: Australian Research Council Discovery grant (DP200100367); Australian Research Council Discovery grant (DP210100812); The National Natural Science Foundation of China (32102597).

Author information

Authors and affiliations.

Evolution & Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, 2052, Australia

Shinichi Nakagawa, Yefeng Yang, Erin L. Macartney & Malgorzata Lagisz

Theoretical Sciences Visiting Program, Okinawa Institute of Science and Technology Graduate University, Onna, 904-0495, Japan

Shinichi Nakagawa

School of Biological Sciences, Whiteknights Campus, University of Reading, Reading, RG6 6AS, UK

Rebecca Spake

Contributions

SN was commissioned to write this article so he assembled a team of co-authors. SN discussed the idea with YY, ELM, RS and ML, and all of them contributed to the design of this review. ML led the survey working with YY and ELM, while YY led the creation of the accompanying webpage working with RS. SN supervised all aspects of this work and wrote the first draft, which was commented on, edited, and therefore, significantly improved by the other co-authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Shinichi Nakagawa or Yefeng Yang .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

The authors provide consent for publication.

Competing interests

The authors report no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

The survey of meta-analyses in environmental sciences.

Additional file 2:

The hands-on R tutorial.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Nakagawa, S., Yang, Y., Macartney, E.L. et al. Quantitative evidence synthesis: a practical guide on meta-analysis, meta-regression, and publication bias tests for environmental sciences. Environ Evid 12 , 8 (2023). https://doi.org/10.1186/s13750-023-00301-6

Received : 13 January 2023

Accepted : 23 March 2023

Published : 24 April 2023

DOI : https://doi.org/10.1186/s13750-023-00301-6

Keywords:
  • Hierarchical models
  • Robust variance estimation
  • Spatial dependency
  • Variance–covariance matrix
  • Meta-analysis of variance
  • Network meta-analysis
  • Multivariate meta-analysis

RMIT University Teaching and Research guides: Systematic reviews

What is synthesis?

Synthesis is a stage in the systematic review process where extracted data (findings of individual studies) are combined and evaluated. The synthesis part of a systematic review will determine the outcomes of the review.

There are two commonly accepted methods of synthesis in systematic reviews:

  • Quantitative data synthesis
  • Qualitative data synthesis

The way the data is extracted from your studies and synthesised and presented depends on the type of data being handled.

If you have quantitative information, some of the more common tools used to summarise data include:

  • grouping of similar data, i.e. presenting the results in tables
  • charts, e.g. pie-charts
  • graphical displays such as forest plots

If you have qualitative information, some of the more common tools used to summarise data include:

  • textual descriptions, i.e. written words
  • thematic or content analysis

Whatever tool/s you use, the general purpose of extracting and synthesising data is to show the outcomes and effects of various studies and identify issues with methodology and quality. This means that your synthesis might reveal a number of elements, including:

  • overall level of evidence
  • the degree of consistency in the findings
  • what the positive effects of a drug or treatment are, and what these effects are based on
  • how many studies found a relationship or association between two things

Quantitative synthesis (meta-analysis)

In a quantitative systematic review, data are presented statistically. Typically, this is referred to as a meta-analysis.

The usual method is to combine and evaluate data from multiple studies. This is normally done in order to draw conclusions about outcomes, effects, shortcomings of studies and/or applicability of findings.

Remember, the data you synthesise should relate to your research question and protocol (plan). In the case of quantitative analysis, the data extracted and synthesised will relate to whatever method was used to generate the research question (e.g. PICO method), and whatever quality appraisals were undertaken in the analysis stage.

One way of accurately representing all of your data is in the form of a forest plot. A forest plot is a way of combining results of multiple clinical trials in order to show point estimates arising from different studies of the same condition or treatment.

It comprises a graphical representation and often also a table. The graphical display shows the mean value for each trial, often with a confidence interval (the horizontal bars). Each mean is plotted relative to the vertical line of no difference.
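
For readers who want to try this themselves, the following is a small, hedged sketch (not part of the guide itself) of producing a forest plot in R with the metafor package; the data frame 'trials', with treatment and control means, standard deviations and sample sizes, is hypothetical.

```r
# Minimal sketch: compute standardised mean differences and draw a forest plot;
# the columns of the hypothetical 'trials' data frame are assumed names.
library(metafor)

es  <- escalc(measure = "SMD",
              m1i = mean_trt, sd1i = sd_trt, n1i = n_trt,
              m2i = mean_ctl, sd2i = sd_ctl, n2i = n_ctl,
              data = trials)

fit <- rma(yi, vi, data = es)   # random-effects meta-analysis
forest(fit)                     # per-study estimates with CIs plus the pooled diamond
```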

  • Forest Plots - Understanding a Meta-Analysis in 5 Minutes or Less (5:38 min) In this video, Dr. Maureen Dobbins, Scientific Director of the National Collaborating Centre for Methods and Tools, uses an example from social health to explain how to construct a forest plot graphic.
  • How to interpret a forest plot (5:32 min) In this video, Terry Shaneyfelt, Clinician-educator at UAB School of Medicine, talks about how to interpret information contained in a typical forest plot, including table data.
  • An introduction to meta-analysis (13 mins) Dr Christopher J. Carpenter introduces the concept of meta-analysis, a statistical approach to finding patterns and trends among research studies on the same topic. Meta-analysis allows the researcher to weight study results based on size, moderating variables, and other factors.

Journal articles

  • Neyeloff, J. L., Fuchs, S. C., & Moreira, L. B. (2012). Meta-analyses and forest plots using a Microsoft Excel spreadsheet: Step-by-step guide focusing on descriptive data analysis. BMC Research Notes, 5(1), 52-57. https://doi.org/10.1186/1756-0500-5-52 Provides a step-by-step guide on how to use Excel to perform a meta-analysis and generate forest plots.
  • Ried, K. (2006). Interpreting and understanding meta-analysis graphs: A practical guide. Australian Family Physician, 35(8), 635-638. This article provides a practical guide to the appraisal of meta-analysis graphs, and was developed as part of the Primary Health Care Research Evaluation Development (PHCRED) capacity building program for training general practitioners and other primary health care professionals in research methodology.

In a qualitative systematic review, data can be presented in a number of different ways. A typical procedure in the health sciences is  thematic analysis .

As explained by James Thomas and Angela Harden (2008) in an article for BMC Medical Research Methodology:

"Thematic synthesis has three stages:

  • the coding of text 'line-by-line'
  • the development of 'descriptive themes'
  • and the generation of 'analytical themes'

While the development of descriptive themes remains 'close' to the primary studies, the analytical themes represent a stage of interpretation whereby the reviewers 'go beyond' the primary studies and generate new interpretive constructs, explanations or hypotheses" (p. 45).

A good example of how to conduct a thematic analysis in a systematic review is the following journal article by Jørgensen et al. (2018) on cancer patients. In it, the authors go through the process of:

(a) identifying and coding information about the selected studies' methodologies and findings on patient care

(b) organising these codes into subheadings and descriptive categories

(c) developing these categories into analytical themes

Jørgensen, C. R., Thomsen, T. G., Ross, L., Dietz, S. M., Therkildsen, S., Groenvold, M., Rasmussen, C. L., & Johnsen, A. T. (2018). What facilitates “patient empowerment” in cancer patients during follow-up: A qualitative systematic review of the literature. Qualitative Health Research, 28(2), 292-304. https://doi.org/10.1177/1049732317721477

Thomas, J., & Harden, A. (2008). Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology, 8(1), 45-54. https://doi.org/10.1186/1471-2288-8-45


Creative Commons license: CC-BY-NC.

  • Last Updated: May 13, 2024 7:35 AM
  • URL: https://rmit.libguides.com/systematicreviews

UC Berkeley Library

Research methods--quantitative, qualitative, and more: evidence synthesis/systematic reviews.

  • Quantitative Research
  • Qualitative Research
  • Data Science Methods (Machine Learning, AI, Big Data)
  • Text Mining and Computational Text Analysis
  • Evidence Synthesis/Systematic Reviews
  • Get Data, Get Help!

About Evidence Synthesis and Systematic Reviews

According to the Royal Society, 'evidence synthesis' refers to the process of bringing together information from a range of sources and disciplines to inform debates and decisions on specific issues. They generally include a methodical and comprehensive literature synthesis focused on a well-formulated research question. Their aim is to identify and synthesize  all  of the scholarly research on a particular topic, including both published and unpublished studies. Evidence syntheses are conducted in an unbiased, reproducible way to provide evidence for practice and policy-making, as well as to identify gaps in the research. Evidence syntheses may also include a meta-analysis, a more quantitative process of synthesizing and visualizing data retrieved from various studies. 

Evidence syntheses are much more time-intensive than traditional literature reviews and require a multi-person research team. See this  PredicTER tool  to get a sense of a systematic review timeline (one type of evidence synthesis). Before embarking on an evidence synthesis, it's important to clearly identify your reasons for conducting one. 

(From the Cornell University Library Research Guide,  A Guide to Evidence Synthesis )

Open Access Evidence Synthesis Resources

(From the Cornell University Library Research Guide, A Guide to Evidence Synthesis )

New content will be added by the Cornell University team to the list below as it becomes available. Browse our  public Zotero library of evidence synthesis research here.  

Training Materials:

The Evidence Synthesis Institute  is a training program aimed at library staff supporting evidence syntheses in topics outside of the health sciences, and is fully funded by the Institute of Museum and Library Services (IMLS).  The teaching slides from this institute are available here  and are licensed under a  Creative Commons Attribution-NonCommercial 4.0 International License . Slides cover all aspects of the evidence synthesis process and much of the content is applicable to researchers as well as librarians. 

Introduction to Systematic Review and Meta-Analysis : This Massive Open Online Course (MOOC), offered through Coursera, describes and provides instruction for completing all stages of systematic reviews and meta-analyses.

Cochrane Interactive Learning  provides tutorials for performing systematic reviews on health-related topics.  Module 1: Introduction to Conducting Systematic Reviews  is free if you sign up for a  Cochrane account. 

INASP  provides a free  search strategies tutorial  to teach users how to clearly define and describe a search topic, identify suitable search terms, pick the best platform(s) on which to search, and use tools and techniques to refine and modify your search. 

Software Tools:

SR Toolbox : a web-based catalogue of tools that support various tasks within the systematic review and wider evidence synthesis process

Rayyan : Free article screening tool 

Zotero : Free citation management tool 

RawGraphs : Free tool for creating appealing graphs

Open Science Framework Preregistration : Free, open source resource for preregistering evidence synthesis protocols, with  well-documented guidance  on using the tool

SnowGlobe : Free tool for "snowballing" included resources (locating their citations, and those papers that have cited them).  

Grey Literature:

Grey literature includes reliable sources that may not necessarily have been published in a peer-reviewed journal, such as government reports. PubMed is a good source for grey literature: it comprises more than 32 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites. The PubMed Trainer's Toolkit contains instruction materials and short tutorials for navigating PubMed.

Is a Systematic Review for Me?

(From the University of California, San Francisco Library Research Guide, Systematic Reviews )

Systematic review - A systematic review synthesizes data from articles into a summary review, which has the potential to make conclusions more certain. Systematic reviews are considered the highest level of evidence in the evidence-based medicine (EBM) evidence pyramid. An overview of the systematic review process includes:

  • Time Commitment : Typically 12–24 months from start to finish (may take longer)
  • Team Requirement : Typically 3-5 people at minimum. You will need at least a primary reviewer and a secondary reviewer. Other roles to consider include a subject expert, methodologist/statistician, operations manager, and medical librarian.
  • Topic : A significant question is being asked and answered. The topic is not the subject of a recent review and is not being worked on currently by others.

If this doesn't meet your needs, see " A typology of reviews: an analysis of 14 review types and associated methodologies ".

Grant, Maria J., and Andrew Booth. “A Typology of Reviews: An Analysis of 14 Review Types and Associated Methodologies.”  Health information and libraries journal  26.2 (2009): 91–108. Web.

For your reference, see these examples of a UCSF-authored  systematic review ,  scoping review , and  protocol .

Systematic Review Resource List

(From the University of California, San Francisco Library Research Guide,  Systematic Reviews )

  • PRISMA reporting guidelines  -  for systematic reviews ( PRISMA 2020 ), scoping reviews ( PRISMA-ScR ), searching ( PRISMA-S ) & more

Scoping studies: towards a methodological framework (2005)  &  Scoping studies: advancing the methodology (2010)  - Scoping review frameworks & guidelines

Systematic review handbooks & guidelines -  Duke University

Steps in the systematic review process  - Stanford University

Searching for & publishing systematic review protocols -  Stanford University

Screening software for systematic reviews  - Cambridge University

Critical appraisal tools  - Memorial Sloan Kettering Library

Covidence- A Tool for Systematic Reviews

If you do decide to do a systematic review, UC Berkeley licenses Covidence, a tool to help you. In Covidence, you can  import citations ,  screen titles and abstracts ,  upload references ,  screen full text ,  create forms for critical appraisal ,  perform risk of bias tables ,  complete data extraction , and  export a PRISMA flowchart  summarizing your review process. As an institutional member, our users have priority access to Covidence support.   To access Covidence using the UC Berkeley institutional account ,  start at this page  and follow the instructions.

  • Last Updated: Apr 25, 2024 11:09 AM
  • URL: https://guides.lib.berkeley.edu/researchmethods


  • Open access
  • Published: 16 May 2024

Systematic review and meta-analysis of ex-post evaluations on the effectiveness of carbon pricing

  • Niklas Döbbeling-Hildebrandt   ORCID: orcid.org/0000-0002-3170-1825 1 , 2 ,
  • Klaas Miersch   ORCID: orcid.org/0000-0003-1055-267X 1 , 3 ,
  • Tarun M. Khanna   ORCID: orcid.org/0000-0003-3442-7689 1 , 4 ,
  • Marion Bachelet 1 ,
  • Stephan B. Bruns   ORCID: orcid.org/0000-0002-3028-9699 5 , 6 , 7 ,
  • Max Callaghan   ORCID: orcid.org/0000-0001-8292-8758 1 ,
  • Ottmar Edenhofer 1 , 3 , 8 ,
  • Christian Flachsland 1 , 9 ,
  • Piers M. Forster   ORCID: orcid.org/0000-0002-6078-0171 2 ,
  • Matthias Kalkuhl   ORCID: orcid.org/0000-0003-4797-6628 1 , 10 ,
  • Nicolas Koch   ORCID: orcid.org/0000-0002-7775-034X 1 , 11 ,
  • William F. Lamb   ORCID: orcid.org/0000-0003-3273-7878 1 , 2 ,
  • Nils Ohlendorf 1 , 3 ,
  • Jan Christoph Steckel   ORCID: orcid.org/0000-0002-5325-9214 1 , 12 &
  • Jan C. Minx   ORCID: orcid.org/0000-0002-2862-0178 1 , 2  

Nature Communications, volume 15, Article number: 4147 (2024)


  • Carbon and energy
  • Climate-change mitigation
  • Climate-change policy

Today, more than 70 carbon pricing schemes have been implemented around the globe, but their contributions to emissions reductions remain a subject of heated debate in science and policy. Here we assess the effectiveness of carbon pricing in reducing emissions using a rigorous, machine-learning assisted systematic review and meta-analysis. Based on 483 effect sizes extracted from 80 causal ex-post evaluations across 21 carbon pricing schemes, we find that introducing a carbon price has yielded immediate and substantial emission reductions for at least 17 of these policies, despite the low level of prices in most instances. Statistically significant emissions reductions range from –5% to –21% across the schemes (–4% to –15% after correcting for publication bias). Our study highlights critical evidence gaps with regard to dozens of unevaluated carbon pricing schemes and the price elasticity of emissions reductions. More rigorous synthesis of carbon pricing and other climate policies is required across a range of outcomes to advance our understanding of “what works” and accelerate learning on climate solutions in science and policy.


Introduction

Countries are not on track to meet the climate goals they committed to under the Paris Agreement 1 , 2 . To step up implementation, learning about what policy instruments work in reducing emissions at the necessary speed and scale is critical. But despite more than three decades of experience with carbon pricing and more than 70 implementations of carbon taxes (37) and cap-and-trade (36) schemes around the world 3 at national, regional, and sub-national levels, there remains no consensus in either science or policy as to how effective such policies are in reducing greenhouse gas (GHG) emissions.

Proponents have suggested carbon pricing as a key instrument to incentivise GHG emissions reductions on the basis that it would avoid the need for detailed regulatory decisions targeted at specific emission sources 4 , 5 , 6 , 7 , 8 . However, the effectiveness of carbon pricing depends strongly on context, and the effect may be larger or smaller depending on the institutions and infrastructures in place 9 , 10 . Critics doubt the ability of carbon pricing to unlock the investments required for the development and application of low carbon technologies 11 . There are also concerns about whether policymakers can overcome political barriers and raise carbon prices high enough to deliver emissions reductions at the scale and pace required 11 , 12 , 13 .

We aim to systematically review the empirical literature on the effectiveness of carbon pricing policies in reducing GHG emissions. While there are other market based policy instruments, such as fuel taxes, import taxes or value added taxes, we focus here on policies which impose a carbon price across fuels based on their carbon contents. One way to assess the effects of carbon pricing is to evaluate experiences in the real world. A growing scientific literature has provided quantitative evaluations of the effects of different carbon pricing schemes on emissions 14 , 15 , 16 . This evidence is usually provided in the form of quasi-experimental studies which assess the effect of the introduction of the policy (treatment effect). Based on this evidence, our meta-analysis addresses the question: What was the emissions reduction effect of the introduction of a carbon price during the early years of its application? This is different from the question of how emissions respond to gradual changes in existing carbon prices; only a few studies estimate the relationship between the carbon price level and emissions 17 , 18 , 19 . The comprehensive literature on the elasticity of fuel use in response to fuel price changes has been reviewed before in a number of meta-analyses 20 , 21 , 22 , 23 , 24 .

We focus on the growing evidence base on the effectiveness of introducing a carbon price. Reviews of this literature have tended not to employ rigorous systematic review methods such as meta-analysis. A number of reviews describe the literature and summarise the findings of the primary studies but do not attempt a quantitative synthesis of the findings 15 , 25 , 26 , 27 . Green provides a range of effect sizes reported in the reviewed literature without any formal methodology for their harmonisation and analysis, concluding that the policy has no or only a very small effect on emission reductions (0–2%) 28 . None of the available reviews provide a critical appraisal of the quality of the primary studies considered. The biases of such traditional literature reviews have been widely documented 29 , 30 . The lack of comprehensive systematic review evidence on a multitude of policy questions makes it harder for IPCC assessments to learn from implemented climate policies 31 , 32 , 33 .

We fill this gap by conducting a systematic review and meta-analysis of the empirical ex-post literature on the effectiveness of carbon pricing, covering 21 enacted carbon tax and cap-and-trade policies around the globe, following the guidelines by the Collaboration for Environmental Evidence 34 . We use a machine-learning enhanced approach as proposed by Callaghan and Müller-Hansen 35 to screen 16,748 studies from five different literature databases, identifying 80 relevant ex-post policy assessments. We extract and harmonise estimates of average emissions reductions from the introduction of a carbon price. We conduct a meta-analysis on 483 effect sizes on 21 different carbon pricing schemes and estimate emissions reduction effects. We study the heterogeneity in the reported findings and conduct a critical appraisal as well as a publication bias assessment to analyse the impact of different study design choices on the results. Our methodology is transparent and reproducible, ensuring that our analysis is updatable in the future as new information and experiences with carbon pricing policies are gained around the world 36 . The data and code are publicly available: https://github.com/doebbeling/carbon_pricing_effectiveness.git .

We find consistent evidence that carbon pricing policies have caused emissions reductions. Statistically significant emissions reductions are found for 17 of the reviewed carbon pricing policies, with immediate and sustained reductions of between –5% and –21% (–4% to –15% when correcting for publication bias). Our heterogeneity analysis suggests that differences in estimates from the studies are driven by the policy design and context in which carbon pricing is implemented, while often discussed factors like cross-country differences in carbon prices, sectoral coverage, and the design of the policy as a tax or trading scheme do not capture the identified heterogeneity in effect sizes.

Results

Evidence base: larger and more diverse than previously suggested

With the help of our machine-learning assisted approach, we identify 80 quantitative ex-post evaluations across 21 carbon pricing schemes around the globe (see Methods). Previous reviews covered a maximum of 35 research articles on the emissions reduction effect of carbon pricing policies 15 , 25 , 26 , 28 .

As shown in Table  1 , the carbon pricing schemes covered here are very diverse and differ in terms of their specific policy design, scope, and policy context. For example, some of the schemes are targeted at large scale emitters in the industry and energy sectors, while others focus on households via home energy use and the transport sector. In the European Union, some sectors are regulated with a carbon tax while others are covered by the European wide emission trading scheme. We also observe substantial differences in carbon price levels of the covered schemes. All of these differences may give rise to considerable variations in emissions reductions achieved.

Beyond these differences in policy design, carbon price levels, and regional contexts, all considered policy experiences speak to the question whether carbon pricing is or is not effective in reducing GHG emissions. A systematic assessment and comparison of the outcomes of these policies can inform policymakers and future research by synthesising the available evidence.

The number of available ex-post evaluations on the effectiveness of carbon pricing differs substantially across schemes. Prior reviews suggested a bias towards evaluating schemes in Europe and North America 15 , 26 , 28 ; here, however, we find that the vast majority of the available ex-post evidence on the effectiveness of carbon pricing assesses the pilot emission trading schemes in China (35 of the 80 articles). There are 13 studies on the European emissions trading scheme (EU ETS), seven on British Columbia and five on the Regional Greenhouse Gas Initiative (RGGI) in the United States. The remaining schemes are evaluated by a single or very few studies.

Our systematic review also reveals some fundamental evidence gaps in the literature. Despite the broad set of bibliographic databases searched, we found evidence only for 20 out of 73 carbon pricing policies in place in 2023 3 and for the Australian carbon tax, which was repealed two years after its implementation. For some more recently implemented policies, this may be explained by the time needed for sufficient data to become available, be assessed, and the results published. But even among the 38 carbon pricing schemes already implemented by 2015, we could not find a single study on effectiveness for 18 of them (see Supplementary Information). There is also little evidence on the effectiveness of carbon pricing relative to the level of the carbon price (carbon price elasticity). We identify only nine price elasticity studies, providing too few effect sizes for meta-analysing these separately.

Average emissions reductions across carbon pricing schemes

In order to provide a meaningful and transparent synthesis of the available quantitative evidence, we harmonise the effect sizes extracted from the individual studies to a common treatment effect metric following standard systematic review guidance 34 . This treatment effect is expressed as the percentage difference between the counterfactual emissions without carbon pricing and observed emissions after the introduction of a carbon price. It assumes emissions reductions to take place at the time of the introduction of the policy and to persist throughout the observation period as a constant difference to counterfactual emissions. Most studies directly provide treatment effects, which we standardise to represent a percentage change in emissions levels. Effect sizes provided as price elasticity are interpreted at the mean carbon price (see Methods). Overall, we harmonise 483 effect sizes from 80 reviewed articles, covering 21 carbon pricing schemes that provide the starting point for our quantitative synthesis.

Our results show that carbon pricing effectively reduces greenhouse gas emissions. We use multilevel random and mixed effects models to account for dependencies among effect sizes in our sample and estimate the average treatment effects. The mixed effects model includes dummy variables for each of the included carbon pricing schemes to estimate the effectiveness for each of the schemes. As depicted in Panel a of Fig.  1 , emissions reduction effects are observed consistently across schemes with considerable variation in magnitude. For 17 of the carbon pricing schemes we find statistically significant average reduction effects from the introduction of a carbon price. The estimated reduction effects range from about –21% to about –5%. Across carbon pricing schemes, we find that on average the policy has reduced emissions by –10.4% [95% CI = (–11.9%, –8.9%)]. This effect is both substantial and highly statistically significant.

Figure 1

Panels (a, d, g) present weighted mean effect sizes together with their 95% confidence intervals based on multilevel random and mixed effects models and represent the effect of the policy observed in each period after its introduction in comparison to the counterfactual emissions without the policy. The estimates are ordered according to the number of studies they comprise (depicted on the left). The average treatment effect for the Chinese ETS pilots comprises the effects of all eight regional pilot schemes. Cross-country collects the evidence from studies assessing countries with and without carbon pricing, not focusing on a specific carbon pricing scheme. Panels (a, d, g) comprise, respectively, n = 470, n = 253, and n = 142 effect sizes clustered on the study level. Panels (b, e, h) show the distribution of assigned risks of bias (RoB). Panels (c, f, i) show the distribution of statistical power. Power above 80% is considered adequate. For synthetic control designs no statistical power was derived; these are presented as "NA".

The amount and quality of evidence in the reviewed literature differ substantially across individual schemes. Focusing on those with the largest evidence base, we find an average treatment effect for the eight Chinese ETS pilots of –13.1% [95% CI = (–15.2%, –11.1%)], which implies a larger reduction than the –10.4% average treatment effect across the schemes. The EU ETS and the British Columbia carbon tax both have estimated emission reduction effects below the overall average treatment effect, at –7.3% [95% CI = (–10.5%, –4.0%)] and –5.4% [95% CI = (–9.6%, –1.2%)], respectively. Reduction effects smaller than –5% are only reported in three instances with severe problems in study design exposing the estimates to a high risk of bias (Korean ETS, Australian carbon tax, Swiss ETS).

Critical appraisal and publication bias

The average treatment effects presented in the previous section were based on all reviewed studies. However, the quality of the primary studies is not uniform and some are subject to biases in the study design. Additionally, the average treatment effect might be subject to publication bias. Therefore we re-estimate the treatment effects by adjusting for potential quality issues and publication bias, adopting transparent and reproducible criteria.

We critically appraise each primary study to identify potential biases in the study design. These biases often arise from an unreasonable selection of the control group used in a quasi-experimental design; from inadequately controlling for confounding factors, such as the introduction of other relevant policies; or from statistical specifications that do not allow the policy effect to be singled out. The assessment criteria for the critical appraisal are set out in the Methods section and the Supplementary Information. 46% of the reviewed studies are assessed to have a medium or high risk of bias. When we remove studies with medium or high risk of bias from the sample, the average treatment effects for some of the schemes shift by up to 5 percentage points, while the estimation uncertainty increases due to the reduction in the number of primary estimates considered (see Fig. 1, Panel d). The identified biases, however, do not systematically shift the estimated treatment effects in either direction. The average treatment effect across policies is practically unchanged when removing studies with medium or high risk of bias.

Secondly, we adjust the average treatment effect for the influence of publication bias. Publication bias could arise from a tendency in the literature towards only publishing statistically significant effects 37 , 38 , 39 , 40 . A precision effect test 41 , 42 confirms the presence of publication bias in the set of studies reviewed here (see  Supplementary Information) . As suggested in the literature, we correct for publication bias by estimating average effects for a subsample of effect sizes with adequate statistical power (see Methods) 38 , which applies to about 30% of the reviewed effect sizes. This subsample analysis adjusts most of the scheme-wise average treatment effects towards lower estimated emissions reductions (see Fig.  1 , Panel g), ranging from –15% to –4%. Across the schemes, the average treatment effect is reduced to –6.8% [95% CI = (–8.1%, –5.6%)]. Despite these adjustments, the publication bias corrected estimates support the overall finding that carbon pricing policies cause significant reductions in GHG emissions.

Studies with a high risk of bias and low power are not uniformly distributed across schemes. Some schemes are evaluated only by a few biased studies, resulting in very high or low average treatment effects. For example, when considering all available evidence, the carbon pricing schemes in South Korea, Switzerland, and Australia are estimated to have the lowest negative or even positive average treatment effects. These estimates are based entirely on studies with a high risk of bias and are no longer considered when re-estimating the treatment effects based on low risk of bias studies (see Fig.  1 d). The two carbon pricing policies from the United States (California CaT, RGGI), which show the largest negative average treatment effect when considering all available studies, show lower average treatment effects after the adjustment for publication bias (see Fig.  1 g). For other schemes, like the EU ETS and British Columbia’s carbon tax, there is no substantial change in the average treatment effect when studies with high risk of bias are excluded.

Explaining heterogeneity in effect sizes

There is considerable variation in the effect sizes reported by primary studies included in this review. This could arise from heterogeneity in the design of the carbon pricing policies or from heterogeneity in the design of the primary studies. The carbon pricing literature mainly discusses three policy design factors that could potentially explain differences in the effectiveness of the policy. First, there are debates about whether carbon prices are better applied as carbon taxes or as emission trading schemes 5 , 43 , 44 , 45 , 46 , 47 . Second, it is argued that the policy causes different reduction rates in different sectors 48 , 49 , 50 . And third, the level of the carbon price can be expected to play a decisive role in the magnitude of the emission reductions 5 , 51 , 52 . We assess whether, and to what extent, such factors are able to explain differences in the treatment effects reported. We test which factors are most relevant to explain the reported emissions reductions by using scheme and study characteristics as explanatory variables in meta-regressions.

As we are confronted with a large number of potentially relevant explanatory variables, we use Bayesian model averaging (BMA) to assess the heterogeneity in the estimated effect sizes reported by the different studies. BMA is particularly suitable for meta-analysis as it allows for running a large number of meta-regressions with different possible combinations of explanatory variables and does not require selecting one individual specification (see Methods). We include explanatory variables for the three policy design factors provided above: price level, sector coverage, and a variable differentiating between carbon taxes and cap-and-trade schemes. In addition we add dummy variables for each of the carbon pricing schemes, capturing the remaining policy design and contextual factors of each policy scheme. Additionally, we test whether studies assessing longer periods after the policy implementation find higher or lower treatment effects. To assess the impact of methodological choices made in the studies, we study a set of variables including the type of study design, estimation method, and data used in the primary studies.

The results from the BMA are provided in Table  2 and Fig.  2 . The posterior inclusion probability (PIP) indicates the relevance of each variable. Commonly, variables with a PIP above 0.5 are interpreted to be relevant explanatory factors, while variables with lower PIPs are unable to capture the observed heterogeneity. The table furthermore provides the posterior mean and standard deviation of the estimated effect averaged across all meta-regressions that include the respective variable.

Figure 2

The columns in the figure depict the best 26,435 estimated meta-regressions, with each column showing the outcome of one estimated meta-regression model. The dependent variable for each of the meta-regression models is the percentage change in emissions. The possible explanatory variables are depicted in the rows (ordered by their PIP in descending order) and the explanatory variables included in a respective meta-regression model of the column is indicated by the colours. Red colour indicates the variable was included with a negative sign (larger emission reductions). Blue colour indicates a positive sign (smaller emission reductions). No colour indicates that the variable was not included in the meta-regression model represented by that column. The horizontal axis indicates the cumulative posterior model probabilities across all models. The models are ranked by their posterior model probability with the model on the left accounting for the largest posterior model probability. The definitions of the explanatory variables are provided in the  Supplementary Information .

Variation in carbon prices, the sectoral coverage of schemes, and the choice of carbon tax vs. cap-and-trade do not seem to be important variables in explaining the observed heterogeneity in emissions reductions (PIP < 0.5). Instead, the dummy variables for the places where the schemes are applied do a better job of explaining this heterogeneity than the variables that capture specific design characteristics. The dummy variables for the RGGI and the Chinese ETS pilots indicate larger emission reduction effects than for the EU ETS, which is set as the reference category. The Swiss ETS is estimated to have a smaller reduction effect than the benchmark. Alternative specifications of the BMA, provided in the  Supplementary Information , also estimate a larger reduction effect for the Swedish carbon tax compared to the benchmark. The directions of these coefficients are in line with the average treatment effects presented in Fig.  1 for the respective geographies.

If we remove the dummy variables for the schemes, the size of the carbon price becomes an important variable in the BMA to explain the heterogeneity in emission reductions, with a PIP close to 1 (see  Supplementary Information) . However, in the absence of the scheme dummies the effect of the price variable is likely to be confounded, as the scheme dummies account for any omitted context variable that does not vary within a scheme. The high correlation of 0.96 between the scheme dummies and the price variable indicates that the price variable captures the heterogeneity between schemes. In fact, the price coefficient is estimated with a positive sign in the BMA specification without the scheme dummies, implying that lower emissions reductions are achieved with higher carbon prices. The counterintuitive direction of the price effect indicates a misspecification of the model when the scheme dummies are excluded. Below we discuss possible causes for this inverse relationship between the price and the reduction effect in our data. The effect of carbon prices on emissions reductions is better identified by adding scheme dummies to focus on the variation of prices within each scheme. However, the largest share of the variation in our carbon price variable comes from variation between the schemes (91%) and only 9% from within-scheme variation. This is not a limitation of our dataset. Indeed, carbon prices tend to vary strongly across countries based on the design and coverage of the scheme, but for individual schemes prices have historically been stagnant (EU ETS until recently, RGGI, Chinese ETS pilots) or increases have been relatively modest (BC carbon tax) 53 , and the effect size estimates evaluated here provide limited time frequency. We suspect that due to this low variation, our sample has insufficient power to identify carbon prices as a relevant factor in explaining emissions reductions.
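
To make the between- versus within-scheme decomposition concrete, the sketch below shows one way such shares can be computed; the data frame and its values are purely hypothetical and are not the authors' dataset.

```r
# Rough sketch (hypothetical data): share of carbon price variation explained
# by scheme membership, i.e. the between-scheme share of the variance.
prices <- data.frame(
  scheme = rep(c("EU ETS", "RGGI", "BC tax"), each = 4),
  price  = c(20, 22, 19, 21,   3, 3, 4, 3,   17, 18, 19, 18)
)

fit <- lm(price ~ scheme, data = prices)
between_share <- summary(fit)$r.squared  # variation explained by scheme dummies
within_share  <- 1 - between_share       # remaining within-scheme variation
```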

Studies assessing the effectiveness of carbon pricing over longer time periods find larger emission reductions. The coefficient for the variable duration has a PIP of 0.76 and is estimated with a negative sign in all regression specifications it is included in. Testing for the spatial and temporal granularity of the data suggests that only the use of city level data compared to the country level explains some of the heterogeneity in reported effect sizes. Methodological differences in the reviewed studies only have a minor influence on effect sizes. These are discussed in further detail in the  Supplementary Information .

In line with the previous section, we also include the risk of bias variable and the standard error, which captures publication bias. Neither is found to be among the most relevant variables for explaining the heterogeneity.

Discussion

In this first quantitative meta-analysis of carbon pricing evaluations, we find robust evidence that the existing carbon pricing schemes that have been evaluated to date are effective in reducing GHG emissions. Our machine-learning enhanced approach to study identification finds more than twice as many ex-post evaluations as existing reviews 15 , 25 , 26 , 28 , studying the effectiveness of 21 carbon pricing policies. Our meta-analysis finds that at least 17 of these policies have caused significant emissions reductions ranging from –5% to –21%. These are substantially larger than the 0% to –2% suggested in the recent and widely cited review by Green 28 , which lacks a clear and transparent methodology to synthesise the literature 29 , preventing a formal comparison with our results. Our finding is robust to biases from poor study designs as well as publication bias. Correcting for the latter adjusts the range of observed emissions reductions to –4% to –15% across carbon pricing schemes.

The synthesis of research findings across carbon pricing schemes provides comprehensive and consistent evidence of its effectiveness, despite the heterogeneity of policy designs and regional contexts. Compared to the recent assessment report by the IPCC, which provides a quantification of achieved reductions only for the EU ETS 54 , our systematic review adds synthesised emission reduction estimates for more than a dozen carbon pricing schemes. We provide these estimates together with uncertainty ranges and a transparent assessment of study quality and highlight the presence of substantial variation in emissions reductions achieved across the schemes in our sample, ranging from –5% for the carbon tax in British Columbia to –21% for the RGGI. We conduct an early application of Bayesian model averaging for meta-regressions on our dataset of 483 effect sizes to disentangle which factors explain these differences. The findings suggest that the individual context and policy design of the schemes best explain the heterogeneity in achieved emissions reductions. These are the most relevant explanatory factors despite controlling for broader policy design features like the sectoral coverage or the design as carbon tax or carbon trading scheme as well as for study design features of the primary studies.

Our heterogeneity analysis does not identify a relationship between the price level and the achieved emissions reductions, i.e. the size of the emissions reductions observed across schemes from the introduction of a carbon price cannot be explained well by the carbon price level. This is not surprising as marginal abatement costs may differ widely as, for example, prominently acknowledged in the literature on linking carbon pricing schemes 55 , 56 . It is further different from the expectation that higher carbon prices lead to larger emissions reductions within a carbon pricing scheme as commonly found in available assessments of fuel price elasticities 24 , 57 , 58 . In line with this argument, we find that the relationship between carbon price levels and emissions reductions in our meta-analytic framework is dominated by the across-scheme variation in prices, which accounts for 91% of the variation in our dataset while the variation within schemes only accounts for 9%. The interpretation for not finding a clear relationship should thus rather be that when implementing a carbon price in two countries with different country contexts, the country with the higher carbon price would not necessarily experience the higher emissions reductions.

This can be observed, for instance, when looking at the cases of China, the EU, and British Columbia. The reviewed literature finds larger emissions reduction effects for the pilot emission trading schemes in China (–13.1%) than for the EU ETS (–7.3%) and the carbon tax in British Columbia (–5.4%), despite the very low carbon prices of the Chinese schemes. The average prices of the eight Chinese pilot schemes are all below US$ 8 during the study period, while the average prices for the EU and British Columbia are at US$ 20 and US$ 18, respectively. This is likely a result of lower abatement costs in China 59 together with differences in the policy contexts of the countries. The effectiveness is certainly influenced by other policies in place. In China indirect carbon prices are lower than in the EU countries and Canada 60 , allowing for a higher marginal effect of the implementation of the ETS pilots in China. Non-pricing instruments also diverge across countries. In addition, the implementation of a carbon price (even with a low price) can have a signalling effect towards the emitters, underlining the commitment of the government towards climate mitigation. Evidence for the Guangdong province suggests that signalling has significantly contributed to the achieved emissions reductions in the context of the introduction of the ETS pilots in China 61 . Another example highlighting the relevance of the context of the policy implementation is the case of the RGGI. The policy implementation coincides with the shale gas boom, which drastically reduced the prices of natural gas in the USA and started around the same time as the RGGI was implemented. In the face of these general price dynamics in the US energy sector, RGGI participating states reduced their emissions considerably more strongly than non-regulated states 62 , 63 , while the carbon price was only US$ 3 on average.

Even if, across schemes, the carbon price level is not found to be the relevant driver of the emissions reductions achieved with the introduction of the policy, within a scheme the effectiveness is expected to increase with increasing prices. This is well studied for other changes in fuel prices, which are found to substantially reduce fuel consumption 57 , 58 . That literature studies all possible price changes for a single fuel, while the literature on carbon prices assessed here studies the effect of a single policy instrument across all fuels. It is thus a complementary but distinct body of evidence. Meta-analyses estimate a reduction in fuel consumption of between 0.31% and 0.85% in the long run for a 1% increase in the fuel price 20 , 21 , 22 , 23 , 24 .

Within the literature evaluating policy effectiveness, we identified only nine primary studies estimating semi-elasticities of carbon prices. Four use the stepwise introduction of the carbon tax in British Columbia to estimate elasticities for the transport and buildings sectors 17 , 18 , 64 , 65 , while one each is conducted for the RGGI 63 and the EU ETS 19 . In addition, some studies estimate elasticities across countries and carbon pricing schemes 15 , 66 , 67 . These studies support what was already known from studies on the price elasticity of fuel consumption 20 , 21 , 22 , 23 , 24 , 57 , 58 : increasing prices reduce fuel use and emissions. Hence, as carbon prices rise further after the introduction of a scheme, additional emissions reductions are achieved. Interestingly, some studies suggest that an increase in the carbon tax leads to larger emissions reductions than an increase of the same size in the market price of the fuel 17 , 18 , 64 , 65 . It will thus be a relevant avenue for future research to understand whether it is a generalisable finding that price elasticities are higher for policy-induced price changes than for market price changes of fossil fuels. Such research could draw on the comprehensive evidence from the fuel price literature.

Our meta-regression results suggest that the effectiveness of carbon pricing policies increases with time. Studies covering longer time periods after the introduction of the carbon price report larger emissions reduction effects compared to assessments of shorter time periods. While this finding should be treated with caution, as most of the primary studies assume constant treatment effects in their estimations, it hints towards increasing emissions reductions in the years following the policy introduction. The assumption of constant treatment effects reflects not only methodological considerations of the primary studies, but is also based on the expectation that, as long as the carbon price of the implemented policy is unchanged, the emission reduction effect should not intensify. The finding of our meta-regression to some extent counters that assumption. An increasing policy effectiveness could be a result of steady adjustment processes, reinforced by innovation and investments in cleaner production and infrastructure. Additionally, the literature reviewed here provides some evidence that increasing policy stringency has also played a role in strengthening the effectiveness of the policy. Increases in the carbon prices led to additional emissions reductions in Sweden 68 and the United Kingdom 69 . Similar effects are found for the EU ETS, where the effectiveness increases with the increasing stringency across phases I, II, and III 70 , 71 , 72 , 73 .

While the harmonisation and synthesis of the emissions reduction effects provides an overview of the policy effectiveness across a large number of policy schemes, it raises a number of policy relevant research questions which cannot be answered with our purely quantitative, meta-econometric approach, which is inherently dependent on the available evidence base. These limitations could be addressed using promising and widely unexplored mixed method review designs such as realist synthesis 74 , 75 , which systematically combine quantitative and qualitative information to better understand why particular policy designs work, under what conditions, and why. Some research gaps, however, need to be filled by further primary research. First, there are more than 50 carbon pricing schemes that have not yet been evaluated for their emission reduction effect, despite some of them having been enacted for more than ten years (see  Supplementary Information) . Others have so far been studied insufficiently or only poorly. Second, we lack ex-post evidence on higher carbon prices. There are currently fewer than ten studies assessing emissions reductions in schemes with mean carbon prices higher than US$ 30 across the observation period. As policy ambitions are raised over time, there is an opportunity to strengthen that evidence base. Third, this systematic review highlights substantial challenges with the quality of the available primary evidence. Only about half of the studies assessed here follow rigorous study designs with a low risk of bias and only 30% of the studies are adequately powered. While some of this might be related to a lack of access to adequate data for the most rigorous research designs, high quality primary research is essential to understand the effectiveness of climate policies 76 . The multitude of supplementary or conflicting policies as well as other confounding factors pose a challenge to the clear identification of the causal effects of a specific policy 77 . Novel methods of reverse causality are a promising avenue to address this challenge 78 .

The effectiveness is just one dimension of policy outcome relevant to the selection of the best policy measures. Systematic assessments of the ex-post climate policy literature on a multitude of policy outcomes and different climate policy options could be the basis for accelerated learning on climate policies and considerably improve upcoming IPCC assessments. Unless we raise our standards and do this work, policy makers and society will remain in the dark as to the most promising pathways towards addressing the climate crisis.

Methods

The systematic review broadly follows the guidance for systematic reviews by the Collaboration for Environmental Evidence 34 , extended by a machine-learning assisted identification of relevant studies. A description of our methods has been published in advance as a review protocol on OSF Registries 79 .

Literature search

We search the bibliographic databases Web of Science, Scopus, JSTOR, RePEc and the web-based academic search engine Google Scholar using a broad search string which comprises a large set of carbon pricing synonyms and indicator words for quantitative ex-post study designs. The full query can be found in the protocol 79 . After the removal of duplicates the search, conducted in the second week of March 2022, returned a set of 16,748 articles (see Fig.  3 ).

Figure 3

Adapted from the ROSES flow diagram for systematic reviews 97 .

We screened these articles for eligibility in two stages: first at the title and abstract level using the NACSOS software 80 , followed by screening at the full text level. Studies are included if they infer a causal relationship between carbon pricing and the development of emissions. Eligible studies analyse effects on emission levels or emission levels per capita. Studies were excluded if they assess the effect on emission intensity or emission productivity, i.e. the effect on emissions relative to output. The included policy measures are restricted to explicit carbon taxes and cap-and-trade schemes. Studies on implicit carbon taxes and carbon offsetting mechanisms are excluded. We only include studies published in English.

The screening at the title and abstract level was simplified by an active learning algorithm, using support vector machines to rank the studies in order of relevance. We stopped screening when we were 90% confident that we had identified at least 90% of the articles relevant to our systematic review, based on the conservative stopping criterion provided by Callaghan and Müller-Hansen 35 . This reduced the number of manually screened documents by 77%. All articles included after the title and abstract screening were screened at full text, without any further application of machine-learning algorithms. Figure  3 depicts the articles included and excluded at each screening stage.

Data extraction and critical appraisal

From the included studies we extract the effect size information, including the estimated effect size and direction of the effect, the uncertainty measure, provided as standard error, t statistic, confidence interval, p value, or the indicated significance level, as well as the provided mean emissions and, for price elasticity studies, the mean carbon price. We also capture information on the studied carbon pricing scheme, time of the intervention, study period, emission coverage (sectors, fuels, gases), study design, and estimation method.

We developed criteria for a critical appraisal, by adapting the ROBINS-I assessment criteria 81 to the specific nature of the research studies at hand. First, while the treatment (i.e. the policy application) in the reviewed studies is independent of the conducted research, the study design should cover a representative sample and suitable data. The control group needs to have high similarity with the treatment group, based on demographic, economic, and institutional proximity and similarity in pre-treatment emissions pathways. Statistical methods such as matching or synthetic control methods can increase the comparability of the control group with the treatment group. Second, the study design must control for confounding factors that are expected to influence the emissions of the study objects. For some studies we identify further risks of bias in the set-up of the statistical methods, which are also recorded.

All extracted data are made publicly available (see Data Availability).

Standardising effect sizes

We standardise the extracted effect sizes, based on the heterogeneous study designs and estimation methods, into a common metric. The largest part of the primary literature estimates treatment effects using quasi-experimental study designs (difference-in-differences or regression discontinuity in time). A few studies estimate the treatment effect by comparing the emission levels between countries with and without carbon pricing without any quasi-experimental design (termed cross-country studies in this review). Some studies estimate a carbon price elasticity, i.e. the effect of a marginal change in the carbon price on emissions. All effect sizes are transformed to treatment effects measured as a percentage difference between the counterfactual emissions without the policy and the observed emissions with the policy in place. Effect sizes expressed in tons of CO 2 are standardised using the mean emissions given in the study, while effect sizes from log-level regression specifications are standardised using exponential transformation. Effect sizes from price elasticity estimations are interpreted at the mean carbon price of the intervention during the period studied by the primary study.

Standard errors are derived accordingly. If the statistical (in)significance of an estimate at a specified significance level is the only uncertainty measure provided, this information is used to approximate the standard error. For the non-linear effect size transformation in the case of log-level regression coefficients, we derive the standard error by keeping the t statistic constant. For effect sizes from price elasticity estimations, we interpret the standard errors at the mean price level, just as for the transformation of the effect size itself.
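
As a worked illustration of the conversions just described, the following sketch uses hypothetical numbers; it is not the authors' code, and the values are invented solely to show the arithmetic.

```r
# Illustrative sketch (hypothetical numbers): converting reported estimates into
# the common metric, a percentage change in emissions relative to the counterfactual.

# (1) A level effect reported in tons of CO2, standardised by the study's mean emissions
effect_tons    <- -1.2e6   # estimated change in emissions (t CO2)
mean_emissions <- 1.5e7    # mean emissions reported in the study (t CO2)
te_level <- 100 * effect_tons / mean_emissions   # -8% relative to the counterfactual

# (2) A log-level regression coefficient, transformed exponentially; the standard
#     error is rescaled so that the t statistic stays constant
beta <- -0.08; se_beta <- 0.03
te_log    <- 100 * (exp(beta) - 1)     # about -7.7%
se_te_log <- te_log / (beta / se_beta) # preserves t = beta / se_beta
```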

Effect size averaging

We use a multilevel random effects model to estimate the average treatment effect. The random effects model does not assume that all effect sizes converge to a common effect size mean 82 , which in our case accounts for the heterogeneity in the studied countries and schemes. The common variance component is estimated using the restricted maximum likelihood (REML) estimation 83 , 84 . We apply a multilevel estimation to account for the non-independence of effect sizes from the same study, assuming a compound symmetric variance-covariance matrix 84 .

For the estimation of average treatment effects for the individual policy schemes we extend the random effects model to a mixed effects model, inserting dummy variables for each carbon pricing scheme. Studies conducting a cross-sectional assessment of a set of carbon pricing schemes in multiple countries are collected with a separate dummy variable. The eight Chinese pilot ETS schemes are collected in a single dummy variable, as they are commonly assessed together as a single policy in the primary studies. For many of the schemes only one to five studies are available, which does not allow for appropriate clustering of the effect sizes 85 , 86 . The multilevel estimation of the model should still adequately capture the non-independence of effect sizes from the same study. Clustering of standard errors would have a marginal impact on the standard errors derived for the full sample averages (see  Supplementary Information) . The models are estimated in R using the metafor package 84 .
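
The paper notes that these models are fitted in R with the metafor package. The sketch below shows how such multilevel random and mixed effects models might be specified under assumed column names (yi = harmonised effect size, vi = its squared standard error, study and es_id = study and effect size identifiers, scheme = carbon pricing scheme); it is an illustration, not the authors' actual code.

```r
library(metafor)

# 'dat' is assumed to hold one row per harmonised effect size with columns
# yi, vi, study, es_id, and scheme (all names are assumptions for this sketch).
# Nesting effect sizes within studies implies a compound symmetric
# variance-covariance structure within each study.

# Multilevel random effects model, variance components estimated by REML
# (metafor's default estimator)
re_model <- rma.mv(yi, vi, random = ~ 1 | study/es_id, data = dat)

# Mixed effects model: scheme dummies as moderators; dropping the intercept
# makes each coefficient the average treatment effect for one scheme
me_model <- rma.mv(yi, vi, mods = ~ scheme - 1,
                   random = ~ 1 | study/es_id, data = dat)
summary(me_model)
```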

To check that no single study exerts undue influence on the average effect sizes measured, we calculate Cook’s distance and DFBETAS. For three studies in the sample, the values of these metrics are distinctly different. All three studies assess the effect on emissions from the burning of coal. As these effects likely result from fuel switching without capturing the overall emissions effect, 13 effect sizes from five studies with a focus on emissions from coal are excluded from the main assessment. Estimates including these studies are provided in the  Supplementary Information and result in an average treatment effect of –12.5%.

To correct for publication bias, we follow the guidance by Stanley et al.87 and Ioannidis et al.38 and estimate the model on the reduced set of adequately powered effect sizes. To assess the power of each effect size we use its standard error and assume the genuine effect to be the average treatment effect from our full-sample random effects model. We follow common practice and consider effect sizes with power above 80% to be adequately powered88. We estimate a multilevel random effects model, in line with our main approach, instead of the fixed effects model proposed in the literature38,87.
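The power screen could be implemented along the following lines; this is a simplified illustration of the idea, not the authors' exact code.

```r
# Approximate the power of each effect size to detect the assumed genuine effect
# (the full-sample average) in a two-sided 5% test; the small second tail is ignored.
theta   <- as.numeric(coef(res))                    # assumed genuine effect
z_crit  <- qnorm(0.975)
dat$power <- pnorm(abs(theta) / sqrt(dat$var_pct) - z_crit)

dat_powered <- subset(dat, power >= 0.80)           # keep adequately powered effects only

res_powered <- rma.mv(yi = effect_pct, V = var_pct,
                      random = ~ 1 | study_id / es_id,
                      method = "REML", data = dat_powered)
```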

Heterogeneity assessment

There is considerable heterogeneity in the effect sizes (I2 = 0.86 in the random effects model). To capture the variation in the response to the policy, we code variables for the carbon pricing schemes as well as information on the sector coverage of the scheme (or of the study, where the study focuses on a single sector), the mean carbon price level during the assessment period, and a variable distinguishing carbon taxes from cap-and-trade schemes. The information on sector coverage and the price level was added from external sources53,89. We furthermore code a set of variables on the study design, estimation methods, and data used in the primary studies. Details on the moderator variables are provided in the  Supplementary Information .
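For a standard single-level random effects model, an I2 statistic of this kind can be recovered from the heterogeneity statistic Q, as in this rough sketch; apportioning heterogeneity across the levels of the multilevel model requires a different decomposition and is not shown here.

```r
# I^2: share of total variability attributable to heterogeneity rather than sampling error.
res_uni <- rma(yi = effect_pct, vi = var_pct, data = dat, method = "REML")
Q  <- res_uni$QE                      # Cochran's Q
df <- res_uni$k - 1
I2_from_Q <- max(0, (Q - df) / Q)     # Higgins' I^2, as a proportion
res_uni$I2                            # metafor's tau^2-based I^2 (may differ slightly)
```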

Given the large number of potential explanatory variables, we use the Bayesian model averaging (BMA) technique22,90,91,92, employing a Markov chain Monte Carlo algorithm (specifically, the Metropolis-Hastings algorithm of the bms package for R93) to walk through the most likely combinations of explanatory variables. In the baseline specification we employ the unit information prior recommended by Eicher et al.94. This agnostic prior reflects our lack of knowledge regarding the probability of individual parameter values. To test the robustness of our estimates we follow Havranek et al.22,92 and use the dilution prior, which adjusts model probabilities by multiplying them by the determinant of the correlation matrix of the variables included in the model. As further robustness checks, we follow Ley and Steel and apply the beta-binomial random model prior, which gives the same weight to each model size95, and Fernández et al., who use the so-called BRIC g-prior96. The BMA results using alternative priors are provided in the  Supplementary Information .
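A hedged sketch of the BMA step with the BMS package for R (ref. 93) follows; bma_dat is a hypothetical data frame whose first column is the effect size and whose remaining columns are the moderator variables, and the model prior shown is an assumption rather than the paper's stated choice.

```r
library(BMS)

# Bayesian model averaging over the moderator space. g = "UIP" selects the unit
# information prior used in the baseline; mprior = "uniform" (an assumption here)
# places equal prior probability on each model. The sampler explores model space by MCMC.
bma_fit <- bms(bma_dat,
               burn   = 1e5,
               iter   = 1e6,
               g      = "UIP",
               mprior = "uniform",
               mcmc   = "bd")

coef(bma_fit)   # posterior inclusion probabilities and posterior means

# Robustness checks could swap priors, e.g. g = "BRIC" for the Fernandez et al.
# benchmark prior or mprior = "random" for the beta-binomial model prior; the
# dilution prior is not built into BMS and would need a custom model prior.
```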

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The study and effect size data collected for this study have been deposited in GitHub and can be accessed here: https://github.com/doebbeling/carbon_pricing_effectiveness.git .

Code availability

The code used for the meta-analysis has been deposited in GitHub and can be accessed here: https://github.com/doebbeling/carbon_pricing_effectiveness.git .

IPCC. Climate Change 2022: Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022).

UNEP. Emissions Gap Report 2022: The Closing Window - Climate crisis calls for rapid transformation of societies (Nairobi, 2022). https://www.unep.org/emissions-gap-report-2022.

World Bank Carbon Pricing Dashboard: Key statistics for 2023 on initiatives implemented (2023). https://carbonpricingdashboard.worldbank.org/map_data .

Climate Leadership Council. Economists’ Statement on Carbon Dividends Organized by the Climate Leadership Council (2019). https://www.econstatement.org .

High-Level Commission on Carbon Prices. Report of the High-Level Commission on Carbon Prices (World Bank, Washington, DC, 2017).

Baumol, W. J. & Oates, W. E. The Use of Standards and Prices for Protection of the Environment. In Bohm, P. & Kneese, A. V. (eds.) The Economics of Environment , 53–65 (Palgrave Macmillan UK, London, 1971). http://link.springer.com/10.1007/978-1-349-01379-1_4 .

Montgomery, W. Markets in licenses and efficient pollution control programs. J. Econ. Theory 5 , 395–418 (1972).

Sterner, T. et al. Policy design for the Anthropocene. Nat. Sustainability 2 , 14–21 (2019).

Rosenbloom, D., Markard, J., Geels, F. W. & Fuenfschilling, L. Why carbon pricing is not sufficient to mitigate climate change-and how “sustainability transition policy” can help. Proc. Natl Acad. Sci. 117 , 8664–8668 (2020).

Savin, I., Drews, S., Maestre-Andrés, S. & Van Den Bergh, J. Public views on carbon taxation and its fairness: a computational-linguistics analysis. Clim. Change 162 , 2107–2138 (2020).

Patt, A. & Lilliestam, J. The Case against Carbon Prices. Joule 2 , 2494–2498 (2018).

Green, J. F. Beyond Carbon Pricing: Tax Reform is Climate Policy. Glob. Policy 12 , 372–379 (2021). Publisher: John Wiley & Sons, Ltd.

Rotaris, L. & Danielis, R. The willingness to pay for a carbon tax in Italy. Transp. Res. Part D: Transp. Environ. 67 , 659–673 (2019).

Hu, Y., Ren, S., Wang, Y. & Chen, X. Can carbon emission trading scheme achieve energy conservation and emission reduction? Evidence from the industrial sector in China. Energy Econ. 85 , 104590 (2020).

Rafaty, R., Dolphin, G. & Pretis, F. Carbon Pricing and the Elasticity of CO2 Emissions. Institute for New Economic Thinking Working Paper Series 1–84 (2020). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3812786 .

Leroutier, M. Carbon pricing and power sector decarbonization: Evidence from the UK. J. Environ. Econ. Manag. 111 , 102580 (2022).

Rivers, N. & Schaufele, B. Salience of carbon taxes in the gasoline market. J. Environ. Econ. Manag. 74 , 23–36 (2015).

Xiang, D. & Lawley, C. The impact of British Columbia’s carbon tax on residential natural gas consumption. Energy Econ. 80 , 206–218 (2019).

Gugler, K., Haxhimusa, A. & Liebensteiner, M. Effectiveness of climate policies: Carbon pricing vs. subsidizing renewables. J. Environ. Econ. Manag. 106 , 102405 (2021).

Espey, M. Gasoline demand revisited: an international meta-analysis of elasticities. Energy Econ. 20 , 273–295 (1998).

Brons, M., Nijkamp, P., Pels, E. & Rietveld, P. A meta-analysis of the price elasticity of gasoline demand. A SUR approach. Energy Econ. 30 , 2105–2122 (2008).

Havranek, T., Irsova, Z. & Janda, K. Demand for gasoline is more price-inelastic than commonly thought. Energy Econ. 34 , 201–207 (2012).

Espey, J. A. & Espey, M. Turning on the lights: A meta-analysis of residential electricity demand elasticities. J. Agric. Appl. Econ. 36 , 65–81 (2004).

Labandeira, X., Labeaga, J. M. & López-Otero, X. A meta-analysis on the price elasticity of energy demand. Energy Policy 102 , 549–568 (2017).

Lilliestam, J., Patt, A. & Bersalli, G. The effect of carbon pricing on technological change for full energy decarbonization: A review of empirical ex-post evidence. WIREs Climate Change 12 (2021). https://onlinelibrary.wiley.com/doi/10.1002/wcc.681 .

Köppl, A. & Schratzenstaller, M. Carbon taxation: A review of the empirical literature. Journal of Economic Surveys joes.12531 (2022). https://onlinelibrary.wiley.com/doi/10.1111/joes.12531 .

Haites, E. Carbon taxes and greenhouse gas emissions trading systems: what have we learned? Clim. Policy 18 , 955–966 (2018).

Green, J. F. Does carbon pricing reduce emissions? A review of ex-post analyses. Environ. Res. Lett. 16 , 043004 (2021).

Haddaway, N. R. et al. Eight problems with literature reviews and how to fix them. Nat. Ecol. Evolut. 4 , 1582–1589 (2020).

Van Den Bergh, J. & Savin, I. Impact of carbon pricing on low-carbon innovation and deep decarbonisation: Controversies and path forward. Environ. Resour. Econ. 80 , 705–715 (2021).

Minx, J. C., Callaghan, M., Lamb, W. F., Garard, J. & Edenhofer, O. Learning about climate change solutions in the IPCC and beyond. Environ. Sci. Policy 77 , 252–259 (2017).

Minx, J. C., Haddaway, N. R. & Ebi, K. L. Planetary health as a laboratory for enhanced evidence synthesis. Lancet Planet. Health 3 , e443–e445 (2019).

Berrang-Ford, L. et al. Editorial: Evidence synthesis for accelerated learning on climate solutions. Campbell Syst. Rev. 16 , 1 (2020).

Collaboration for Environmental Evidence. Guidelines and Standards for Evidence Synthesis in Environmental Management, vol. Version 5.0 (2018). http://www.environmentalevidence.org/Documents/Guidelines/Guidelines4.2.pdf .

Callaghan, M. & Müller-Hansen, F. Statistical stopping criteria for automated screening in systematic reviews. Syst. Rev. 9 , 273 (2020).

Elliott, J. et al. Decision makers need constantly updated evidence synthesis. Nature 600 , 383–385 (2021).

Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2 , e124 (2005).

Ioannidis, J. P. A., Stanley, T. D. & Doucouliagos, H. The power of bias in economics research. Econ. J. 127 , F236–F265 (2017).

Rosenthal, R. The file drawer problem and tolerance for null results. Psychol. Bull. 86 , 638–641 (1979).

Brodeur, A., Lé, M., Sangnier, M. & Zylberberg, Y. Star wars: The empirics strike back. Am. Econ. J.: Appl. Econ. 8 , 1–32 (2016).

Egger, M., Smith, G. D., Schneider, M. & Minder, C. Bias in meta-analysis detected by a simple, graphical test. BMJ 315 , 629–634 (1997).

Stanley, T. D. Meta-Regression Methods for Detecting and Estimating Empirical Effects in the Presence of Publication Selection. Oxford Bulletin of Economics and Statistics (2007). https://onlinelibrary.wiley.com/doi/10.1111/j.1468-0084.2007.00487.x .

Weitzman, M. L. Prices vs. Quantities. Rev. Econ. Stud. 41 , 477 (1974).

Hepburn, C. Regulation by prices, quantities, or both: A review of instrument choice. Oxf. Rev. Econ. Policy 22 , 226–247 (2006).

Kalkuhl, M. & Edenhofer, O. Prices vs. Quantities and the Intertemporal Dynamics of the Climate Rent. SSRN Electr. J. (2010). https://www.ssrn.com/abstract=1605112 .

Goulder, L. H. & Schein, A. R. CARBON TAXES VERSUS CAP AND TRADE: A CRITICAL REVIEW. Clim. Change Econ. 04 , 1350010 (2013).

Foramitti, J., Savin, I. & van den Bergh, J. C. Emission tax vs. permit trading under bounded rationality and dynamic markets. Energy Policy 148 , 112009 (2021).

Flachsland, C., Brunner, S., Edenhofer, O. & Creutzig, F. Climate policies for road transport revisited (II): Closing the policy gap with cap-and-trade. Energy Policy 39 , 2100–2110 (2011).

Kesicki, F. & Strachan, N. Marginal abatement cost (MAC) curves: confronting theory and practice. Environ. Sci. Policy 14 , 1195–1204 (2011).

Tang, B.-J., Ji, C.-J., Hu, Y.-J., Tan, J.-X. & Wang, X.-Y. Optimal carbon allowance price in China’s carbon emission trading system: Perspective from the multi-sectoral marginal abatement cost. J. Clean. Prod. 253 , 119945 (2020).

Kaufman, N., Barron, A. R., Krawczyk, W., Marsters, P. & McJeon, H. A near-term to net zero alternative to the social cost of carbon for setting carbon prices. Nat. Clim. Change 10 , 1010–1014 (2020).

Strefler, J. et al. Alternative carbon price trajectories can avoid excessive carbon removal. Nat. Commun. 12 , 2264 (2021).

International Carbon Action Partnership. Allowance Price Explorer (2022). https://icapcarbonaction.com/en/ets-prices .

National and Sub-national Policies and Institutions. In IPCC (ed.) Climate Change 2022 - Mitigation of Climate Change, 1355–1450 (Cambridge University Press, 2023), 1 edn. https://www.cambridge.org/core/product/identifier/9781009157926%23c13/type/book_part .

McKibbin, W. J., Shackleton, R. & Wilcoxen, P. J. What to expect from an international system of tradable permits for carbon emissions. Resour. Energy Econ. 21 , 319–346 (1999).

Flachsland, C., Marschinski, R. & Edenhofer, O. To link or not to link: benefits and disadvantages of linking cap-and-trade systems. Clim. Policy 9 , 358–372 (2009).

Sterner, T. Fuel taxes: An important instrument for climate policy. Energy Policy 35 , 3194–3202 (2007).

Dahl, C. A. Measuring global gasoline and diesel price and income elasticities. Model. Transp. (Energy) Demand Policies 41 , 2–13 (2012).

Morris, J., Paltsev, S. & Reilly, J. Marginal Abatement Costs and Marginal Welfare Costs for Greenhouse Gas Emissions Reductions: Results from the EPPA Model. Environ. Model. Assess. 17 , 325–336 (2012).

Agnolucci, P. et al. Measuring Total Carbon Pricing. The World Bank Research Observer lkad009 (2023). https://academic.oup.com/wbro/advance-article/doi/10.1093/wbro/lkad009/7283905.

Xiang, C. & van Gevelt, T. Political signalling and emissions trading schemes in China: Insights from Guangdong Province. Energy Sustain. Dev. 71 , 307–314 (2022).

Murray, B. C. & Maniloff, P. T. Why have greenhouse emissions in RGGI states declined? An econometric attribution to economic, energy market, and policy factors. Energy Econ. 51 , 581–589 (2015).

Zhou, Y. & Huang, L. How regional policies reduce carbon emissions in electricity markets: Fuel switching or emission leakage. Energy Econ. 97 , 105209 (2021).

Lawley, C. & Thivierge, V. Refining the evidence: British Columbia’s carbon tax and household gasoline consumption. The Energy Journal 39 (2018). http://www.iaee.org/en/publications/ejarticle.aspx?id=3056.

Erutku, C. & Hildebrand, V. Carbon Tax at the Pump in British Columbia and Quebec. Can. Public Policy 44 , 126–133 (2018).

Best, R., Burke, P. J. & Jotzo, F. Carbon Pricing Efficacy: Cross-Country Evidence. Environ. Resour. Econ. 77 , 69–94 (2020).

Kohlscheen, E., Moessner, R. & Takats, E. Effects of Carbon Pricing and Other Climate Policies on CO 2 Emissions (2021). Publisher: CESifo Working Paper.

Runst, P. & Thonipara, A. Dosis facit effectum why the size of the carbon tax matters: Evidence from the Swedish residential sector. Energy Econ. 91 , 104898 (2020).

Gugler, K. P., Haxhimusa, A. & Liebensteiner, M. Carbon Pricing and Emissions: Causal Effects of Britain’s Carbon Tax. SSRN Electronic Journal (2022). https://www.ssrn.com/abstract=4116240 .

Colmer, J., Martin, R., Muûls, M. & Wagner, U. J. Does pricing carbon mitigate climate change? Firm-level evidence from the European Union emissions trading scheme (2022). Publisher: CEPR Discussion Paper No. DP16982.

Fernández Fernández, Y., Fernández López, M., González Hernández, D. & Olmedillas Blanco, B. Institutional change and environment: Lessons from the European emission trading system. Energies 11 , 706 (2018).

Gupta, N., Shah, J., Gupta, S. & Kaul, R. Causal impact of european union emission trading scheme on firm behaviour and economic performance: A study of german manufacturing firms (2021). https://arxiv.org/abs/2108.07163 . Publisher: arXiv Version Number: 1.

Petrick, S. & Wagner, U. J. The impact of carbon trading on industry: Evidence from german manufacturing firms. SSRN Electr. J. (2014). http://www.ssrn.com/abstract=2389800 .

Pawson, R., Greenhalgh, T., Harvey, G. & Walshe, K. Realist synthesis: an introduction. Manchester: ESRC Research Methods Programme Working Paper Series, University of Manchester (2004).

Pawson, R., Greenhalgh, T., Harvey, G. & Walshe, K. Realist review-a new method of systematic review designed for complex policy interventions. J. Health Serv. Res. policy 10 , 21–34 (2005).

Vrolijk, K. & Sato, M. Quasi-experimental evidence on carbon pricing. World Bank Res. Observer 38 , 213–248 (2023).

Ferraro, P. J. et al. Create a culture of experiments in environmental programs. Science 381 , 735–737 (2023).

Koch, N., Naumann, L., Pretis, F., Ritter, N. & Schwarz, M. Attributing agnostically detected large reductions in road CO2 emissions to policy mixes. Nat. Energy 7 , 844–853 (2022).

Döbbeling, N. et al. Protocol: Effectiveness of carbon pricing - A systematic review and meta-analysis of the ex-post literature (2022). https://osf.io/854vp/ . Publisher: Open Science Framework.

Callaghan, M., Müller-Hansen, F., Hilaire, J. & Lee, Y. T. NACSOS: NLP Assisted Classification, Synthesis and Online Screening (2020). https://zenodo.org/record/4121526 .

Sterne, J. A., Hernán, M. A., McAleenan, A., Reeves, B. C. & Higgins, J. P. Assessing risk of bias in a non-randomized study. In Higgins, J. P. et al. (eds.) Cochrane Handbook for Systematic Reviews of Interventions , 621–641 (Wiley, 2019), 1 edn. https://onlinelibrary.wiley.com/doi/10.1002/9781119536604.ch25 .

Ringquist, E. Meta-Analysis for Public Management and Policy (John Wiley & Sons, 2013).

Viechtbauer, W. Bias and efficiency of meta-analytic variance estimators in the random-effects model. J. Educ. Behav. Stat. 30 , 261–293 (2005).

Viechtbauer, W. Conducting Meta-Analyses in R with the metafor Package. J. Stat. Softw. 36 (2010). http://www.jstatsoft.org/v36/i03/ .

Tipton, E. Small sample adjustments for robust variance estimation with meta-regression. Psychol. Methods 20 , 375–393 (2015).

Pustejovsky, J. E. & Tipton, E. Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models. J. Bus. Econ. Stat. 36 , 672–683 (2018).

Stanley, T. D., Doucouliagos, H. & Ioannidis, J. P. A. Finding the power to reduce publication bias. Stat. Med. (2017). https://onlinelibrary.wiley.com/doi/10.1002/sim.7228.

Cohen, J. Some statistical issues in psychological research. Handbook of clinical psychology 95–121 (1965).

World Bank. Carbon Pricing Dashboard: Key statistics for 2022 on initiatives implemented (2022). https://carbonpricingdashboard.worldbank.org/map_data .

Raftery, A. E., Madigan, D. & Hoeting, J. A. Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 92 , 179–191 (1997).

Bajzik, J., Havranek, T., Irsova, Z. & Schwarz, J. Estimating the Armington elasticity: The importance of study design and publication bias. J. Int. Econ. 127 , 103383 (2020).

Havranek, T., Rusnak, M. & Sokolova, A. Habit formation in consumption: A meta-analysis. Eur. Econ. Rev. 95 , 142–167 (2017).

Zeugner, S. & Feldkircher, M. Bayesian model averaging employing fixed and flexible priors: The BMS Package for R . Journal of Statistical Software 68 (2015). http://www.jstatsoft.org/v68/i04/ .

Eicher, T. S., Papageorgiou, C. & Raftery, A. E. Default priors and predictive performance in Bayesian model averaging, with application to growth determinants. J. Appl. Econ. 26 , 30–55 (2011).

Ley, E. & Steel, M. F. On the effect of prior assumptions in Bayesian model averaging with applications to growth regression. J. Appl. Econ. 24 , 651–674 (2009).

Fernández, C., Ley, E. & Steel, M. F. Benchmark priors for Bayesian model averaging. J. Econ. 100 , 381–427 (2001).

Haddaway, N., Macura, B., Whaley, P. & Pullin, A. ROSES flow diagram for systematic reviews. Version 1.0 (2018). https://figshare.com/articles/ROSES_Flow_Diagram_Version_1_0/5897389 .

Acknowledgements

N.D.H. is supported by a PhD stipend from the Heinrich Böll Stiftung. J.C.M, T.M.K., O.E., N.K., M.C., C.F., and M.K. acknowledge research funding from the German Ministry of Education and Research (ARIADNE project–Grant No. 03SFK5J0). J.C.M., K.M., and M.K. acknowledge funding from the European Union under the Horizon program (CAPABLE project—Grant No. 101056891). Views and opinions expressed are however those of the authors only.

Author information

Authors and Affiliations

Mercator Research Institute on Global Commons and Climate Change, Berlin, Germany

Niklas Döbbeling-Hildebrandt, Klaas Miersch, Tarun M. Khanna, Marion Bachelet, Max Callaghan, Ottmar Edenhofer, Christian Flachsland, Matthias Kalkuhl, Nicolas Koch, William F. Lamb, Nils Ohlendorf, Jan Christoph Steckel & Jan C. Minx

Priestley International Centre for Climate, School of Earth and Environment, University of Leeds, Leeds, UK

Niklas Döbbeling-Hildebrandt, Piers M. Forster, William F. Lamb & Jan C. Minx

Technische Universität, Berlin, Germany

Klaas Miersch, Ottmar Edenhofer & Nils Ohlendorf

University of British Columbia, Vancouver, BC, Canada

Tarun M. Khanna

Centre for Environmental Sciences, Hasselt University, Hasselt, Belgium

Stephan B. Bruns

International Center for Higher Education Research (INCHER), University of Kassel, Kassel, Germany

Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA

Potsdam Institute for Climate Impact Research, Potsdam, Germany

Ottmar Edenhofer

Hertie School Centre for Sustainability, Berlin, Germany

Christian Flachsland

Faculty of Economics and Social Sciences, University of Potsdam, Potsdam, Germany

Matthias Kalkuhl

Institute of Labor Economics (IZA), Bonn, Germany

Nicolas Koch

Brandenburg University of Technology, Cottbus, Germany

Jan Christoph Steckel

Contributions

J.C.M., N.D.H., K.M., T.M.K., O.E., N.K., W.F.L., N.O. and J.C.S. designed the research. N.D.H., K.M., M.C. and J.C.M. developed the literature screening strategy. N.D.H., K.M., M.B., N.K., W.F.L., N.O., J.C.S. and J.C.M. manually screened the literature and N.D.H., K.M. and M.B. extracted the data. N.D.H. and M.C. performed the machine learning-enabled screening. N.D.H., K.M., T.M.K. and S.B.B. performed the meta-analysis. T.M.K. and N.D.H. conducted the Bayesian Model Averaging analysis. N.D.H., K.M., T.M.K., S.B.B., O.E., C.F., P.M.F., M.K., N.K., J.C.S. and J.C.M. analysed the results. N.D.H., T.M.K., J.C.M. and K.M. wrote the manuscript with contributions from all authors.

Corresponding authors

Correspondence to Niklas Döbbeling-Hildebrandt or Jan C. Minx .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Döbbeling-Hildebrandt, N., Miersch, K., Khanna, T.M. et al. Systematic review and meta-analysis of ex-post evaluations on the effectiveness of carbon pricing. Nat Commun 15 , 4147 (2024). https://doi.org/10.1038/s41467-024-48512-w

Download citation

Received: 09 May 2023

Accepted: 02 May 2024

Published: 16 May 2024

DOI: https://doi.org/10.1038/s41467-024-48512-w

Mayo Clinic Libraries

Systematic Reviews: Synthesis & Meta-Analysis

Bringing It All Together

Synthesis involves pooling the extracted data from the included studies and summarizing the findings based on the overall strength of the evidence and consistency of observed effects. All reviews should include a qualitative synthesis and may also include a quantitative synthesis (i.e. meta-analysis). In a meta-analysis, data from sufficiently comparable and reliable studies are weighted and evaluated to determine the cumulative outcome. Tabulation and graphical display of the results (e.g. a forest plot showing the mean, range and variance from each study, visually aligned) are typically included for most forms of synthesis. Generally, conclusions are drawn about the usefulness of the intervention or about the relevant body of literature, with suggestions for future research directions.

An AHRQ guide and Chapters 9, 10, 11, and 12 of the Cochrane Handbook further address meta-analyses and other synthesis methods.

Consult Cochrane Interactive Learning Module 6 (Analyzing the Data) and Module 7 (Interpreting the Findings) for further information. Please note that you will need to register for a Cochrane account while initially on the Mayo network; you will receive an email message containing a link to create a password and activate your account.

References & Recommended Reading

  • Morton SC, Murad MH, O’Connor E, Lee CS, Booth M, Vandermeer BW, Snowden JM, D’Anci KE, Fu R, Gartlehner G, Wang Z, Steele DW. Quantitative Synthesis—An Update. 2018 Feb 23. In: Methods Guide for Effectiveness and Comparative Effectiveness Reviews [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2008–.
  • McKenzie JE, et al. Chapter 9: Summarizing study characteristics and preparing for synthesis. In: Higgins JPT, et al. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2. Cochrane, 2021. Available from www.training.cochrane.org/handbook (see Section 9).
  • Deeks JJ, et al. Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, et al. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2. Cochrane, 2021. Available from www.training.cochrane.org/handbook (see Section 10).
  • Chaimani A, et al. Chapter 11: Undertaking network meta-analyses. In: Higgins JPT, et al. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2. Cochrane, 2021. Available from www.training.cochrane.org/handbook (see Section 11).
  • McKenzie JE, Brennan SE. Chapter 12: Synthesizing and presenting findings using other methods. In: Higgins JPT, et al. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.2. Cochrane, 2021. Available from www.training.cochrane.org/handbook (see Section 12).
  • Campbell M, McKenzie JE, Sowden A, et al.  Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline.  BMJ (Clinical research ed). 2020;368:l6890.
  • Alavi M, Hunt GE, Visentin DC, Watson R, Thapa DK, Cleary M.  Seeing the forest for the trees: How to interpret a meta-analysis forest plot.  Journal of advanced nursing. 2021;77(3):1097-1101. doi:https://dx.doi.org/10.1111/jan.14721
  • Last Updated: May 10, 2024 7:59 AM
  • URL: https://libraryguides.mayo.edu/systematicreviewprocess

The Ohio State University Health Sciences Library

Systematic Reviews

Synthesize Your Results

Your collected data must be combined into a coherent whole and accompanied by an analysis that conveys a deeper understanding of the body of evidence. All reviews should include a qualitative synthesis, and may or may not include a quantitative synthesis (also known as a meta-analysis).

A qualitative synthesis is a narrative, textual approach to summarizing, analyzing, and assessing the body of evidence included in your review.  It is a necessary part of all systematic reviews, even those with a focus on quantitative data.

Use the qualitative synthesis to:

  • Provide a general summary of the characteristics and findings of the included studies.
  • Analyze the relationships between studies, exploring patterns and investigating heterogeneity.
  • Discuss the applicability of the body of evidence to the review's question within the PICO structure.
  • Explain the meta-analysis (if one is conducted) and interpret and analyze the robustness of its results.
  • Critique the strengths and weaknesses of the body of evidence as a whole, including a cumulative assessment of the risk of bias across studies.
  • Discuss any gaps in the evidence, such as patient populations that have been inadequately studied or for whom results differ.
  • Compare the review's findings with current conventional wisdom when appropriate.

A quantitative synthesis, or meta-analysis, uses statistical techniques to combine and analyze the results of multiple studies. The feasibility and sensibility of including a meta-analysis as part of your systematic review will depend on the data available.

Requirements for quantitative synthesis:

  • Clinical and methodological similarity between compared studies
  • Consistent study quality among compared studies
  • Statistical expertise from a review team member or consultant
  • Last Updated: May 14, 2024 8:03 AM
  • URL: https://hslguides.osu.edu/systematic_reviews

Cochrane Training

Chapter 12: Synthesizing and presenting findings using other methods

Joanne E McKenzie, Sue E Brennan

Key Points:

  • Meta-analysis of effect estimates has many advantages, but other synthesis methods may need to be considered in the circumstance where there is incompletely reported data in the primary studies.
  • Alternative synthesis methods differ in the completeness of the data they require, the hypotheses they address, and the conclusions and recommendations that can be drawn from their findings.
  • These methods provide more limited information for healthcare decision making than meta-analysis, but may be superior to a narrative description where some results are privileged above others without appropriate justification.
  • Tabulation and visual display of the results should always be presented alongside any synthesis, and are especially important for transparent reporting in reviews without meta-analysis.
  • Alternative synthesis and visual display methods should be planned and specified in the protocol. When writing the review, details of the synthesis methods should be described.
  • Synthesis methods that involve vote counting based on statistical significance have serious limitations and are unacceptable.

Cite this chapter as: McKenzie JE, Brennan SE. Chapter 12: Synthesizing and presenting findings using other methods. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .

12.1 Why a meta-analysis of effect estimates may not be possible

Meta-analysis of effect estimates has many potential advantages (see Chapter 10 and Chapter 11 ). However, there are circumstances where it may not be possible to undertake a meta-analysis and other statistical synthesis methods may be considered (McKenzie and Brennan 2014).

Some common reasons why it may not be possible to undertake a meta-analysis are outlined in Table 12.1.a . Legitimate reasons include limited evidence; incompletely reported outcome/effect estimates, or different effect measures used across studies; and bias in the evidence. Other commonly cited reasons for not using meta-analysis are because of too much clinical or methodological diversity, or statistical heterogeneity (Achana et al 2014). However, meta-analysis methods should be considered in these circumstances, as they may provide important insights if undertaken and interpreted appropriately.

Table 12.1.a Scenarios that may preclude meta-analysis, with possible solutions

12.2 Statistical synthesis when meta-analysis of effect estimates is not possible

A range of statistical synthesis methods are available, and these may be divided into three categories based on their preferability ( Table 12.2.a ). Preferable methods are the meta-analysis methods outlined in Chapter 10 and Chapter 11 , and are not discussed in detail here. This chapter focuses on methods that might be considered when a meta-analysis of effect estimates is not possible due to incompletely reported data in the primary studies. These methods divide into those that are ‘acceptable’ and ‘unacceptable’. The ‘acceptable’ methods differ in the data they require, the hypotheses they address, limitations around their use, and the conclusions and recommendations that can be drawn (see Section 12.2.1 ). The ‘unacceptable’ methods in common use are described (see Section 12.2.2 ), along with the reasons for why they are problematic.

Compared with meta-analysis methods, the ‘acceptable’ synthesis methods provide more limited information for healthcare decision making. However, these ‘acceptable’ methods may be superior to a narrative that describes results study by study, which comes with the risk that some studies or findings are privileged above others without appropriate justification. Further, in reviews with little or no synthesis, readers are left to make sense of the research themselves, which may result in the use of seemingly simple yet problematic synthesis methods such as vote counting based on statistical significance (see Section 12.2.2.1 ).

All methods first involve calculation of a ‘standardized metric’, followed by application of a synthesis method. In applying any of the following synthesis methods, it is important that only one outcome per study (or other independent unit, for example one comparison from a trial with multiple intervention groups) contributes to the synthesis. Chapter 9 outlines approaches for selecting an outcome when multiple have been measured. Similar to meta-analysis, sensitivity analyses can be undertaken to examine if the findings of the synthesis are robust to potentially influential decisions (see Chapter 10, Section 10.14 and Section 12.4 for examples).

Authors should report the specific methods used in lieu of meta-analysis (including approaches used for presentation and visual display), rather than stating that they have conducted a ‘narrative synthesis’ or ‘narrative summary’ without elaboration. The limitations of the chosen methods must be described, and conclusions worded with appropriate caution. The aim of reporting this detail is to make the synthesis process more transparent and reproducible, and help ensure use of appropriate methods and interpretation.

Table 12.2.a Summary of preferable and acceptable synthesis methods

12.2.1 Acceptable synthesis methods

12.2.1.1 Summarizing effect estimates

Description of method Summarizing effect estimates might be considered in the circumstance where estimates of intervention effect are available (or can be calculated), but the variances of the effects are not reported or are incorrect (and cannot be calculated from other statistics, or reasonably imputed) (Grimshaw et al 2003). Incorrect calculation of variances arises more commonly in non-standard study designs that involve clustering or matching ( Chapter 23 ). While missing variances may limit the possibility of meta-analysis, the (standardized) effects can be summarized using descriptive statistics such as the median, interquartile range, and the range. Calculating these statistics addresses the question ‘What is the range and distribution of observed effects?’

Reporting of methods and results The statistics that will be used to summarize the effects (e.g. median, interquartile range) should be reported. Box-and-whisker or bubble plots will complement reporting of the summary statistics by providing a visual display of the distribution of observed effects (Section 12.3.3 ). Tabulation of the available effect estimates will provide transparency for readers by linking the effects to the studies (Section 12.3.1 ). Limitations of the method should be acknowledged ( Table 12.2.a ).
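As a small illustration of this approach, the descriptive statistics and a box-and-whisker display could be produced in R roughly as follows; the effect estimates here are invented.

```r
# Standardised effect estimates extracted from the included studies (invented values).
effects <- c(-0.42, -0.15, 0.05, -0.30, -0.22, -0.08, -0.51)

c(median = median(effects),
  q25    = as.numeric(quantile(effects, 0.25)),
  q75    = as.numeric(quantile(effects, 0.75)),
  min    = min(effects),
  max    = max(effects))

# Box-and-whisker plot of the distribution of observed effects.
boxplot(effects, horizontal = TRUE, xlab = "Standardised effect estimate")
```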

12.2.1.2 Combining P values

Description of method Combining P values can be considered in the circumstance where there is no, or minimal, information reported beyond P values and the direction of effect; the types of outcomes and statistical tests differ across the studies; or results from non-parametric tests are reported (Borenstein et al 2009). Combining P values addresses the question ‘Is there evidence that there is an effect in at least one study?’ There are several methods available (Loughin 2004), with the method proposed by Fisher outlined here (Becker 1994).

Fisher’s method combines the P values from statistical tests across k studies using the formula:

$$ X^2 = -2 \sum_{i=1}^{k} \ln(P_i) $$

where $P_i$ is the one-sided P value from study $i$. Under the null hypothesis of no effect in any study, $X^2$ follows a chi-squared distribution with $2k$ degrees of freedom.

One-sided P values are used, since these contain information about the direction of effect. However, these P values must reflect the same directional hypothesis (e.g. all testing if intervention A is more effective than intervention B). This is analogous to standardizing the direction of effects before undertaking a meta-analysis. Two-sided P values, which do not contain information about the direction, must first be converted to one-sided P values. If the effect is consistent with the directional hypothesis (e.g. intervention A is beneficial compared with B), then the one-sided P value is calculated as

$$ P_{\text{one-sided}} = \frac{P_{\text{two-sided}}}{2} $$

otherwise it is calculated as $P_{\text{one-sided}} = 1 - P_{\text{two-sided}}/2$.

In studies that do not report an exact P value but report a conventional level of significance (e.g. P<0.05), a conservative option is to use the threshold (e.g. 0.05). The P values must have been computed from statistical tests that appropriately account for the features of the design, such as clustering or matching, otherwise they will likely be incorrect.
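For illustration, Fisher's method can be computed in a few lines of R; the one-sided P values below are invented.

```r
# One-sided P values from k studies, all expressed against the same directional hypothesis.
p_one_sided <- c(0.02, 0.15, 0.07, 0.30)
k <- length(p_one_sided)

X2 <- -2 * sum(log(p_one_sided))                     # Fisher's combined statistic
p_combined <- pchisq(X2, df = 2 * k, lower.tail = FALSE)
p_combined   # evidence of an effect in at least one study
```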

Reporting of methods and results There are several methods for combining P values (Loughin 2004), so the chosen method should be reported, along with details of sensitivity analyses that examine if the results are sensitive to the choice of method. The results from the test should be reported alongside any available effect estimates (either individual results or meta-analysis results of a subset of studies) using text, tabulation and appropriate visual displays (Section 12.3 ). The albatross plot is likely to complement the analysis (Section 12.3.4 ). Limitations of the method should be acknowledged ( Table 12.2.a ).

12.2.1.3 Vote counting based on the direction of effect

Description of method Vote counting based on the direction of effect might be considered in the circumstance where the direction of effect is reported (with no further information), or there is no consistent effect measure or data reported across studies. The essence of vote counting is to compare the number of effects showing benefit to the number of effects showing harm for a particular outcome. However, there is wide variation in the implementation of the method due to differences in how ‘benefit’ and ‘harm’ are defined. Rules based on subjective decisions or statistical significance are problematic and should be avoided (see Section 12.2.2 ).

To undertake vote counting properly, each effect estimate is first categorized as showing benefit or harm based on the observed direction of effect alone, thereby creating a standardized binary metric. A count of the number of effects showing benefit is then compared with the number showing harm. Neither statistical significance nor the size of the effect are considered in the categorization. A sign test can be used to answer the question ‘is there any evidence of an effect?’ If there is no effect, the study effects will be distributed evenly around the null hypothesis of no difference. This is equivalent to testing if the true proportion of effects favouring the intervention (or comparator) is equal to 0.5 (Bushman and Wang 2009) (see Section 12.4.2.3 for guidance on implementing the sign test). An estimate of the proportion of effects favouring the intervention can be calculated ( p = u / n , where u = number of effects favouring the intervention, and n = number of studies) along with a confidence interval (e.g. using the Wilson or Jeffreys interval methods (Brown et al 2001)). Unless there are many studies contributing effects to the analysis, there will be large uncertainty in this estimated proportion.
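A hedged sketch of the sign test and the estimated proportion, using base R and (for the Wilson interval) the binom package, is shown below with invented counts.

```r
u <- 9    # effects favouring the intervention, categorized by direction only
n <- 14   # total number of independent effects

binom.test(u, n, p = 0.5)   # sign test of the null that effects are evenly split

u / n                       # estimated proportion of effects favouring the intervention

# Wilson confidence interval for that proportion (the Jeffreys interval is another option).
# install.packages("binom")
binom::binom.confint(u, n, methods = "wilson")
```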

Reporting of methods and results The vote counting method should be reported in the ‘Data synthesis’ section of the review. Failure to recognize vote counting as a synthesis method has led to it being applied informally (and perhaps unintentionally) to summarize results (e.g. through the use of wording such as ‘3 of 10 studies showed improvement in the outcome with intervention compared to control’; ‘most studies found’; ‘the majority of studies’; ‘few studies’ etc). In such instances, the method is rarely reported, and it may not be possible to determine whether an unacceptable (invalid) rule has been used to define benefit and harm (Section 12.2.2 ). The results from vote counting should be reported alongside any available effect estimates (either individual results or meta-analysis results of a subset of studies) using text, tabulation and appropriate visual displays (Section 12.3 ). The number of studies contributing to a synthesis based on vote counting may be larger than a meta-analysis, because only minimal statistical information (i.e. direction of effect) is required from each study to vote count. Vote counting results are used to derive the harvest and effect direction plots, although often using unacceptable methods of vote counting (see Section 12.3.5 ). Limitations of the method should be acknowledged ( Table 12.2.a ).

12.2.2 Unacceptable synthesis methods

12.2.2.1 Vote counting based on statistical significance

Conventional forms of vote counting use rules based on statistical significance and direction to categorize effects. For example, effects may be categorized into three groups: those that favour the intervention and are statistically significant (based on some predefined P value), those that favour the comparator and are statistically significant, and those that are statistically non-significant (Hedges and Vevea 1998). In a simpler formulation, effects may be categorized into two groups: those that favour the intervention and are statistically significant, and all others (Friedman 2001). Regardless of the specific formulation, when based on statistical significance, all have serious limitations and can lead to the wrong conclusion.

The conventional vote counting method fails because underpowered studies that do not rule out clinically important effects are counted as not showing benefit. Suppose, for example, the effect sizes estimated in two studies were identical. However, only one of the studies was adequately powered, and the effect in this study was statistically significant. Only this one effect (of the two identical effects) would be counted as showing ‘benefit’. Paradoxically, Hedges and Vevea showed that as the number of studies increases, the power of conventional vote counting tends to zero, except with large studies and at least moderate intervention effects (Hedges and Vevea 1998). Further, conventional vote counting suffers the same disadvantages as vote counting based on direction of effect, namely, that it does not provide information on the magnitude of effects and does not account for differences in the relative sizes of the studies.

12.2.2.2 Vote counting based on subjective rules

Subjective rules, involving a combination of direction, statistical significance and magnitude of effect, are sometimes used to categorize effects. For example, in a review examining the effectiveness of interventions for teaching quality improvement to clinicians, the authors categorized results as ‘beneficial effects’, ‘no effects’ or ‘detrimental effects’ (Boonyasai et al 2007). Categorization was based on direction of effect and statistical significance (using a predefined P value of 0.05) when available. If statistical significance was not reported, effects greater than 10% were categorized as ‘beneficial’ or ‘detrimental’, depending on their direction. These subjective rules often vary in the elements, cut-offs and algorithms used to categorize effects, and while detailed descriptions of the rules may provide a veneer of legitimacy, such rules have poor performance validity (Ioannidis et al 2008).

A further problem occurs when the rules are not described in sufficient detail for the results to be reproduced (e.g. ter Wee et al 2012, Thornicroft et al 2016). This lack of transparency does not allow determination of whether an acceptable or unacceptable vote counting method has been used (Valentine et al 2010).

12.3 Visual display and presentation of the data

Visual display and presentation of data is especially important for transparent reporting in reviews without meta-analysis, and should be considered irrespective of whether synthesis is undertaken (see Table 12.2.a for a summary of plots associated with each synthesis method). Tables and plots structure information to show patterns in the data and convey detailed information more efficiently than text. This aids interpretation and helps readers assess the veracity of the review findings.

12.3.1 Structured tabulation of results across studies

Ordering studies alphabetically by study ID is the simplest approach to tabulation; however, more information can be conveyed when studies are grouped in subpanels or ordered by a characteristic important for interpreting findings. The grouping of studies in tables should generally follow the structure of the synthesis presented in the text, which should closely reflect the review questions. This grouping should help readers identify the data on which findings are based and verify the review authors’ interpretation.

If the purpose of the table is comparative, grouping studies by any of following characteristics might be informative:

  • comparisons considered in the review, or outcome domains (according to the structure of the synthesis);
  • study characteristics that may reveal patterns in the data, for example potential effect modifiers including population subgroups, settings or intervention components.

If the purpose of the table is complete and transparent reporting of data, then ordering the studies to increase the prominence of the most relevant and trustworthy evidence should be considered. Possibilities include:

  • certainty of the evidence (synthesized result or individual studies if no synthesis);
  • risk of bias, study size or study design characteristics; and
  • characteristics that determine how directly a study addresses the review question, for example relevance and validity of the outcome measures.

One disadvantage of grouping by study characteristics is that it can be harder to locate specific studies than when tables are ordered by study ID alone, for example when cross-referencing between the text and tables. Ordering by study ID within categories may partly address this.

The value of standardizing intervention and outcome labels is discussed in Chapter 3, Section 3.2.2 and Section 3.2.4 ), while the importance and methods for standardizing effect estimates is described in Chapter 6 . These practices can aid readers’ interpretation of tabulated data, especially when the purpose of a table is comparative.

12.3.2 Forest plots

Forest plots and methods for preparing them are described elsewhere ( Chapter 10, Section 10.2 ). Some mention is warranted here of their importance for displaying study results when meta-analysis is not undertaken (i.e. without the summary diamond). Forest plots can aid interpretation of individual study results and convey overall patterns in the data, especially when studies are ordered by a characteristic important for interpreting results (e.g. dose and effect size, sample size). Similarly, grouping studies in subpanels based on characteristics thought to modify effects, such as population subgroups, variants of an intervention, or risk of bias, may help explore and explain differences across studies (Schriger et al 2010). These approaches to ordering provide important techniques for informally exploring heterogeneity in reviews without meta-analysis, and should be considered in preference to alphabetical ordering by study ID alone (Schriger et al 2010).

12.3.3 Box-and-whisker plots and bubble plots

Box-and-whisker plots (see Figure 12.4.a , Panel A) provide a visual display of the distribution of effect estimates (Section 12.2.1.1 ). The plot conventionally depicts five values. The upper and lower limits (or ‘hinges’) of the box, represent the 75th and 25th percentiles, respectively. The line within the box represents the 50th percentile (median), and the whiskers represent the extreme values (McGill et al 1978). Multiple box plots can be juxtaposed, providing a visual comparison of the distributions of effect estimates (Schriger et al 2006). For example, in a review examining the effects of audit and feedback on professional practice, the format of the feedback (verbal, written, both verbal and written) was hypothesized to be an effect modifier (Ivers et al 2012). Box-and-whisker plots of the risk differences were presented separately by the format of feedback, to allow visual comparison of the impact of format on the distribution of effects. When presenting multiple box-and-whisker plots, the width of the box can be varied to indicate the number of studies contributing to each. The plot’s common usage facilitates rapid and correct interpretation by readers (Schriger et al 2010). The individual studies contributing to the plot are not identified (as in a forest plot), however, and the plot is not appropriate when there are few studies (Schriger et al 2006).

A bubble plot (see Figure 12.4.a , Panel B) can also be used to provide a visual display of the distribution of effects, and is more suited than the box-and-whisker plot when there are few studies (Schriger et al 2006). The plot is a scatter plot that can display multiple dimensions through the location, size and colour of the bubbles. In a review examining the effects of educational outreach visits on professional practice, a bubble plot was used to examine visually whether the distribution of effects was modified by the targeted behaviour (O’Brien et al 2007). Each bubble represented the effect size (y-axis) and whether the study targeted a prescribing or other behaviour (x-axis). The size of the bubbles reflected the number of study participants. However, different formulations of the bubble plot can display other characteristics of the data (e.g. precision, risk-of-bias assessments).
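Both displays can be produced with base R graphics; the sketch below uses invented effect estimates, an invented effect modifier ('format of feedback'), and invented study sizes.

```r
set.seed(1)
dat <- data.frame(effect = rnorm(24, mean = -0.1, sd = 0.2),
                  format = rep(c("verbal", "written", "both"), each = 8),
                  n      = sample(50:500, 24))

# Juxtaposed box-and-whisker plots; varwidth scales box width by the number of effects.
boxplot(effect ~ format, data = dat, varwidth = TRUE,
        xlab = "Format of feedback", ylab = "Effect estimate")

# Bubble plot: effect against the modifier, bubble size proportional to study size.
fmt <- factor(dat$format)
plot(as.numeric(fmt), dat$effect, cex = sqrt(dat$n) / 5, xaxt = "n",
     xlab = "Format of feedback", ylab = "Effect estimate")
axis(1, at = seq_along(levels(fmt)), labels = levels(fmt))
```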

12.3.4 Albatross plot

The albatross plot (see Figure 12.4.a , Panel C) allows approximate examination of the underlying intervention effect sizes where there is minimal reporting of results within studies (Harrison et al 2017). The plot only requires a two-sided P value, sample size and direction of effect (or equivalently, a one-sided P value and a sample size) for each result. The plot is a scatter plot of the study sample sizes against two-sided P values, where the results are separated by the direction of effect. Superimposed on the plot are ‘effect size contours’ (inspiring the plot’s name). These contours are specific to the type of data (e.g. continuous, binary) and statistical methods used to calculate the P values. The contours allow interpretation of the approximate effect sizes of the studies, which would otherwise not be possible due to the limited reporting of the results. Characteristics of studies (e.g. type of study design) can be identified using different colours or symbols, allowing informal comparison of subgroups.

The plot is likely to be more inclusive of the available studies than meta-analysis, because of its minimal data requirements. However, the plot should complement the results from a statistical synthesis, ideally a meta-analysis of available effects.

12.3.5 Harvest and effect direction plots

Harvest plots (see Figure 12.4.a , Panel D) provide a visual extension of vote counting results (Ogilvie et al 2008). In the plot, studies based on the categorization of their effects (e.g. ‘beneficial effects’, ‘no effects’ or ‘detrimental effects’) are grouped together. Each study is represented by a bar positioned according to its categorization. The bars can be ‘visually weighted’ (by height or width) and annotated to highlight study and outcome characteristics (e.g. risk-of-bias domains, proximal or distal outcomes, study design, sample size) (Ogilvie et al 2008, Crowther et al 2011). Annotation can also be used to identify the studies. A series of plots may be combined in a matrix that displays, for example, the vote counting results from different interventions or outcome domains.

The methods papers describing harvest plots have employed vote counting based on statistical significance (Ogilvie et al 2008, Crowther et al 2011). For the reasons outlined in Section 12.2.2.1 , this can be misleading. However, an acceptable approach would be to display the results based on direction of effect.

The effect direction plot is similar in concept to the harvest plot in the sense that both display information on the direction of effects (Thomson and Thomas 2013). In the first version of the effect direction plot, the direction of effects for each outcome within a single study are displayed, while the second version displays the direction of the effects for outcome domains across studies . In this second version, an algorithm is first applied to ‘synthesize’ the directions of effect for all outcomes within a domain (e.g. outcomes ‘sleep disturbed by wheeze’, ‘wheeze limits speech’, ‘wheeze during exercise’ in the outcome domain ‘respiratory’). This algorithm is based on the proportion of effects that are in a consistent direction and statistical significance. Arrows are used to indicate the reported direction of effect (for either outcomes or outcome domains). Features such as statistical significance, study design and sample size are denoted using size and colour. While this version of the plot conveys a large amount of information, it requires further development before its use can be recommended since the algorithm underlying the plot is likely to have poor performance validity.

12.4 Worked example

The example that follows uses four scenarios to illustrate methods for presentation and synthesis when meta-analysis is not possible. The first scenario contrasts a common approach to tabulation with alternative presentations that may enhance the transparency of reporting and interpretation of findings. Subsequent scenarios show the application of the synthesis approaches outlined in preceding sections of the chapter. Box 12.4.a summarizes the review comparisons and outcomes, and decisions taken by the review authors in planning their synthesis. While the example is loosely based on an actual review, the review description, scenarios and data are fabricated for illustration.

Box 12.4.a The review

12.4.1 Scenario 1: structured reporting of effects

We first address a scenario in which review authors have decided that the tools used to measure satisfaction measured concepts that were too dissimilar across studies for synthesis to be appropriate. Setting aside three of the 15 studies that reported on the birth partner’s satisfaction with care, a structured summary of effects is sought for the remaining 12 studies. To keep the example table short, only one outcome is shown per study for each of the measurement periods (antenatal, intrapartum or postpartum).

Table 12.4.a depicts a common yet suboptimal approach to presenting results. Note two features.

  • Studies are ordered by study ID, rather than grouped by characteristics that might enhance interpretation (e.g. risk of bias, study size, validity of the measures, certainty of the evidence (GRADE)).
  • Data reported are as extracted from each study; effect estimates were not calculated by the review authors and, where reported, were not standardized across studies (although data were available to do both).

Table 12.4.b shows an improved presentation of the same results. In line with best practice, here effect estimates have been calculated by the review authors for all outcomes, and a common metric computed to aid interpretation (in this case an odds ratio; see Chapter 6 for guidance on conversion of statistics to the desired format). Redundant information has been removed (‘statistical test’ and ‘P value’ columns). The studies have been re-ordered, first to group outcomes by period of care (intrapartum outcomes are shown here), and then by risk of bias. This re-ordering serves two purposes. Grouping by period of care aligns with the plan to consider outcomes for each period separately and ensures the table structure matches the order in which results are described in the text. Re-ordering by risk of bias increases the prominence of studies at lowest risk of bias, focusing attention on the results that should most influence conclusions. Had the review authors determined that a synthesis would be informative, then ordering to facilitate comparison across studies would be appropriate; for example, ordering by the type of satisfaction outcome (as pre-defined in the protocol, starting with global measures of satisfaction), or the comparisons made in the studies.

The results may also be presented in a forest plot, as shown in Figure 12.4.b. In both the table and figure, studies are grouped by risk of bias to focus attention on the most trustworthy evidence. The pattern of effects across studies is immediately apparent in Figure 12.4.b and can be described efficiently without having to interpret each estimate (e.g. differences between studies at low and high risk of bias emerge), although these results should be interpreted with caution in the absence of a formal test for subgroup differences (see Chapter 10, Section 10.11). Only outcomes measured during the intrapartum period are displayed, although outcomes from other periods could be added, maximizing the information conveyed.

An example description of the results from Scenario 1 is provided in Box 12.4.b. It shows that describing results study by study becomes unwieldy with more than a few studies, highlighting the importance of tables and plots. It also brings into focus the risk of presenting results without any synthesis, since it seems likely that the reader will try to make sense of the results by drawing inferences across studies. Since a synthesis was considered inappropriate, GRADE was applied to individual studies and then used to prioritize the reporting of results, focusing attention on the most relevant and trustworthy evidence. An alternative might be to report only the results from studies at low risk of bias, an approach analogous to limiting a meta-analysis to studies at low risk of bias. Where possible, these and other approaches to prioritizing (or ordering) results from individual studies in text and tables should be pre-specified at the protocol stage.

Table 12.4.a Scenario 1: table ordered by study ID, data as reported by study authors

* All scales operate in the same direction; higher scores indicate greater satisfaction. CI = confidence interval; MD = mean difference; OR = odds ratio; POR = proportional odds ratio; RD = risk difference; RR = risk ratio.

Table 12.4.b Scenario 1: intrapartum outcome table ordered by risk of bias, standardized effect estimates calculated for all studies

* Outcomes operate in the same direction. A higher score, or an event, indicates greater satisfaction. ** Mean difference calculated for studies reporting continuous outcomes. † For binary outcomes, odds ratios were calculated from the reported summary statistics or were directly extracted from the study. For continuous outcomes, standardized mean differences were calculated and converted to odds ratios (see Chapter 6 ). CI = confidence interval; POR = proportional odds ratio.

Figure 12.4.b Forest plot depicting standardized effect estimates (odds ratios) for satisfaction


Box 12.4.b How to describe the results from this structured summary

12.4.2 Overview of scenarios 2–4: synthesis approaches

We now address three scenarios in which review authors have decided that the outcomes reported in the 15 studies all broadly reflect satisfaction with care. While the measures were quite diverse, a synthesis is sought to help decision makers understand whether women and their birth partners were generally more satisfied with the care received in midwife-led continuity models compared with other models. The three scenarios differ according to the data available (see Table 12.4.c ), with each reflecting progressively less complete reporting of the effect estimates. The data available determine the synthesis method that can be applied.

  • Scenario 2: effect estimates available without measures of precision (illustrating synthesis of summary statistics).
  • Scenario 3: P values available (illustrating synthesis of P values).
  • Scenario 4: directions of effect available (illustrating synthesis using vote-counting based on direction of effect).

For studies that reported multiple satisfaction outcomes, one result is selected for synthesis using the decision rules in Box 12.4.a (point 2).

Table 12.4.c Scenarios 2, 3 and 4: available data for the selected outcome from each study

* All scales operate in the same direction. Higher scores indicate greater satisfaction. ** For a particular scenario, the ‘available data’ column indicates the data that were directly reported, or were calculated from the reported statistics, in terms of: effect estimate, direction of effect, confidence interval, precise P value, or statement regarding statistical significance (either statistically significant, or not). CI = confidence interval; direction = direction of effect reported or can be calculated; MD = mean difference; NS = not statistically significant; OR = odds ratio; RD = risk difference; RoB = risk of bias; RR = risk ratio; sig. = statistically significant; SMD = standardized mean difference; Stand. = standardized.

12.4.2.1 Scenario 2: summarizing effect estimates

In Scenario 2, effect estimates are available for all outcomes. However, for most studies, a measure of variance is not reported, or cannot be calculated from the available data. We illustrate how the effect estimates may be summarized using descriptive statistics. In this scenario, it is possible to calculate odds ratios for all studies. For the continuous outcomes, this involves first calculating a standardized mean difference, and then converting this to an odds ratio ( Chapter 10, Section 10.6 ). The median odds ratio is 1.32 with an interquartile range of 1.02 to 1.53 (15 studies). Box-and-whisker plots may be used to display these results and examine informally whether the distribution of effects differs by the overall risk-of-bias assessment ( Figure 12.4.a , Panel A). However, because there are relatively few effects, a reasonable alternative would be to present bubble plots ( Figure 12.4.a , Panel B).
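A minimal sketch of these calculations in Python, using hypothetical effect estimates rather than the review data, might look as follows; the SMD-to-odds-ratio conversion uses the standard approximation ln(OR) ≈ (π/√3) × SMD (see the chapter cross-referenced above for details).

```python
# Descriptive summary of effect estimates when variances are unavailable
# (hypothetical data). SMDs are converted to odds ratios using the
# approximation ln(OR) ~= (pi / sqrt(3)) * SMD, then the median and
# interquartile range of the odds ratios are reported.
import numpy as np

smds = [0.10, 0.25, -0.05, 0.40, 0.15]          # hypothetical standardized mean differences
odds_ratios = [1.20, 1.55, 0.95, 1.32, 1.05,
               1.48, 1.02, 1.60, 1.25, 0.90]    # hypothetical odds ratios reported directly

# Convert SMDs to odds ratios so all effects are on a common metric.
converted = list(np.exp(np.array(smds) * np.pi / np.sqrt(3)))
all_ors = np.array(odds_ratios + converted)

median = np.median(all_ors)
q1, q3 = np.percentile(all_ors, [25, 75])
print(f"Median OR = {median:.2f}, IQR {q1:.2f} to {q3:.2f} ({len(all_ors)} studies)")
```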

An example description of the results from the synthesis is provided in Box 12.4.c .

Box 12.4.c How to describe the results from this synthesis

12.4.2.2 Scenario 3: combining P values

In Scenario 3, there is minimal reporting of the data, and the type of data and statistical methods and tests vary. However, 11 of the 15 studies provide a precise P value and direction of effect, and a further two report a P value less than a threshold (<0.001) and direction. We use this scenario to illustrate a synthesis of P values. Since the reported P values are two-sided ( Table 12.4.c , column 6), they must first be converted to one-sided P values, which incorporate the direction of effect ( Table 12.4.c , column 7).

Fisher’s method for combining P values involves calculating the following statistic:

$$X^2 = -2 \sum_{i=1}^{k} \ln(p_i)$$

where $p_i$ is the one-sided P value from study $i$ and $k$ is the number of studies. Under the null hypothesis of no benefit in any study, $X^2$ follows a chi-squared distribution with $2k$ degrees of freedom.

The combination of P values suggests there is strong evidence of benefit of midwife-led models of care in at least one study (P < 0.001 from a chi-squared test, 13 studies). Restricting this analysis to those studies judged to be at an overall low risk of bias (a sensitivity analysis), there is no longer evidence to reject the null hypothesis of no benefit of midwife-led models of care in any study (P = 0.314, 3 studies). For the five studies reporting continuous satisfaction outcomes, sufficient data (precise P value, direction, total sample size) are reported to construct an albatross plot (Figure 12.4.a, Panel C). The location of the points relative to the standardized mean difference contours indicates that the likely effects of the intervention in these studies are small.
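As a computational illustration, the conversion of two-sided P values to one-sided P values and the Fisher combination can be carried out as in the following Python sketch; the P values shown are hypothetical, not the review data.

```python
# Fisher's method for combining one-sided P values (hypothetical data).
# Two-sided P values are first converted to one-sided P values using the
# direction of effect: p/2 if the effect favours the intervention,
# 1 - p/2 if it favours the comparator.
import numpy as np
from scipy.stats import chi2

# Hypothetical (two-sided P value, direction) pairs; +1 favours the intervention.
results = [(0.04, +1), (0.20, +1), (0.001, +1), (0.50, -1), (0.08, +1)]

one_sided = np.array([p / 2 if d > 0 else 1 - p / 2 for p, d in results])

statistic = -2 * np.sum(np.log(one_sided))   # Fisher's chi-squared statistic
df = 2 * len(one_sided)                      # degrees of freedom
p_combined = chi2.sf(statistic, df)

print(f"Chi-squared = {statistic:.2f} on {df} df, combined one-sided P = {p_combined:.4f}")
```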

An example description of the results from the synthesis is provided in Box 12.4.d .

Box 12.4.d How to describe the results from this synthesis

12.4.2.3 Scenario 4: vote counting based on direction of effect

In Scenario 4, there is minimal reporting of the data, and the type of effect measure (when used) varies across the studies (e.g. mean difference, proportional odds ratio). Of the 15 results, only five report data suitable for meta-analysis (effect estimate and measure of precision; Table 12.4.c , column 8), and no studies reported precise P values. We use this scenario to illustrate vote counting based on direction of effect. For each study, the effect is categorized as beneficial or harmful based on the direction of effect (indicated as a binary metric; Table 12.4.c , column 9).

Of the 15 studies, we exclude three because they do not provide information on the direction of effect, leaving 12 studies to contribute to the synthesis. Of these 12, 10 effects favour midwife-led models of care (83%). The probability of observing this result if midwife-led models of care are truly ineffective is 0.039 (from a binomial probability test, or equivalently, the sign test). The 95% confidence interval for the percentage of effects favouring midwife-led care is wide (55% to 95%).

The binomial test can be implemented using standard computer spreadsheet or statistical packages. For example, the two-sided P value from the binomial probability test presented can be obtained from Microsoft Excel by typing =2*BINOM.DIST(2, 12, 0.5, TRUE) into any cell in the spreadsheet. The syntax requires the smaller of the ‘number of effects favouring the intervention’ or ‘the number of effects favouring the control’ (here, the smaller of these counts is 2), the number of effects (here 12), and the null value (true proportion of effects favouring the intervention = 0.5). In Stata, the bitest command could be used (e.g. bitesti 12 10 0.5 ).
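The same calculations can be reproduced in Python. The sketch below assumes scipy 1.7 or later (for binomtest) and statsmodels are installed; the Wilson score interval used here (one of the intervals discussed by Brown et al 2001) reproduces the 55% to 95% range reported above.

```python
# Vote counting based on direction of effect (counts from Scenario 4).
# Requires scipy >= 1.7 (binomtest) and statsmodels (proportion_confint).
from scipy.stats import binomtest
from statsmodels.stats.proportion import proportion_confint

favoured = 10   # effects favouring midwife-led models of care
total = 12      # studies reporting a direction of effect

# Two-sided binomial (sign) test against a null proportion of 0.5.
result = binomtest(favoured, total, p=0.5, alternative="two-sided")
print(f"P = {result.pvalue:.3f}")                       # approximately 0.039

# Wilson 95% confidence interval for the proportion of favourable effects.
low, high = proportion_confint(favoured, total, alpha=0.05, method="wilson")
print(f"{favoured/total:.0%} favourable (95% CI {low:.0%} to {high:.0%})")  # about 55% to 95%
```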

A harvest plot can be used to display the results ( Figure 12.4.a , Panel D), with characteristics of the studies represented using different heights and shading. A sensitivity analysis might be considered, restricting the analysis to those studies judged to be at an overall low risk of bias. However, only four studies were judged to be at a low risk of bias (of which, three favoured midwife-led models of care), precluding reasonable interpretation of the count.

An example description of the results from the synthesis is provided in Box 12.4.e .

Box 12.4.e How to describe the results from this synthesis

Figure 12.4.a Possible graphical displays of different types of data. (A) Box-and-whisker plots of odds ratios for all outcomes and separately by overall risk of bias. (B) Bubble plot of odds ratios for all outcomes and separately by the model of care. The colours of the bubbles represent the overall risk of bias judgement (green = low risk of bias; yellow = some concerns; red = high risk of bias). (C) Albatross plot of the study sample size against P values (for the five continuous outcomes in Table 12.4.c , column 6). The effect contours represent standardized mean differences. (D) Harvest plot (height depicts overall risk of bias judgement (tall = low risk of bias; medium = some concerns; short = high risk of bias), shading depicts model of care (light grey = caseload; dark grey = team), alphabet characters represent the studies)

12.5 Chapter information

Authors: Joanne E McKenzie, Sue E Brennan

Acknowledgements: Sections of this chapter build on chapter 9 of version 5.1 of the Handbook , with editors Jonathan J Deeks, Julian PT Higgins and Douglas G Altman.

We are grateful to the following for commenting helpfully on earlier drafts: Miranda Cumpston, Jamie Hartmann-Boyce, Tianjing Li, Rebecca Ryan and Hilary Thomson.

Funding: JEM is supported by an Australian National Health and Medical Research Council (NHMRC) Career Development Fellowship (1143429). SEB’s position is supported by the NHMRC Cochrane Collaboration Funding Program.

12.6 References

Achana F, Hubbard S, Sutton A, Kendrick D, Cooper N. An exploration of synthesis methods in public health evaluations of interventions concludes that the use of modern statistical methods would be beneficial. Journal of Clinical Epidemiology 2014; 67 : 376–390.

Becker BJ. Combining significance levels. In: Cooper H, Hedges LV, editors. A handbook of research synthesis . New York (NY): Russell Sage; 1994. p. 215–235.

Boonyasai RT, Windish DM, Chakraborti C, Feldman LS, Rubin HR, Bass EB. Effectiveness of teaching quality improvement to clinicians: a systematic review. JAMA 2007; 298 : 1023–1037.

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Meta-Analysis methods based on direction and p-values. Introduction to Meta-Analysis . Chichester (UK): John Wiley & Sons, Ltd; 2009. pp. 325–330.

Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Statistical Science 2001; 16 : 101–117.

Bushman BJ, Wang MC. Vote-counting procedures in meta-analysis. In: Cooper H, Hedges LV, Valentine JC, editors. Handbook of Research Synthesis and Meta-Analysis . 2nd ed. New York (NY): Russell Sage Foundation; 2009. p. 207–220.

Crowther M, Avenell A, MacLennan G, Mowatt G. A further use for the Harvest plot: a novel method for the presentation of data synthesis. Research Synthesis Methods 2011; 2 : 79–83.

Friedman L. Why vote-count reviews don’t count. Biological Psychiatry 2001; 49 : 161–162.

Grimshaw J, McAuley LM, Bero LA, Grilli R, Oxman AD, Ramsay C, Vale L, Zwarenstein M. Systematic reviews of the effectiveness of quality improvement strategies and programmes. Quality and Safety in Health Care 2003; 12 : 298–303.

Harrison S, Jones HE, Martin RM, Lewis SJ, Higgins JPT. The albatross plot: a novel graphical tool for presenting results of diversely reported studies in a systematic review. Research Synthesis Methods 2017; 8 : 281–289.

Hedges L, Vevea J. Fixed- and random-effects models in meta-analysis. Psychological Methods 1998; 3 : 486–504.

Ioannidis JP, Patsopoulos NA, Rothstein HR. Reasons or excuses for avoiding meta-analysis in forest plots. BMJ 2008; 336 : 1413–1415.

Ivers N, Jamtvedt G, Flottorp S, Young JM, Odgaard-Jensen J, French SD, O’Brien MA, Johansen M, Grimshaw J, Oxman AD. Audit and feedback: effects on professional practice and healthcare outcomes. Cochrane Database of Systematic Reviews 2012; 6 : CD000259.

Jones DR. Meta-analysis: weighing the evidence. Statistics in Medicine 1995; 14 : 137–149.

Loughin TM. A systematic comparison of methods for combining p-values from independent tests. Computational Statistics & Data Analysis 2004; 47 : 467–485.

McGill R, Tukey JW, Larsen WA. Variations of box plots. The American Statistician 1978; 32 : 12–16.

McKenzie JE, Brennan SE. Complex reviews: methods and considerations for summarising and synthesising results in systematic reviews with complexity. Report to the Australian National Health and Medical Research Council. 2014.

O’Brien MA, Rogers S, Jamtvedt G, Oxman AD, Odgaard-Jensen J, Kristoffersen DT, Forsetlund L, Bainbridge D, Freemantle N, Davis DA, Haynes RB, Harvey EL. Educational outreach visits: effects on professional practice and health care outcomes. Cochrane Database of Systematic Reviews 2007; 4 : CD000409.

Ogilvie D, Fayter D, Petticrew M, Sowden A, Thomas S, Whitehead M, Worthy G. The harvest plot: a method for synthesising evidence about the differential effects of interventions. BMC Medical Research Methodology 2008; 8 : 8.

Riley RD, Higgins JP, Deeks JJ. Interpretation of random effects meta-analyses. BMJ 2011; 342 : d549.

Schriger DL, Sinha R, Schroter S, Liu PY, Altman DG. From submission to publication: a retrospective review of the tables and figures in a cohort of randomized controlled trials submitted to the British Medical Journal. Annals of Emergency Medicine 2006; 48 : 750–756, 756 e751–721.

Schriger DL, Altman DG, Vetter JA, Heafner T, Moher D. Forest plots in reports of systematic reviews: a cross-sectional study reviewing current practice. International Journal of Epidemiology 2010; 39 : 421–429.

ter Wee MM, Lems WF, Usan H, Gulpen A, Boonen A. The effect of biological agents on work participation in rheumatoid arthritis patients: a systematic review. Annals of the Rheumatic Diseases 2012; 71 : 161–171.

Thomson HJ, Thomas S. The effect direction plot: visual display of non-standardised effects across multiple outcome domains. Research Synthesis Methods 2013; 4 : 95–101.

Thornicroft G, Mehta N, Clement S, Evans-Lacko S, Doherty M, Rose D, Koschorke M, Shidhaye R, O’Reilly C, Henderson C. Evidence for effective interventions to reduce mental-health-related stigma and discrimination. Lancet 2016; 387 : 1123–1132.

Valentine JC, Pigott TD, Rothstein HR. How many studies do you need?: a primer on statistical power for meta-analysis. Journal of Educational and Behavioral Statistics 2010; 35 : 215–247.

For permission to re-use material from the Handbook (either academic or commercial), please see here for full details.

AIMS Public Health, 2016; 3(1)

What Synthesis Methodology Should I Use? A Review and Analysis of Approaches to Research Synthesis

Kara Schick-Makaroff

1 Faculty of Nursing, University of Alberta, Edmonton, AB, Canada

Marjorie MacDonald

2 School of Nursing, University of Victoria, Victoria, BC, Canada

Marilyn Plummer

3 College of Nursing, Camosun College, Victoria, BC, Canada

Judy Burgess

4 Student Services, University Health Services, Victoria, BC, Canada

Wendy Neander

Associated data: Additional File 1

When we began this process, we were doctoral students and a faculty member in a research methods course. As students, we were facing a review of the literature for our dissertations. We encountered several different ways of conducting a review but were unable to locate any resources that synthesized all of the various synthesis methodologies. Our purpose is to present a comprehensive overview and assessment of the main approaches to research synthesis. We use ‘research synthesis’ as a broad overarching term to describe various approaches to combining, integrating, and synthesizing research findings.

We conducted an integrative review of the literature to explore the historical, contextual, and evolving nature of research synthesis. We searched five databases, reviewed websites of key organizations, hand-searched several journals, and examined relevant texts from the reference lists of the documents we had already obtained.

We identified four broad categories of research synthesis methodology including conventional, quantitative, qualitative, and emerging syntheses. Each of the broad categories was compared to the others on the following: key characteristics, purpose, method, product, context, underlying assumptions, unit of analysis, strengths and limitations, and when to use each approach.

Conclusions

The current state of research synthesis reflects significant advancements in emerging synthesis studies that integrate diverse data types and sources. New approaches to research synthesis provide a much broader range of review alternatives available to health and social science students and researchers.

1. Introduction

Since the turn of the century, public health emergencies have been identified worldwide, particularly related to infectious diseases. For example, the Severe Acute Respiratory Syndrome (SARS) epidemic in Canada in 2002-2003, the recent Ebola epidemic in Africa, and the ongoing HIV/AIDS pandemic are global health concerns. There have also been dramatic increases in the prevalence of chronic diseases around the world [1]–[3]. These epidemiological challenges have raised concerns about the ability of health systems worldwide to address these crises. As a result, public health systems reform has been initiated in a number of countries. In Canada, as in other countries, the role of evidence to support public health reform and improve population health has been given high priority. Yet, there continues to be a significant gap between the production of evidence through research and its application in practice [4]–[5]. One strategy to address this gap has been the development of new research synthesis methodologies to deal with the time-sensitive and wide-ranging evidence needs of policy makers and practitioners in all areas of health care, including public health.

As doctoral nursing students facing a review of the literature for our dissertations, and as a faculty member teaching a research methods course, we encountered several ways of conducting a research synthesis but found no comprehensive resources that discussed, compared, and contrasted various synthesis methodologies on their purposes, processes, strengths and limitations. To complicate matters, writers use terms interchangeably or use different terms to mean the same thing, and the literature is often contradictory about various approaches. Some texts [6] , [7] – [9] did provide a preliminary understanding about how research synthesis had been taken up in nursing, but these did not meet our requirements. Thus, in this article we address the need for a comprehensive overview of research synthesis methodologies to guide public health, health care, and social science researchers and practitioners.

Research synthesis is relatively new in public health but has a long history in other fields dating back to the late 1800s. Research synthesis, a research process in its own right [10] , has become more prominent in the wake of the evidence-based movement of the 1990s. Research syntheses have found their advocates and detractors in all disciplines, with challenges to the processes of systematic review and meta-analysis, in particular, being raised by critics of evidence-based healthcare [11] – [13] .

Our purpose was to conduct an integrative review of the literature to explore the historical, contextual, and evolving nature of research synthesis [14] – [15] . We synthesize and critique the main approaches to research synthesis that are relevant for public health, health care, and social scientists. Research synthesis is the overarching term we use to describe approaches to combining, aggregating, integrating, and synthesizing primary research findings. Each synthesis methodology draws on different types of findings depending on the purpose and product of the chosen synthesis (see Additional File 1 ).

3. Method of Review

Based on our current knowledge of the literature, we identified these approaches to include in our review: systematic review, meta-analysis, qualitative meta-synthesis, meta-narrative synthesis, scoping review, rapid review, realist synthesis, concept analysis, literature review, and integrative review. Our first step was to divide the synthesis types among the research team. Each member did a preliminary search to identify key texts. The team then met to develop search terms and a framework to guide the review.

Over the period of 2008 to 2012 we extensively searched the literature, updating our search at several time points and not restricting our search by date. The dates of texts reviewed range from 1967 to 2015. We used the terms above combined with the term “method*” (e.g., “realist synthesis” AND “method*”) in the database Health Source: Academic Edition (which includes Medline and CINAHL). This search yielded very few texts on some methodologies and many on others. We realized that many documents on research synthesis had not been picked up in the search. Therefore, we also searched Google Scholar, PubMed, ERIC, and Social Science Index, as well as the websites of key organizations such as the Joanna Briggs Institute, the University of York Centre for Evidence-Based Nursing, and the Cochrane Collaboration database. We hand searched several nursing, social science, public health and health policy journals. Finally, we traced relevant documents from the references in obtained texts.

We included works that met the following inclusion criteria: (1) published in English; (2) discussed the history of research synthesis; (3) explicitly described the approach and specific methods; or (4) identified issues, challenges, strengths and limitations of the particular methodology. We excluded research reports that resulted from the use of particular synthesis methodologies unless they also included criteria 2, 3, or 4 above.

Based on our search, we identified additional types of research synthesis (e.g., meta-interpretation, best evidence synthesis, critical interpretive synthesis, meta-summary, grounded formal theory). Still, we missed some important developments in meta-analysis, identified by the journal’s reviewers, which are now discussed briefly in the paper. The final set of 197 texts included in our review comprised theoretical, empirical, and conceptual papers, books, editorials and commentaries, and policy documents.

In our preliminary review of key texts, the team inductively developed a framework of the important elements of each method for comparison. In the next phase, each text was read carefully, and data for these elements were extracted into a table for comparison on the points of key characteristics, purpose, methods, and product (see Additional File 1). Once the data were grouped and extracted, we synthesized across categories based on the following additional points of comparison: complexity of the process, degree of systematization, consideration of context, underlying assumptions, unit of analysis, and when to use each approach. In our results, we discuss our comparison of the various synthesis approaches on the elements above. Because the review drew only on published documents, ethics approval was not required.

4. Results

We identified four broad categories of research synthesis methodology: conventional, quantitative, qualitative, and emerging syntheses. From our dataset of 197 texts, we had 14 texts on conventional synthesis, 64 on quantitative synthesis, 78 on qualitative synthesis, and 41 on emerging syntheses. Table 1 provides an overview of the four types of research synthesis, definitions, types of data used, products, and examples of the methodology.

Although we group these types of synthesis into four broad categories on the basis of similarities, each type within a category has unique characteristics, which may differ from the overall group similarities. Each could be explored in greater depth to tease out their unique characteristics, but detailed comparison is beyond the scope of this article.

Additional File 1 presents one or more selected types of synthesis that represent each broad category; it is not an exhaustive presentation of all types within each category. For specific examples from each category of synthesis, it provides more depth on the characteristics, purpose, methods, and products than is found in Table 1.

4.1. Key Characteristics

4.1.1. What is it?

Here we draw on two types of categorization. First, we utilize Dixon-Woods et al.’s [49] classification of research syntheses as being either integrative or interpretive. (Please note that integrative syntheses are not the same as an integrative review as defined in Additional File 1.) Second, we use Popay’s [80] enhancement and epistemological models.

The defining characteristic of integrative syntheses is that they summarize findings by pooling data [49]. Integrative syntheses include systematic reviews, meta-analyses, and scoping and rapid reviews, because each of these focuses on summarizing data. They also define concepts from the outset (although this may not always be true in scoping or rapid reviews) and deal with a well-specified phenomenon of interest.

Interpretive syntheses are primarily concerned with the development of concepts and of theories that integrate those concepts [49]. The analysis in interpretive synthesis is conceptual both in process and outcome, and “the product is not aggregations of data, but theory” [49, p.12]. Interpretive syntheses involve induction and interpretation. Examples include integrative reviews, some systematic reviews, all of the qualitative syntheses, and meta-narrative, realist, and critical interpretive syntheses. Of note, both quantitative and qualitative studies can be either integrative or interpretive.

The second categorization, enhancement versus epistemological , applies to those approaches that use multiple data types and sources [80] . Popay's [80] classification reflects the ways that qualitative data are valued in relation to quantitative data.

In the enhancement model , qualitative data adds something to quantitative analysis. The enhancement model is reflected in systematic reviews and meta-analyses that use some qualitative data to enhance interpretation and explanation. It may also be reflected in some rapid reviews that draw on quantitative data but use some qualitative data.

The epistemological model assumes that quantitative and qualitative data are equal and each has something unique to contribute. All of the other review approaches, except pure quantitative or qualitative syntheses, reflect the epistemological model because they value all data types equally but see them as contributing different understandings.

4.1.2. Data type

By and large, the quantitative approaches (quantitative systematic review and meta-analysis) have typically used purely quantitative data (i.e., expressed in numeric form). More recently, both the Cochrane [81] and Campbell [82] collaborations have been grappling with the need for, and the process of, integrating qualitative research into systematic reviews. The qualitative approaches use qualitative data (i.e., expressed in words). All of the emerging synthesis types, as well as the conventional integrative review, incorporate qualitative and quantitative study designs and data.

4.1.3. Research question

Four types of research questions direct inquiry across the different types of syntheses. The first is a well-developed research question that gives direction to the synthesis (e.g., meta-analysis, systematic review, meta-study, concept analysis, rapid review, realist synthesis). The second begins as a broad general question that evolves and becomes more refined over the course of the synthesis (e.g., meta-ethnography, scoping review, meta-narrative, critical interpretive synthesis). In the third type, the synthesis begins with a phenomenon of interest and the question emerges in the analytic process (e.g., grounded formal theory). Lastly, there is no clear question, but rather a general review purpose (e.g., integrative review). Thus, the requirement for a well-defined question cuts across at least three of the synthesis types (e.g., quantitative, qualitative, and emerging).

4.1.4. Quality appraisal

This is a contested issue within and between the four synthesis categories. There are strong proponents of quality appraisal in the quantitative traditions of systematic review and meta-analysis based on the need for strong studies that will not jeopardize validity of the overall findings. Nonetheless, there is no consensus on pre-defined criteria; many scales exist that vary dramatically in composition. This has methodological implications for the credibility of findings [83] .

Specific methodologies from the conventional, qualitative, and emerging categories support quality appraisal but do so with caveats. In conventional integrative reviews appraisal is recommended, but depends on the sampling frame used in the study [18]. In meta-study, appraisal criteria are explicit but quality criteria are used in different ways depending on the specific requirements of the inquiry [54]. Among the emerging syntheses, meta-narrative review developers support appraisal of a study based on criteria from the research tradition of the primary study [67], [84]–[85]. Realist synthesis similarly supports the use of high quality evidence, but appraisal checklists are viewed with scepticism and evidence is judged based on relevance to the research question and whether a credible inference may be drawn [69]. Like realist syntheses, critical interpretive syntheses do not judge quality using standardized appraisal instruments. They will exclude fatally flawed studies, but there is no consensus on what ‘fatally flawed’ means [49], [71]. Appraisal is based on relevance to the inquiry, not rigor of the study.

There is no agreement on quality appraisal among qualitative meta-ethnographers, with some supporting and others refuting the need for appraisal [60], [62]. Opponents of quality appraisal are found among authors of qualitative (grounded formal theory and concept analysis) and emerging syntheses (scoping and rapid reviews) because quality is not deemed relevant to the intention of the synthesis; the studies being reviewed are not effectiveness studies where quality is extremely important. These qualitative syntheses are often reviews of theoretical developments where the concept itself is what is important, or reviews that provide quotations from the raw data so readers can make their own judgements about the relevance and utility of the data. For example, in formal grounded theory the aim is theory generation, so the authenticity of the data used to generate the theory is less important than the conceptual categories that emerge. Inaccuracies may be corrected in other ways, such as using the constant comparative method, which facilitates development of theoretical concepts that are repeatedly found in the data [86]–[87]. For pragmatic reasons, evidence is not assessed in rapid and scoping reviews, in part to produce a timely product. The issue of quality appraisal is unresolved across the terrain of research synthesis and we consider this further in our discussion.

4.2. Purpose

All research syntheses share a common purpose: to summarize, synthesize, or integrate research findings from diverse studies. This helps readers stay abreast of the burgeoning literature in a field. Our discussion here is at the level of the four categories of synthesis. Beginning with conventional literature syntheses, the overall purpose is to attend to mature topics in need of re-conceptualization or to new topics requiring preliminary conceptualization [14]. Such syntheses may be helpful for considering contradictory evidence, mapping shifting trends in the study of a phenomenon, and describing the emergence of research in diverse fields [14]. The purpose here is to set the stage for a study by identifying what has been done, gaps in the literature, and important research questions, or to develop a conceptual framework to guide data collection and analysis.

The purpose of quantitative systematic reviews is to combine, aggregate, or integrate empirical research to be able to generalize from a group of studies and determine the limits of generalization [27] . The focus of quantitative systematic reviews has been primarily on aggregating the results of studies evaluating the effectiveness of interventions using experimental, quasi-experimental, and more recently, observational designs. Systematic reviews can be done with or without quantitative meta-analysis but a meta-analysis always takes place within the context of a systematic review. Researchers must consider the review's purpose and the nature of their data in undertaking a quantitative synthesis; this will assist in determining the approach.

The purpose of qualitative syntheses is broadly to synthesize complex health experiences, practices, or concepts arising in healthcare environments. There may be various purposes depending on the qualitative methodology. For example, in hermeneutic studies the aim may be holistic explanation or understanding of a phenomenon [42] , which is deepened by integrating the findings from multiple studies. In grounded formal theory, the aim is to produce a conceptual framework or theory expected to be applicable beyond the original study. Although not able to generalize from qualitative research in the statistical sense [88] , qualitative researchers usually do want to say something about the applicability of their synthesis to other settings or phenomena. This notion of ‘theoretical generalization’ has been referred to as ‘transferability’ [89] – [90] and is an important criterion of rigour in qualitative research. It applies equally to the products of a qualitative synthesis in which the synthesis of multiple studies on the same phenomenon strengthens the ability to draw transferable conclusions.

The overarching purpose of emerging syntheses is challenging the more traditional types of syntheses, in part by using data from both quantitative and qualitative studies with diverse designs for analysis. Beyond this, however, each emerging synthesis methodology has a unique purpose. In meta-narrative review, the purpose is to identify the different research traditions in an area and to synthesize a complex and diverse body of research. Critical interpretive synthesis shares this characteristic. Although a distinctive approach, critical interpretive synthesis utilizes a modification of the analytic strategies of meta-ethnography [61] (e.g., reciprocal translational analysis, refutational synthesis, and lines of argument synthesis) but goes beyond these to bring a critical perspective to bear in challenging the normative or epistemological assumptions in the primary literature [72]–[73]. The unique purpose of a realist synthesis is to amalgamate complex empirical evidence and theoretical understandings within a diverse body of literature to uncover the operative mechanisms and contexts that affect the outcomes of social interventions. In a scoping review, the intention is to find key concepts, examine the range of research in an area, and identify gaps in the literature. The purpose of a rapid review is comparable to that of a scoping review, but it is done quickly to meet the time-sensitive information needs of policy makers.

4.3. Method

4.3.1. Degree of systematization

There are varying degrees of systematization across the categories of research synthesis. The most systematized are quantitative systematic reviews and meta-analyses. There are clear processes in each with judgments to be made at each step, although there are no agreed upon guidelines for this. The process is inherently subjective despite attempts to develop objective and systematic processes [91] – [92] . Mullen and Ramirez [27] suggest that there is often a false sense of rigour implied by the terms ‘systematic review’ and ‘meta-analysis’ because of their clearly defined procedures.

In comparison with some types of qualitative synthesis, concept analysis is quite procedural. Qualitative meta-synthesis also has defined procedures and is systematic, yet perhaps less so than concept analysis. Qualitative meta-synthesis starts in an unsystematic way but becomes more systematic as it unfolds. Procedures and frameworks exist for some of the emerging types of synthesis [e.g., [50] , [63] , [71] , [93] ] but are not linear, have considerable flexibility, and are often messy with emergent processes [85] . Conventional literature reviews tend not to be as systematic as the other three types. In fact, the lack of systematization in conventional literature synthesis was the reason for the development of more systematic quantitative [17] , [20] and qualitative [45] – [46] , [61] approaches. Some authors in the field [18] have clarified processes for integrative reviews making them more systematic and rigorous, but most conventional syntheses remain relatively unsystematic in comparison with other types.

4.3.2. Complexity of the process

Some synthesis processes are considerably more complex than others. Methodologies with clearly defined steps are arguably less complex than the more flexible and emergent ones. We know that any study encounters challenges and it is rare that a pre-determined research protocol can be followed exactly as intended. Not even the rigorous methods associated with Cochrane [81] systematic reviews and meta-analyses are always implemented exactly as intended. Even when dealing with numbers rather than words, interpretation is always part of the process. Our collective experience suggests that new methodologies (e.g., meta-narrative synthesis and realist synthesis) that integrate different data types and methods are more complex than conventional reviews or the rapid and scoping reviews.

4.4. Product

The products of research syntheses usually take three distinct formats (see Table 1 and Additional File 1 for further details). The first representation is in tables, charts, graphical displays, diagrams and maps as seen in integrative, scoping and rapid reviews, meta-analyses, and critical interpretive syntheses. The second type of synthesis product is the use of mathematical scores. Summary statements of effectiveness are mathematically displayed in meta-analyses (as an effect size), systematic reviews, and rapid reviews (statistical significance).

The third synthesis product may be a theory or theoretical framework. A mid-range theory can be produced from formal grounded theory, meta-study, meta-ethnography, and realist synthesis. Theoretical/conceptual frameworks or conceptual maps may be created in meta-narrative and critical interpretive syntheses, and integrative reviews. Concepts for use within theories are produced in concept analysis. While these three product types span the categories of research synthesis, narrative description and summary is used to present the products resulting from all methodologies.

4.5. Consideration of context

There are diverse ways that context is considered in the four broad categories of synthesis. Context may be considered to the extent that it features within primary studies for the purpose of the review. Context may also be understood as an integral aspect of both the phenomenon under study and the synthesis methodology (e.g., realist synthesis). Quantitative systematic reviews and meta-analyses have typically been conducted on studies using experimental and quasi-experimental designs and more recently observational studies, which control for contextual features to allow for understanding of the ‘true’ effect of the intervention [94] .

More recently, systematic reviews have included covariates or mediating variables (i.e., contextual factors) to help explain variability in the results across studies [27] . Context, however, is usually handled in the narrative discussion of findings rather than in the synthesis itself. This lack of attention to context has been one criticism leveled against systematic reviews and meta-analyses, which restrict the types of research designs that are considered [e.g., [95] ].

When conventional literature reviews incorporate studies that deal with context, there is a place for considering contextual influences on the intervention or phenomenon. Reviews of quantitative experimental studies tend to be devoid of contextual considerations since the original studies are similarly devoid, but context might figure prominently in a literature review that incorporates both quantitative and qualitative studies.

Qualitative syntheses have been conducted on the contextual features of a particular phenomenon [33] . Paterson et al. [54] advise researchers to attend to how context may have influenced the findings of particular primary studies. In qualitative analysis, contextual features may form categories by which the data can be compared and contrasted to facilitate interpretation. Because qualitative research is often conducted to understand a phenomenon as a whole, context may be a focus, although this varies with the qualitative methodology. At the same time, the findings in a qualitative synthesis are abstracted from the original reports and taken to a higher level of conceptualization, thus removing them from the original context.

Meta-narrative synthesis [67] , [84] , because it draws on diverse research traditions and methodologies, may incorporate context into the analysis and findings. There is not, however, an explicit step in the process that directs the analyst to consider context. Generally, the research question guiding the synthesis is an important factor in whether context will be a focus.

More recent iterations of concept analysis [47] , [96] – [97] explicitly consider context reflecting the assumption that a concept's meaning is determined by its context. Morse [47] points out, however, that Wilson's [98] approach to concept analysis, and those based on Wilson [e.g., [45] ], identify attributes that are devoid of context, while Rodgers' [96] , [99] evolutionary method considers context (e.g., antecedents, consequences, and relationships to other concepts) in concept development.

Realist synthesis [69] considers context as integral to the study. It draws on a critical realist logic of inquiry grounded in the work of Bhaskar [100], who argues that empirical co-occurrence of events is insufficient for inferring causation. One must identify generative mechanisms whose properties are causal and, depending on the situation, may or may not be activated [94]. Context interacts with program/intervention elements and thus cannot be differentiated from the phenomenon [69]. This approach synthesizes evidence on generative mechanisms and analyzes contextual features that activate them; the result feeds back into the context. The focus is on what works, for whom, under what conditions, why and how [68].

4.6. Underlying Philosophical and Theoretical Assumptions

When we began our review, we ‘assumed’ that the assumptions underlying synthesis methodologies would be a distinguishing characteristic of synthesis types, and that we could compare the various types on their assumptions, explicit or implicit. We found, however, that many authors did not explicate the underlying assumptions of their methodologies, and it was difficult to infer them. Kirkevold [101] has argued that integrative reviews need to be carried out from an explicit philosophical or theoretical perspective. We argue this should be true for all types of synthesis.

Authors of some emerging synthesis approaches have been very explicit about their assumptions and philosophical underpinnings. An implicit assumption of most emerging synthesis methodologies is that quantitative systematic reviews and meta-analyses have limited utility in some fields [e.g., in public health – [13] , [102] ] and for some kinds of review questions like those about feasibility and appropriateness versus effectiveness [103] – [104] . They also assume that ontologically and epistemologically, both kinds of data can be combined. This is a significant debate in the literature because it is about the commensurability of overarching paradigms [105] but this is beyond the scope of this review.

Realist synthesis is philosophically grounded in critical realism or, as noted above, a realist logic of inquiry [93] , [99] , [106] – [107] . Key assumptions regarding the nature of interventions that inform critical realism have been described above in the section on context. See Pawson et al. [106] for more information on critical realism, the philosophical basis of realist synthesis.

Meta-narrative synthesis is explicitly rooted in a constructivist philosophy of science [108] in which knowledge is socially constructed rather than discovered, and what we take to be ‘truth’ is a matter of perspective. Reality has a pluralistic and plastic character, and there is no pre-existing ‘real world’ independent of human construction and language [109] . See Greenhalgh et al. [67] , [85] and Greenhalgh & Wong [97] for more discussion of the constructivist basis of meta-narrative synthesis.

In the case of purely quantitative or qualitative syntheses, it may be an easier matter to uncover unstated assumptions because they are likely to be shared with those of the primary studies in the genre. For example, grounded formal theory shares the philosophical and theoretical underpinnings of grounded theory, rooted in the theoretical perspective of symbolic interactionism [110] – [111] and the philosophy of pragmatism [87] , [112] – [114] .

As with meta-narrative synthesis, meta-study developers identify constructivism as their interpretive philosophical foundation [54] , [88] . Epistemologically, constructivism focuses on how people construct and re-construct knowledge about a specific phenomenon, and has three main assumptions: (1) reality is seen as multiple, at times even incompatible with the phenomenon under consideration; (2) just as primary researchers construct interpretations from participants' data, meta-study researchers also construct understandings about the primary researchers' original findings. Thus, meta-synthesis is a construction of a construction, or a meta-construction; and (3) all constructions are shaped by the historical, social and ideological context in which they originated [54] . The key message here is that reports of any synthesis would benefit from an explicit identification of the underlying philosophical perspectives to facilitate a better understanding of the results, how they were derived, and how they are being interpreted.

4.7. Unit of Analysis

The unit of analysis for each category of review is generally distinct. For the emerging synthesis approaches, the unit of analysis is specific to the intention. In meta-narrative synthesis it is the storyline in diverse research traditions; in rapid review or scoping review, it depends on the focus but could be a concept; and in realist synthesis, it is the theories rather than programs that are the units of analysis. The elements of theory that are important in the analysis are mechanisms of action, the context, and the outcome [107] .

For qualitative synthesis, the units of analysis are generally themes, concepts or theories, although in meta-study, the units of analysis can be research findings (“meta-data-analysis”), research methods (“meta-method”) or philosophical/theoretical perspectives (“meta-theory”) [54] . In quantitative synthesis, the units of analysis range from specific statistics for systematic reviews to effect size of the intervention for meta-analysis. More recently, some systematic reviews focus on theories [115] – [116] , therefore it depends on the research question. Similarly, within conventional literature synthesis the units of analysis also depend on the research purpose, focus and question as well as on the type of research methods incorporated into the review. What is important in all research syntheses, however, is that the unit of analysis needs to be made explicit. Unfortunately, this is not always the case.

4.8. Strengths and Limitations

In this section, we discuss the overarching strengths and limitations of synthesis methodologies as a whole and then highlight strengths and weaknesses across each of our four categories of synthesis.

4.8.1. Strengths of Research Syntheses in General

With the vast proliferation of research reports and the increased ease of retrieval, research synthesis has become more accessible, providing a way of looking broadly at the current state of research. The availability of syntheses helps researchers, practitioners, and policy makers keep up with the burgeoning literature in their fields; without syntheses, evidence-informed policy or practice would be difficult. Syntheses explain variation and difference in the data, helping us identify the relevance for our own situations; they identify gaps in the literature, leading to new research questions and study designs. They help us to know when to replicate a study and when to avoid excessively duplicating research. Syntheses can inform policy and practice in a way that well-designed single studies cannot; they provide building blocks for theory that helps us to understand and explain our phenomena of interest.

4.8.2. Limitations of Research Syntheses in General

The process of selecting, combining, integrating, and synthesizing across diverse study designs and data types can be complex and potentially rife with bias, even with those methodologies that have clearly defined steps. Just because a rigorous and standardized approach has been used does not mean that implicit judgements will not influence the interpretations and choices made at different stages.

In all types of synthesis, the quantity of data can be considerable, requiring difficult decisions about scope, which may affect relevance. The quantity of available data also has implications for the size of the research team. Few reviews these days can be done independently, in particular because decisions about inclusion and exclusion may require the involvement of more than one person to ensure reliability.

For all types of synthesis, it is likely that in areas with large, amorphous, and diverse bodies of literature, even the most sophisticated search strategies will not turn up all the relevant and important texts. This may be more important in some synthesis methodologies than in others, but the omission of key documents can influence the results of all syntheses. This issue can be addressed, at least in part, by including a library scientist on the research team as required by some funding agencies. Even then, it is possible to miss key texts. In this review, for example, because none of us are trained in or conduct meta-analyses, we were not even aware that we had missed some new developments in this field such as meta-regression [117] – [118] , network meta-analysis [119] – [121] , and the use of individual patient data in meta-analyses [122] – [123] .

One limitation of systematic reviews and meta-analyses is that they rapidly go out of date. We thought this might be true for all types of synthesis, although we wondered if those that produce theory might not be somewhat more enduring. We have not answered this question but it is open for debate. For all types of synthesis, the analytic skills and the time required are considerable so it is clear that training is important before embarking on a review, and some types of review may not be appropriate for students or busy practitioners.

Finally, the quality of reporting in primary studies of all genres is variable so it is sometimes difficult to identify aspects of the study essential for the synthesis, or to determine whether the study meets quality criteria. There may be flaws in the original study, or journal page limitations may necessitate omitting important details. Reporting standards have been developed for some types of reviews (e.g., systematic review, meta-analysis, meta-narrative synthesis, realist synthesis); but there are no agreed upon standards for qualitative reviews. This is an important area for development in advancing the science of research synthesis.

4.8.3. Strengths and Limitations of the Four Synthesis Types

The conventional literature review and now the increasingly common integrative review remain important and accessible approaches for students, practitioners, and experienced researchers who want to summarize literature in an area but do not have the expertise to use one of the more complex methodologies. Carefully executed, such reviews are very useful for synthesizing literature in preparation for research grants and practice projects. They can determine the state of knowledge in an area and identify important gaps in the literature to provide a clear rationale or theoretical framework for a study [14] , [18] . There is a demand, however, for more rigour, with more attention to developing comprehensive search strategies and more systematic approaches to combining, integrating, and synthesizing the findings.

Generally, conventional reviews include diverse study designs and data types that facilitate comprehensiveness, which may be a strength on the one hand, but can also present challenges on the other. The complexity inherent in combining results from studies with diverse methodologies can result in bias and inaccuracies. The absence of clear guidelines about how to synthesize across diverse study types and data [18] has been a challenge for novice reviewers.

Quantitative systematic reviews and meta-analyses have been important in launching the field of evidence-based healthcare. They provide a systematic, orderly and auditable process for conducting a review and drawing conclusions [25] . They are arguably the most powerful approaches to understanding the effectiveness of healthcare interventions, especially when intervention studies on the same topic show very different results. When areas of research are dogged by controversy [25] or when study results go against strongly held beliefs, such approaches can reduce the uncertainty and bring strong evidence to bear on the controversy.

Despite their strengths, they also have limitations. Systematic reviews and meta-analyses do not provide a way of including complex literature comprising various types of evidence, such as qualitative studies, theoretical work, and epidemiological studies. Only certain types of design are considered and qualitative data are used in a limited way. This exclusion limits what can be learned in a topic area.

Meta-analyses are often not possible because of wide variability in study design, population, and interventions so they may have a narrow range of utility. New developments in meta-analysis, however, can be used to address some of these limitations. Network meta-analysis is used to explore relative efficacy of multiple interventions, even those that have never been compared in more conventional pairwise meta-analyses [121] , allowing for improved clinical decision making [120] . The limitation is that network meta-analysis has only been used in medical/clinical applications [119] and not in public health. It has not yet been widely accepted and many methodological challenges remain [120] – [121] . Meta-regression is another development that combines meta-analytic and linear regression principles to address the fact that heterogeneity of results may compromise a meta-analysis [117] – [118] . The disadvantage is that many clinicians are unfamiliar with it and may incorrectly interpret results [117] .

Some have accused meta-analyses of combining apples and oranges [124] , raising questions in the field about their meaningfulness [25] , [28] . More recently, the use of individual rather than aggregate data has been useful in facilitating greater comparability among studies [122] . In fact, Tomas et al. [123] argue that meta-analysis using individual data is now the gold standard, although obtaining access to the raw data from other studies may be a challenge.

The usefulness of systematic reviews in synthesizing complex health and social interventions has also been challenged [102] . It is often difficult to synthesize their findings because such studies are “epistemologically diverse and methodologically complex” [ [69] , p. 21]. Rigid inclusion/exclusion criteria may allow only experimental or quasi-experimental designs into consideration, resulting in lost information that may well be useful to policy makers for tailoring an intervention to the context or understanding its acceptance by recipients.

Qualitative syntheses may be the type of review most fraught with controversy and challenge, while also bringing distinct strengths to the enterprise. Although these methodologies provide a comprehensive and systematic review approach, they do not generally provide definitive statements about intervention effectiveness. They do, however, address important questions about the development of theoretical concepts, patient experiences, acceptability of interventions, and an understanding about why interventions might work.

Most qualitative syntheses aim to produce a theoretically generalizable mid-range theory that explains variation across studies. This makes them more useful than single primary studies, which may not be applicable beyond the immediate setting or population. All provide a contextual richness that enhances relevance and understanding. Another benefit of some types of qualitative synthesis (e.g., grounded formal theory) is that the concept of saturation provides a sound rationale for limiting the number of texts to be included, thus making reviews potentially more manageable. This contrasts with systematic reviews and meta-analyses, which require an exhaustive search.

Qualitative researchers debate whether the findings of ontologically and epistemologically diverse qualitative studies can actually be combined or synthesized [125] because methodological diversity raises many challenges for synthesizing findings. The products of different types of qualitative syntheses range from theory and conceptual frameworks, to themes and rich descriptive narratives. Can one combine the findings from a phenomenological study with the theory produced in a grounded theory study? Many argue yes, but many also argue no.

Emerging synthesis methodologies were developed to address some limitations inherent in other types of synthesis but also have their own issues. Because each type is so unique, it is difficult to identify overarching strengths of the entire category. An important strength, however, is that these newer forms of synthesis provide a systematic and rigorous approach to synthesizing a diverse literature base in a topic area that includes a range of data types, such as quantitative and qualitative studies, theoretical work, case studies, evaluations, epidemiological studies, trials, and policy documents. More than conventional literature reviews and systematic reviews, these approaches provide explicit guidance on analytic methods for integrating different types of data. The assumption is that all forms of data have something to contribute to knowledge and theory in a topic area. All have a defined but flexible process in recognition that the methods may need to shift as knowledge develops through the process.

Many emerging synthesis types are helpful to policy makers and practitioners because they are usually involved as team members in the process to define the research questions, and interpret and disseminate the findings. In fact, engagement of stakeholders is built into the procedures of the methods. This is true for rapid reviews, meta-narrative syntheses, and realist syntheses. It is less likely to be the case for critical interpretive syntheses.

Another strength of some approaches (realist and meta-narrative syntheses) is that quality and publication standards have been developed to guide researchers, reviewers, and funders in judging the quality of the products [108] , [126] – [127] . Training materials and online communities of practice have also been developed to guide users of realist and meta-narrative review methods [107] , [128] . A unique strength of critical interpretive synthesis is that it takes a critical perspective on the process that may help reconceptualize the data in a way not considered by the primary researchers [72] .

These new approaches also present challenges. The methods are new and there may be few published applications by researchers other than the developers of the methods, so new users often struggle with the application. The newness of the approaches means that there may not be mentors available to guide those unfamiliar with the methods. This is changing, however, and the number of applications in the literature is growing, with publications by new users helping to develop the science of synthesis [e.g., [129] ]. However, the evolving nature of the approaches and their developmental stage present challenges for novice researchers.

4.9. When to Use Each Approach

Choosing an appropriate approach to synthesis will depend on the question you are asking, the purpose of the review, and the outcome or product you want to achieve. In Additional File 1 , we discuss each of these to provide guidance to readers on making a choice about review type. If researchers want to know whether a particular type of intervention is effective in achieving its intended outcomes, then they might choose a quantitative systematic review with or without meta-analysis, possibly buttressed with qualitative studies to provide depth and explanation of the results. Alternatively, if the concern is about whether an intervention is effective with different populations under diverse conditions in varying contexts, then a realist synthesis might be the most appropriate.

If researchers' concern is to develop theory, they might consider qualitative syntheses or some of the emerging syntheses that produce theory (e.g., critical interpretive synthesis, realist review, grounded formal theory, qualitative meta-synthesis). If the aim is to track the development and evolution of concepts, theories or ideas, or to determine how an issue or question is addressed across diverse research traditions, then meta-narrative synthesis would be most appropriate.

When the purpose is to review the literature in advance of undertaking a new project, particularly by graduate students, then perhaps an integrative review would be appropriate. Such efforts contribute towards the expansion of theory, identify gaps in the research, establish the rationale for studying particular phenomena, and provide a framework for interpreting results in ways that might be useful for influencing policy and practice.

For researchers keen to bring new insights, interpretations, and critical re-conceptualizations to a body of research, qualitative or critical interpretive syntheses will provide an inductive product that may offer new understandings or challenges to the status quo. These can inform future theory development, or provide guidance for policy and practice.

5. Discussion

What is the current state of the science regarding research synthesis? Public health, health care, and social science researchers and clinicians have used all four categories of research synthesis, and all four offer a suitable array of approaches for inquiry. New developments in systematic reviews and meta-analysis are providing ways of addressing methodological challenges [117] – [123] . There has also been significant advancement in emerging synthesis methodologies, and they are quickly gaining popularity. Qualitative meta-synthesis is still evolving, particularly given how new it is within the terrain of research synthesis. In the midst of this evolution, outstanding issues persist, such as grappling with the quantity of data, quality appraisal, and integration with knowledge translation. These topics have not been thoroughly addressed and need further debate.

5.1. Quantity of Data

We raise the question of whether it is possible or desirable to find all available studies for a synthesis that has this requirement (e.g., meta-analysis, systematic review, scoping, meta-narrative synthesis [25] , [27] , [63] , [67] , [84] – [85] ). Is the synthesis of all available studies a realistic goal in light of the burgeoning literature? And how can this be sustained in the future, particularly as the emerging methodologies continue to develop and as the internet facilitates endless access? There has been surprisingly little discussion on this topic and the answers will have far-reaching implications for searching, sampling, and team formation.

Researchers and graduate students can no longer rely on their own independent literature search. They will likely need to ask librarians for assistance as they navigate multiple sources of literature and learn new search strategies. Although teams now collaborate with library scientists, syntheses are limited in that researchers must make decisions on the boundaries of the review, in turn influencing the study's significance. The size of a team may also be pragmatically determined to manage the search, extraction, and synthesis of the burgeoning data. There is no single answer to our question about the possibility or necessity of finding all available articles for a review. Multiple strategies that are situation specific are likely to be needed.

5.2. Quality Appraisal

While the issue of quality appraisal has received much attention in the synthesis literature, scholars are far from resolution. There may be no agreement about appraisal criteria in a given tradition. For example, the debate rages over the appropriateness of quality appraisal in qualitative synthesis, where there are over 100 different sets of criteria and many do not overlap [49] . These differences may reflect disciplinary and methodological orientations, but diverse quality appraisal criteria may privilege particular types of research [49] . The decision to appraise is often grounded in ontological and epistemological assumptions. Nonetheless, diversity within and between categories of synthesis is likely to persist unless debate on the topic of quality appraisal continues and moves toward consensus.

5.3. Integration with Knowledge Translation

If research syntheses are to make a difference to practice and ultimately to improve health outcomes, then we need to do a better job of knowledge translation. In the Canadian Institutes of Health Research (CIHR) definition of knowledge translation (KT), research or knowledge synthesis is an integral component [130] . Yet, with few exceptions [131] – [132] , very little of the research synthesis literature even mentions the relationship of synthesis to KT, let alone discusses strategies to facilitate the integration of synthesis findings into policy and practice. The exception is in the emerging synthesis methodologies, some of which (e.g., realist and meta-narrative syntheses, scoping reviews) explicitly involve stakeholders or knowledge users. The argument is that engaging them in this way increases the likelihood that the knowledge generated will be translated into policy and practice. We suggest that a more explicit engagement with knowledge users in all types of synthesis would benefit the uptake of the research findings.

Research synthesis neither makes research more applicable to practice nor ensures implementation. Focus must now turn seriously towards translating synthesis findings into knowledge products that are useful for health care practitioners in multiple areas of practice, and towards developing appropriate strategies to facilitate their use. The burgeoning field of knowledge translation has, to some extent, taken up this challenge; however, the research-practice gap continues to plague us [133] – [134] . It is a particular problem for qualitative syntheses [131] . Although such syntheses have an important place in evidence-informed practice, little effort has gone into the challenge of translating the findings into useful products to guide practice [131] .

5.4. Limitations

Our study took longer than would normally be expected for an integrative review. Each of us was primarily involved in our own dissertations or teaching/research positions, and so this study was conducted ‘off the sides of our desks.’ A limitation was that we searched the literature over the course of 4 years (from 2008–2012), necessitating multiple search updates. Further, we did not do a comprehensive search of the literature after 2012, thus the more recent synthesis literature was not systematically explored. We did, however, perform limited database searches from 2012–2015 to keep abreast of the latest methodological developments. Although we missed some new approaches to meta-analysis in our search, we did not find any new features of the synthesis methodologies covered in our review that would change the analysis or findings of this article. Lastly, we struggled with the labels used for the broad categories of research synthesis methodology because of our hesitancy to reinforce the divide between quantitative and qualitative approaches. However, it was very difficult to find alternative language that represented the types of data used in these methodologies. Despite our hesitancy in creating such an obvious divide, we were left with the challenge of trying to find a way of characterizing these broad types of syntheses.

6. Conclusion

Our findings offer methodological clarity for those wishing to learn about the broad terrain of research synthesis. We believe that our review makes transparent the issues and considerations in choosing from among the four broad categories of research synthesis. In summary, research synthesis has taken its place as a form of research in its own right. The methodological terrain has deep historical roots reaching back over the past 200 years, yet research synthesis remains relatively new to public health, health care, and social sciences in general. This is rapidly changing. New developments in systematic reviews and meta-analysis, and the emergence of new synthesis methodologies provide a vast array of options to review the literature for diverse purposes. New approaches to research synthesis and new analytic methods within existing approaches provide a much broader range of review alternatives for public health, health care, and social science students and researchers.

Acknowledgments

KSM is an assistant professor in the Faculty of Nursing at the University of Alberta. Her work on this article was largely conducted as a Postdoctoral Fellow, funded by KRESCENT (Kidney Research Scientist Core Education and National Training Program, reference #KRES110011R1) and the Faculty of Nursing at the University of Alberta.

MM's work on this study over the period of 2008-2014 was supported by a Canadian Institutes of Health Research Applied Public Health Research Chair Award (grant #92365).

We thank Rachel Spanier who provided support with reference formatting.

List of Abbreviations (in Additional File 1 )

Conflict of interest: The authors declare that they have no conflicts of interest in this article.

Authors' contributions: KSM co-designed the study, collected data, analyzed the data, drafted/revised the manuscript, and managed the project.

MP contributed to searching the literature, developing the analytic framework, and extracting data for the Additional File.

JB contributed to searching the literature, developing the analytic framework, and extracting data for the Additional File.

WN contributed to searching the literature, developing the analytic framework, and extracting data for the Additional File.

All authors read and approved the final manuscript.

Additional Files: Additional File 1 – Selected Types of Research Synthesis

This Additional File is our dataset created to organize, analyze and critique the literature that we synthesized in our integrative review. Our results were created based on analysis of this Additional File.

UCI Libraries

Systematic Reviews & Evidence Synthesis Methods

  • Schedule a Consultation / Meet our Team
  • What is Evidence Synthesis?
  • Types of Evidence Synthesis
  • Evidence Synthesis Across Disciplines
  • Finding and Appraising Existing Systematic Reviews
  • 0. Preliminary Searching
  • 1. Develop a Protocol
  • 2. Draft your Research Question
  • 3. Select Databases
  • 4. Select Grey Literature Sources
  • 5. Write a Search Strategy
  • 6. Register a Protocol
  • 7. Translate Search Strategies
  • 8. Citation Management
  • 9. Article Screening
  • 10. Risk of Bias Assessment
  • 11. Data Extraction
  • 12. Synthesize, Map, or Describe the Results
  • Open Access Evidence Synthesis Resources

Preliminary Searching

Preliminary search of the literature.

Before beginning any evidence synthesis project, you will need to search the literature in your topic area for 2 purposes:

  • To determine if there is enough evidence to support an evidence synthesis review on this topic
  • To determine if other teams have recently published or are already working on a similar review, in which case you may want to adjust your research question or pursue a different topic

Medicine and health sciences reviews

Search at least PubMed and the Cochrane Library for published studies and systematic reviews, and search PROSPERO and the JBI Systematic Review Register for registered protocols. Note that PROSPERO only accepts protocols for systematic reviews, rapid reviews, and umbrella reviews, so if you plan to do another type of review, search OSF and the JBI EBP Database instead. If you plan to include qualitative evidence or topics related to nursing and allied health, you should also search CINAHL and any relevant subject-specific databases, such as PsycInfo.

Interdisciplinary reviews or outside health sciences

Search one or more subject-specific databases as well as an interdisciplinary database, such as Scopus or Web of Science . Search OSF for projects and protocol registrations. For social sciences topics, review this list of Campbell Collaboration Title Registrations .

For more context, see also Chapter 2, "Standards for Initiating a Systematic Review," in Finding What Works in Health Care: Standards for Systematic Reviews.

  • Last Updated: May 25, 2024 10:49 AM
  • URL: https://guides.lib.uci.edu/evidence-synthesis


  • Open access
  • Published: 21 May 2024

Efficacy of interventions and techniques on adherence to physiotherapy in adults: an overview of systematic reviews and panoramic meta-analysis

  • Clemens Ley, ORCID: orcid.org/0000-0003-1700-3905
  • Peter Putz, ORCID: orcid.org/0000-0003-2314-3293

Systematic Reviews, volume 13, Article number: 137 (2024)


Adherence to physiotherapeutic treatment and recommendations is crucial to achieving planned goals and desired health outcomes. This overview of systematic reviews synthesises the wide range of additional interventions and behaviour change techniques used in physiotherapy, exercise therapy and physical therapy to promote adherence and summarises the evidence of their efficacy.

Seven databases (PEDro, PubMed, Cochrane Library, Web of Science, Scopus, PsycINFO and CINAHL) were systematically searched with terms related to physiotherapy, motivation, behaviour change, adherence and efficacy (last searched on January 31, 2023). Only systematic reviews of randomised control trials with adults were included. The screening process and quality assessment with AMSTAR-2 were conducted independently by the two authors. The extracted data was synthesised narratively. In addition, four meta-analyses were pooled in a panoramic meta-analysis.

Of 187 reviews identified in the search, 19 were included, comprising 205 unique trials. Four meta-analyses on the effects of booster sessions, behaviour change techniques, goal setting and motivational interventions showed a small but statistically significant overall effect (SMD 0.24, 95% CI 0.13, 0.34) and no statistical heterogeneity (I² = 0%) in the panoramic meta-analysis. Narrative synthesis revealed substantial clinical and methodological diversity. Overall, the certainty of evidence regarding the efficacy of the investigated interventions and techniques on adherence is low, due to various methodological flaws. Most of the RCTs included in the reviews analysed cognitive and behavioural interventions in patients with musculoskeletal diseases, indicating moderate evidence for the efficacy of some techniques, particularly booster sessions, supervision and graded exercise. The reviews provided less evidence for the efficacy of educational and psychosocial interventions, and partly inconsistent findings. Most of the available evidence refers to short- to medium-term efficacy. Interventions combining a higher number of behaviour change techniques were more efficacious.

Conclusions

The overview of reviews synthesised various potentially efficacious techniques that may be combined for a holistic and patient-centred approach and may support tailoring complex interventions to the patient’s needs and dispositions. It also identifies various research gaps and calls for a more holistic approach to define and measure adherence in physiotherapy.

Systematic review registration

PROSPERO CRD42021267355.


Adherence to physiotherapeutic treatment and recommendations is crucial to achieving the planned goals and desired effects [ 1 , 2 ]. This is because the desired effects are usually only achieved in the long term if the recommended treatment and home-based exercises are carried out regularly. However, non-adherence in physiotherapy can be as high as 70%, particularly in unsupervised home exercise programmes [ 1 , 3 ], and may differ among medical conditions [ 4 ]. The World Health Organization defines adherence to therapy as ‘the extent to which a person’s behaviour—taking medication, following a diet and/or executing lifestyle changes, corresponds with agreed recommendations from a health care provider’ [ 5 ]. Long-term adherence often requires lifestyle changes, which can be supported by behaviour change techniques (BCTs). BCTs are considered the ‘active, replicable and measurable component of any intervention designed to modify behaviour’ ([ 6 ]; cf. [ 7 ]). BCTs are defined and operationalised in the behaviour change taxonomy [ 8 ], based on theoretical underpinnings and a Delphi study. Theoretical models to explain (non-)adherence and (a)motivation, as well as techniques to promote behaviour change, have been extensively studied in health and exercise psychology [ 9 , 10 , 11 ]. Rhodes and Fiala [ 12 ] argue that despite several strong psychological theories that have been developed to explain behaviour, few provide guidance for the design and development of interventions. Furthermore, theories may not be equally applicable to all behavioural domains, therapeutic regimes and settings. For example, the factors determining adherence to (passive) medication use differ from those influencing adherence to (active) physical therapies and exercise behaviour (cf. [ 5 ]). This review specifically addresses the domain of physiotherapy and therapeutic exercise.

Existing reviews of predictive studies identified factors influencing adherence positively or negatively, showing the predominantly conflicting and low evidence of a wide range of predictive factors for adherence [ 1 , 2 , 13 ]. Moderate to strong evidence was shown for some factors, referring to previous (adherence) behaviour and treatment experiences, physical activity level, social support and psychosocial conditions, number of exercises and motivational dispositions. Such predictive studies have identified possible targets for intervention but do not provide evidence on the efficacy of interventions. In contrast, randomised control trials (RCTs) are recognised as the preferred study design for investigating the efficacy of interventions. Thus, this overview of reviews aimed at providing a synthesis of reviews that examined RCTs, allowing for the discussion of the efficacy of different interventions and BCTs on adherence-related outcomes.

There are numerous reviews on adherence to physiotherapy and (home-based) exercise, and on BCTs to increase physical activity levels, therapeutic exercise or self-organised exercise [ 1 , 2 , 3 , 14 , 15 , 16 , 17 , 18 ]. Yet, no systematic overview of reviews has been identified that specifically synthesised the efficacy of interventions and techniques to enhance adherence to physiotherapy.

Objectives and research questions

Therefore, the aim of this overview of reviews was to synthesise the evidence on the efficacy of interventions and techniques on adherence in physiotherapy, to explore heterogeneity regarding the theoretical underpinnings, types of interventions used, and the adherence-related measures and outcomes reported, and finally to identify research gaps. Thus, the primary research question is the following: How efficacious are interventions and techniques in increasing adherence to physiotherapy? Secondary research questions are as follows: What types of intervention and behaviour change techniques were investigated? Which theoretical underpinning was reported? How was adherence defined and related outcomes measured?

This overview of reviews is guided by the research questions and aligns with the common purposes of overviews [ 19 , 20 ] and the three functions for overviews proposed by Ballard and Montgomery [ 21 ], i.e. to explore heterogeneity, to summarize the evidence and to identify gaps. This overview approach is appropriate for addressing the research questions specified above by exploring different types of interventions and behaviour change techniques and by synthesising the evidence from systematic reviews of RCTs on their efficacy. The review protocol was registered ahead of the screening process in PROSPERO (reg.nr. CRD42021267355). The only deviations from the registration were that we excluded reviews of only cohort studies, due to the already broad heterogeneity of intervention and outcome measures, and that we additionally performed a panoramic meta-analysis.

Information sources, search strategy and eligibility criteria

The search in seven databases, PEDro, PubMed, Cochrane Library, Web of Science, Scopus, PsycInfo and CINAHL (Cumulative Index to Nursing and Allied Health Literature), was last updated on January 31, 2023. The search strategy was structured according to the PICOS (Population, Intervention, Comparison, Outcome and Study Type) scheme. The search terms related to physiotherapy and motivation or behaviour change and adherence and effectiveness/efficacy (details on the searches are listed in Additional file 1 ). A filter was applied limiting the search to (systematic) reviews. No publication date restrictions were applied.

Table 1 outlines the study inclusion and exclusion criteria. Only studies published in peer-reviewed journals were included. The review addressed adult patients with any illness, disease or injury, and thus excluded studies on healthy populations. Reviews in the field of physiotherapy, physical therapy or the therapeutic use of exercise or physical activity were included if they investigated adherence as a primary outcome. Studies measuring adherence as a secondary outcome were excluded, as they analyse interventions that were not primarily designed to promote adherence and are thus outside the scope of this overview. Reviews that analysed only studies on digital apps or tools (e.g. virtual reality, gamification, exergames or tele-rehabilitation) were also excluded, as they fell outside the scope of this overview. Only systematic reviews that appraised RCTs were included. Reviews appraising RCTs and other study designs were included if RCT results could be extracted separately. Systematic reviews are, in our understanding, literature reviews of primary studies with a comprehensive description of objectives, materials and methods; considering the risk of bias and confidence in the findings; and reporting according to the PRISMA statement [ 22 , 23 , 24 ]. Adherence is defined as the extent to which a person’s behaviour corresponds with treatment goals, plans or recommendations [ 5 ]. Related terms used in the literature are compliance, maintenance, attendance, participation and behaviour change or lifestyle modification, and these were thus included in the search strategy.

Screening and selection process

Author CL conducted the search in the seven different databases and removed duplicates, using the Zotero bibliography management tool. Following this, authors CL and PP both independently screened the titles and abstracts of the resulting sources (see Fig.  1 Flow diagram). After removing the excluded studies, PP and CL independently screened the remaining full texts in an unblinded standardised manner. Reasons for exclusion were noted in a screening spreadsheet. Any discrepancy was discussed, verified and resolved by consensus.

Data collection process and data items

Data extraction was done by CL after agreeing with PP on the criteria. A spreadsheet was created with the following data extraction components: (i) objectives and main topic of the review; (ii) study design(s) and number of studies included and excluded; (iii) search strategies (incl. PICO); (iv) population including diagnosis, sample sizes and age; (v) intervention and comparison, theoretical foundations and models used for designing the intervention; (vi) time frames, including follow-up; (vii) adherence-related outcome and outcome measures; (viii) key findings; (ix) analysis of primary studies (meta-analytical, other statistical or narrative analysis); and (x) tools used for the quality assessment, risk of bias and evidence grading. Primary outcomes on adherence included adherence rates or categories, engagement, attendance and participation, and accomplished physical activity levels. PP verified the data extraction results. The data was extracted as reported in the systematic reviews, then reformatted, displayed in the tables and used for the narrative synthesis.
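
For readers who want to set up a similar extraction sheet, the components (i)–(x) above can be represented as a simple record structure. The following Python sketch is illustrative only; the field names paraphrase the listed components and are not the authors’ actual spreadsheet headings.

from dataclasses import dataclass
from typing import List

@dataclass
class ReviewExtraction:
    # Field names paraphrase components (i)-(x) above; they are not the authors' exact headings.
    objectives: str                 # (i) objectives and main topic of the review
    study_designs: str              # (ii) study design(s), number of studies included and excluded
    search_strategy: str            # (iii) search strategies, incl. PICO
    population: str                 # (iv) diagnosis, sample sizes and age
    intervention_comparison: str    # (v) intervention, comparison and theoretical foundations
    time_frames: str                # (vi) time frames, including follow-up
    adherence_outcomes: List[str]   # (vii) adherence-related outcomes and outcome measures
    key_findings: str               # (viii) key findings
    analysis_type: str              # (ix) meta-analytical, other statistical or narrative analysis
    quality_tools: List[str]        # (x) tools for quality assessment, risk of bias and evidence grading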

Assessment of risk of bias across reviews

Systematic reviews of RCTs are ranked at the highest level of evidence [ 25 ] but are subject to risk of bias (RoB). In an overview of systematic reviews, there are further risks of bias, in addition to those deriving from the primary studies and those deriving from the review of those studies. In particular, the overlap of reviews regarding the included individual studies may bias the findings. According to the purpose of this overview, i.e. to synthesise the wide range of interventions and behaviour change techniques used to promote adherence and to summarise the evidence of their efficacy, the overlap of reviews regarding intervention or population was not an exclusion criterion. To account for the overlap of primary studies among the reviews, CL extracted the primary RCTs from the included reviews, identified the unique trials and compared the frequency of their use across the reviews (see results on the overlap of reviews and Additional file 2 ). Furthermore, where two or more reviews provided findings on the same technique (e.g. on the efficacy of behavioural graded activities), the overlap of primary studies was assessed specifically for that finding. If the evidence came from the same study, this was taken into account and marked accordingly in Table  5 to avoid double counting and overestimation of evidence.
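
As a rough illustration of the overlap check described above, the unique trials and their frequency of use across reviews can be tallied in a few lines of Python. The review-to-trial assignments below are hypothetical and chosen only for illustration; the trial names echo those listed in the overlap results further down.

from collections import Counter

# Hypothetical assignment of primary trials to reviews, for illustration only.
included_trials = {
    "Review A": {"Basler 2007", "Friedrich 1998", "Schoo 2005"},
    "Review B": {"Basler 2007", "Vong 2011"},
    "Review C": {"Friedrich 1998", "Bennell 2017"},
}

# Count how often each unique trial appears across the included reviews.
trial_counts = Counter(trial for trials in included_trials.values() for trial in trials)
overlapping = {trial: n for trial, n in trial_counts.items() if n > 1}
print(f"{len(trial_counts)} unique trials, {len(overlapping)} included in more than one review")
# -> 5 unique trials, 2 included in more than one review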

Assessment of risk of bias within the reviews

CL and PP independently assessed the quality and risk of bias of the included systematic reviews using the AMSTAR-2 tool [ 26 ]. Any discrepancy was discussed and resolved by consensus. AMSTAR (A MeaSurement Tool to Assess systematic Reviews) was developed to evaluate systematic reviews of randomised trials. The AMSTAR-2 revision enables a more detailed assessment of systematic reviews, which may also include non-randomised studies of healthcare interventions. The applied AMSTAR-2 checklist consists of 16 items, of which seven are classified as critical, and the appraisal results in an overall confidence rating of critically low, low, moderate or high [ 26 ]. In addition, the overall confidence in the review was determined by the number of positive assessments in relation to the applicable domains (depending on whether a meta-analysis was performed) and by whether an item represents a critical domain [ 26 ].
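
To make the rating step concrete, the overall confidence category can be derived from the item judgements. The sketch below follows the published AMSTAR-2 rating scheme as we understand it (seven critical domains; the number of critical and non-critical weaknesses determines the category) and is not the review authors’ own tool.

# Simplified sketch of the AMSTAR-2 overall confidence rating, following the published
# scheme as we understand it; not the authors' own implementation.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}  # the seven critical AMSTAR-2 domains

def overall_confidence(item_ratings):
    """item_ratings maps item number (1-16) to True (adequately addressed) or False.
    Items that do not apply (e.g. meta-analysis items for a review without one) can be omitted."""
    critical_flaws = sum(1 for item, met in item_ratings.items()
                         if not met and item in CRITICAL_ITEMS)
    noncritical_weaknesses = sum(1 for item, met in item_ratings.items()
                                 if not met and item not in CRITICAL_ITEMS)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"

# Example: a single critical flaw (no pre-registered protocol, item 2) yields "low".
print(overall_confidence({1: True, 2: False, 3: True, 4: True, 7: True, 9: True, 13: True}))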

Synthesis methods

Panoramic meta-analysis.

Among the included reviews, there were four meta-analyses [ 7 , 16 , 27 , 28 ], which were pooled as a panoramic meta-analysis based on the reported effect sizes and standard errors using IBM SPSS Version 29 (IBM Corp., Armonk, NY, USA). All four meta-analyses used the standardized mean difference (SMD) as the effect size. Standard errors were calculated from the reported 95% CI as (upper bound - lower bound) / 3.92. Inverse variance was used to weight the meta-analyses, statistical heterogeneity was assessed by I², and a fixed-effects model was selected based on the absence of statistical heterogeneity of true effects. Eisele et al. [ 7 ] included 15 primary trials that examined the effect of BCTs on physical activity adherence. They pooled results for medium-term (3–6 months) and long-term (7–12 months) interventions, from which we selected the medium-term model because it best matched the eligibility criteria of the other included meta-analyses. Levack et al. [ 27 ] included nine primary trials that examined the effect of goal-setting strategies on engagement in rehabilitation. Among models with other outcomes, we selected this model because it best matched the aim of this overview and was most consistent with the outcomes of the other included meta-analyses. McGrane et al. [ 28 ] included six primary trials, representing 378 subjects, that examined the effects of motivational interventions on physiotherapy session attendance. They reported another model with perceived self-efficacy as an outcome, but we selected the attendance model because it best matched the aim of this overview and was most consistent with the outcomes of the other included meta-analyses. Nicolson et al. [ 16 ] included two primary trials that examined the effect of booster sessions on self-rated adherence. Results were summarized by a forest plot, and publication bias was assessed graphically by a funnel plot, although the small number of individual meta-analyses included limits its interpretability. Alpha was set at 0.05.
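
To make the pooling step concrete, the fixed-effect, inverse-variance calculation described above can be written out explicitly. The authors used SPSS; the Python sketch below is an illustrative re-implementation, and apart from the Eisele et al. medium-term estimate quoted in the results, the SMDs and confidence intervals are placeholder values rather than the reported ones.

# Illustrative re-implementation of the fixed-effect, inverse-variance pooling; not the
# authors' SPSS analysis. Except for Eisele 2019, the inputs below are placeholder values.

def se_from_ci(lower, upper):
    """Approximate the standard error of an estimate from its 95% confidence interval."""
    return (upper - lower) / 3.92

def pool_fixed_effect(effects_and_ses):
    """Fixed-effect (inverse-variance) pooling of (effect, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in effects_and_ses]
    pooled = sum(w * eff for w, (eff, _) in zip(weights, effects_and_ses)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

meta_analyses = [  # (label, SMD, 95% CI lower, 95% CI upper)
    ("Eisele 2019 (medium-term)", 0.20, 0.08, 0.33),   # as reported in the results
    ("Levack 2015 (engagement)",  0.30, 0.05, 0.55),   # placeholder values
    ("McGrane 2015 (attendance)", 0.33, 0.03, 0.63),   # placeholder values
    ("Nicolson 2017 (boosters)",  0.39, -0.05, 0.83),  # placeholder values
]
pairs = [(smd, se_from_ci(lo, hi)) for _, smd, lo, hi in meta_analyses]
pooled_smd, ci = pool_fixed_effect(pairs)
print(f"Pooled SMD = {pooled_smd:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")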

Narrative synthesis

The narrative synthesis was performed by CL in constant dialogue with, and verification by, PP. Guided by the research questions, the narrative synthesis of the extracted data proceeded in several steps. First, we explored the heterogeneity of interventions, measures and adherence-related outcomes across and within the reviews using the data extraction table. Definitions and measures of adherence were compared among the reviews and discussed. Second, analysis of the descriptions of the interventions and their respective components/techniques, their theoretical underpinning and their objectives was used to classify the interventions into different types, namely informational/educational, cognitive/behavioural/motivational and relational/psychosocial interventions. Consequently, for each type of intervention, the results on efficacy were narratively synthesised. In addition, reported differences in efficacy among medical conditions, theoretical underpinnings and physiotherapeutic settings were summarised based on the data extraction table. Third, the results on the efficacy of the interventions and BCTs were further summarised in a table and then restructured according to the evidence level as reported in the systematic reviews and the confidence in the reviews as analysed by the AMSTAR-2. The levels of evidence were therefore extracted as reported in the reviews, which are based on different evidence appraisal schemes: GRADE (high, moderate, low, very low certainty of evidence), Cochrane Collaboration Back Review Group Evidence Levels (strong, moderate, conflicting, limited, no evidence) and self-developed tools. Afterwards, they were compared for the respective intervention/technique across the relevant reviews, also considering the confidence in the review and the comprehensiveness of the review. The levels of evidence are presented in the table with the categories high, moderate, low and very low. The efficacy supported by the evidence is also based on the results reported in the reviews. In cases of overlapping reviews or discrepancies between the reviews, the primary studies were consulted. The category yes refers to results showing only positive effects, and inconsistent refers to findings of positive and no effects of the intervention (techniques) analysed. The category no indicates that the intervention was not efficacious. No negative effects (i.e. favouring the control condition) were reported for the interventions (techniques) shown.

The reporting of findings followed the PRIOR reporting guideline for overviews of reviews of healthcare interventions [ 29 ].

Study selection results

Of the 187 records screened, 19 were included. The main reasons for exclusion were not being a systematic review of RCTs ( n  = 79), adherence not being the primary outcome ( n  = 60), and lack of physiotherapy relevance ( n  = 39) (see Fig.  1 ).

Figure 1. Flow diagram, based on PRISMA [ 24 ] and PRIOR [ 29 ] guidelines. *Multiple reasons for exclusion were possible.

Characteristics and diversity of included reviews

The selection strategy resulted in a broad heterogeneity of included reviews. The 19 included reviews also differed in their eligibility criteria for the primary studies, resulting in substantial clinical diversity, i.e. the inclusion of heterogeneous conditions, intervention types and settings (see Table  2 ), and methodological diversity, i.e. variability in study design, outcome measurements and risk of bias (see Tables 3 , 4 and 5 ). Musculoskeletal diseases [ 6 , 7 , 17 , 30 , 31 , 32 ] and pain [ 13 , 16 , 33 , 34 , 35 ] were the most investigated medical conditions. Those reviews that did not limit their search to a specific disease [ 12 , 27 , 28 , 36 , 37 , 38 , 39 , 40 ] yielded predominantly studies on musculoskeletal diseases. All reviews included adults only (18 and older). One focused on elderly (65 and older) people [ 40 ] and one on older (45 and older) adults [ 16 ]. Fourteen of the 19 reviews analysed RCTs only [ 6 , 7 , 16 , 17 , 27 , 28 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 39 , 40 ]; one also included cohort studies besides RCTs [ 13 ], and three [ 12 , 37 , 38 ] also included any other quantitative study design (see Table  3 ). Four reviews performed a meta-analysis [ 7 , 16 , 27 , 28 ], and two were Cochrane Reviews [ 27 , 35 ]. Four reviews [ 6 , 7 , 17 , 40 ] analysed the use of BCTs and rated the interventions according to a BCT taxonomy [ 8 ].

Results of the individual reviews

The 19 reviews contained a total of 205 unique RCTs. Table 3 shows the main results of each review.

Results of quality assessment and confidence in the reviews

The critical appraisal with the AMSTAR-2 tool (see Table  4 ) showed that four reviews were rated as having moderate to high quality [ 7 , 16 , 27 , 35 ], whereas all others resulted in a critically low to low overall confidence in the review. Frequent shortcomings were a failure to explain the reasons for the inclusion of primary study designs and an insufficient discussion of the observed heterogeneity. Furthermore, as many reviews did not explicitly mention a pre-established, published or registered protocol or study plan, it is uncertain whether the research followed a pre-specified protocol and whether there were changes and/or deviations from it, and, if so, whether decisions during the review process may have biased the results [ 26 ].

Risk of bias and evidence assessment within reviews

The reviews used various approaches to appraise the evidence, particularly the GRADE (Grades of Recommendation, Assessment, Development and Evaluation) system [ 13 , 16 , 26 , 27 ], the evidence levels of the Oxford Centre for Evidence-Based Medicine [ 28 ] or the system of the Cochrane Collaboration Back Review Group (published in [ 25 , 30 ]) [ 31 , 32 , 33 , 34 ]. Three reviews modified existing tools or developed their own tool or checklist [ 12 , 35 , 36 ]. For the assessment of the risk of bias and/or quality of the individual studies, the reviews used the following tools: the PEDro Scale [ 7 , 13 , 26 , 32 , 37 ], the Cochrane Collaboration Back Review Group Quality Assessment Tool [ 31 , 34 ], the Cochrane Risk of Bias criteria [ 6 , 16 , 17 , 27 , 33 , 37 , 38 , 39 ], the Delphi List [ 40 ], or modified or self-developed tools [ 12 , 35 , 36 ].

A recurring concern regarding potential performance bias was the lack of therapist blinding, which is almost impossible to implement in this research field [ 7 ]. Attrition bias, due to low sample size or drop-outs, and measurement bias, due to the sole use of subjective measures, were also highlighted in the reviews. Another concern was the availability and selection of adequate control groups. Control groups, such as usual practice, an unspecific exercise group or an alternative intervention, commonly include varying numbers of BCTs, which must be considered when assessing and comparing the contents of interventions [ 7 ]. The comparability of the intervention and control group regarding adherence-related outcomes is further hindered by poor descriptions of the intervention, uncertainty about treatment fidelity and implementation processes, varying competences and proficiency of the therapists, and the diverse translation of theoretical models and use of intervention techniques [ 7 , 34 , 39 ]. Rhodes and Fiala [ 12 ] pointed out that procedures of RCTs, such as several pre-screenings and measurement batteries, may lead to self-selection of only the most motivated individuals. This may limit the ability to compare the intervention with the control group, as both groups are (already) highly motivated, and to detect changes, due to the already high motivation and disposition to adhere. This may explain, in part, why the reviews reported many studies that failed to provide evidence for intervention efficacy on adherence. In addition, the restricted timeline (limited duration of observation and follow-up) of the studies may confound or skew the results, as drop-out may occur shortly after the end of the study and long-term adherence is not measured [ 12 ].

Overlap of reviews

The 19 reviews included from 3 to 42 individual RCTs. In sum, the reviews included 261 RCTs (multiple publications on the same trial were counted as one; thus, the number of trials was counted), whereby 34 trials were included in various reviews (see Additional file 2 , Overlap of reviews), resulting in 205 unique RCTs. Of these 34 trials included in multiple reviews, 25 were included in two different reviews. The following trials were included more than twice: Basler et al. 2007 (8x), Friedrich et al. 1998 (7x), Schoo et al. 2005 (4x), Vong et al. 2011 (4x), Asenlof et al. 2005 (3x), Bassett and Petrie 1999 (3x), Brosseau et al. 2012 (3x), Bennell et al. 2017 (3x), Gohner and Schlicht 2006 (3x) and Duncan and Pozehl 2002, 2003 (3x).

In total, the overlap of primary trials in the reviews is considered low, except among reviews [ 27 , 39 ] and among reviews [ 12 , 16 , 28 , 30 ]. Reviews [ 27 ] and [ 39 ] were conducted by the same authors within the same field, i.e. goal planning and setting, but with a different approach and research question. Reviews [ 12 , 16 , 28 , 30 ] have a considerable amount of overlap. Still, each of these reviews included unique RCTs not analysed in any of the other reviews, and they focus on different research questions, foci and analyses. Therefore, we did not exclude an entire review due to an overlap of studies.

Synthesis of results

The synthesis focused on answering the research questions. We began by presenting the narrative synthesis findings on how adherence was measured, what types of intervention and BCTs were investigated, and which theoretical underpinnings were reported. Afterwards, we synthesised the evidence on the efficacy of the interventions and BCTs, both meta-analytically and narratively.

Measures of adherence and related outcomes

The reviews included studies with a heterogeneous use, breadth and measurement of adherence. Mostly, they refer to adherence as the extent to which a person’s behaviour corresponds with treatment goals, plans or recommendations ([ 30 ]; cf. [ 5 ]). McLean and colleagues [ 30 ] noted that within physiotherapy, the concept of adherence is multi-dimensional and could refer to attending appointments, following advice or undertaking prescribed exercises. The terms adherence and compliance were sometimes used interchangeably, referring to the degree of treatment attendance or accomplishment of physical activity levels, participation and recommendations, irrespective of how the treatment goals and plans were established. Yet, for definition purposes, the distinction between agreed and prescribed goals and plans was occasionally used in the reviews to distinguish adherence from compliance.

For analytical purposes, adherence was frequently dichotomised, establishing a cut-off point or percentage used to distinguish adherence from non-adherence. A patient was considered adherent, for example, if he/she achieved more than 70% or 80% of the targeted, recommended or prescribed sessions. Few studies graded the degree of adherence according to multi-categorical cut-off points (e.g. very low, low, moderate and high adherence). Only one review [ 13 ] named a study that distinguished fluctuation in the adherence pattern: Dalager et al. [ 41 ] included, besides the minutes exercised in a week, the regularity of participation, distinguishing regular from irregular participation. Self-reported diaries, exercise logs and attendance lists were the most commonly used data recording instruments [ 33 , 35 , 37 ]. Adherence to home-based programmes was mainly measured with self-reported diaries, which are problematic as the only source, due to poor completion rates and the possibility of inaccurate recall and self-presentation bias [ 18 , 33 ]. Digital devices (e.g. accelerometers or pedometers) may be used additionally to measure adherence; however, their use may also be problematic, as they require a certain adherence to systematic use of the device, and the mere use of the device may itself increase adherence [ 18 , 33 ]. One study reported the use of the Sport Injury Rehabilitation Adherence Scale (SIRAS) [ 42 ], which measures the patient’s degree and manner of participation in a session and compliance with the therapist’s instructions and plan. Thus, it does not measure adherence over a certain period of time nor adherence to recommendations or home-based exercise, but it can be used to assess the intensity of rehabilitation exercises, the frequency with which patients follow the practitioner’s instructions and advice, and their receptivity to changes in the rehabilitation programme during that day’s appointment [ 42 ].
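
As a minimal illustration of the dichotomisation just described, adherence status can be derived from the proportion of prescribed sessions completed. The 80% cut-off below is only one of the example thresholds mentioned above, not a recommended standard.

def adherence_status(sessions_completed, sessions_prescribed, cutoff=0.80):
    """Dichotomise adherence by the proportion of prescribed sessions completed.
    The 0.80 cut-off is illustrative; primary studies used, for example, 70% or 80%."""
    if sessions_prescribed <= 0:
        raise ValueError("sessions_prescribed must be positive")
    proportion = sessions_completed / sessions_prescribed
    return "adherent" if proportion >= cutoff else "non-adherent"

print(adherence_status(18, 24))  # 75% of prescribed sessions completed -> "non-adherent" at an 80% cut-off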

Interventions used to promote adherence

The reviews included a wide range of different interventions, which we grouped into three different intervention types:

Information provision and patient education were investigated in seven reviews [ 12 , 13 , 30 , 31 , 33 , 34 , 36 ], including (i) video- and audio-assisted patient education, (ii) phone calls, (iii) use of supporting materials and spoken or graphically presented information or (iv) other didactic interventions. Patient education has been defined as ‘any combination of learning experiences designed to facilitate voluntary adoption of behaviour conducive to health’ [ 43 ]. Niedermann et al. [ 31 ] distinguished between ‘purely’ educational programmes based on knowledge transfer and psychoeducational programmes. In the latter, motivational techniques and shared knowledge-building processes are added to the educational programme, as is done similarly in health coaching [ 34 ]; such programmes thus also relate to the cognitive, behavioural and relational/psychosocial interventions.

Cognitive and behavioural motivational interventions frequently related to cognitive-behavioural and social-cognitive theories and applied (i) behavioural graded exercise; (ii) booster sessions, refreshers or follow-up in situ by the therapist or via phone call; (iii) behavioural counselling (focusing on readiness to change); (iv) psychoeducational counselling; (v) supervision; (vi) (unspecified) motivational interventions; (vii) positive reinforcement; (viii) action and coping planning; and (ix) goal setting [ 7 , 12 , 13 , 16 , 27 , 28 , 30 , 32 , 33 , 34 , 39 ].

Relational and psychosocial interventions were less investigated overall. Related aspects included (i) social support; (ii) patient-centeredness, in particular patient-led goal setting, motivational interviewing and the therapeutic or working alliance; and (iii) emotional components [ 6 , 13 , 17 , 33 ].

The included reviews focused either on one particular type of intervention or on several. Notably, four reviews [ 6 , 7 , 17 , 40 ], which used a BCT taxonomy to analyse the interventions of the primary studies, described BCTs relating to all three intervention types. While this distinction between types of intervention is useful to showcase the range of diverse interventions and techniques, the types overlap considerably and include a mix of different BCTs. For example, the facilitation of information, supervision or goal setting was approached differently according to the relational approach, i.e. in a more instructive and directive or a more collaborative, participatory, patient-led manner ([ 31 ]; cf. [ 34 ]).

Theoretical underpinning of interventions

No review focused on only one theoretical foundation or excluded studies based on whether or not a theoretical model underpinned the intervention. In total, the reviews included studies with diverse theoretical models and varying degrees of theoretical underpinning. References to cognitive behavioural theory (CBT) and to social-cognitive theory were frequent in the individual studies. Furthermore, the self-determination theory, the transtheoretical model, the health belief model, the social learning theory and the socioemotional selectivity theory were used in some individual studies (cf. [ 11 ]). The heterogeneity in the theoretical underpinning of the interventions is reinforced by the overlap of the theories and models (cf. [ 11 ], [ 28 ]), and various BCTs are key components of several theories [ 17 ]. Furthermore, theories were not used enough to explicitly inform and underpin interventions, and they were translated into practice in different ways; thus, interventions based on the same theory may differ substantially [ 17 ].

The BCT Taxonomy v1 [8], which relates to various theoretical models, was used in four reviews [6, 7, 17, 40] to identify BCTs in interventions in a standardized manner. The Behaviour Change Wheel [44], which is linked to the BCT Taxonomy v1, was referred to in one review [40], which pointed to its usefulness for designing behaviour change interventions. The number of BCTs used appears to be relevant, as interventions using a higher number (≥ 8) of BCTs achieved a significant effect (pooled SMD = 0.29, 95% CI 0.19–0.40, p < 0.001), whereas interventions using a lower number (< 8) of BCTs did not (pooled SMD = 0.08, 95% CI -0.11 to 0.27, p = 0.41) [7].

Overall efficacy and heterogeneity according to the panoramic meta-analysis

Although there was statistical heterogeneity (I² ranging from 41 to 63%) between the primary studies included in each meta-analysis [7, 16, 27, 28], there was no heterogeneity between the pooled effects of these four meta-analyses (I² = 0%). This means that the variability in the effect size estimates (SMD from 0.20 to 0.39) was attributable to sampling error, with no detectable variability in the true effects. Although the interventions were selected based on different eligibility criteria (BCTs, goal-setting strategies, motivational interventions and booster sessions), they appear to be very similar in terms of the effects they trigger. There was no overlap between the primary trials included in the meta-analyses. The pooled SMD was 0.24 (95% CI 0.13 to 0.34) (Fig. 2). Effect size estimates were somewhat larger in those meta-analyses with less weight in the model (i.e. due to a larger standard error). However, no obvious publication bias could be detected in the funnel plot (Fig. 3). Sensitivity analyses in the meta-analysis by Eisele et al. [7], considering only studies with PEDro scores of 6 or more, revealed slightly lower but still statistically significant effect sizes regarding medium-term effects (SMD PEDro≥6 = 0.16, 95% CI 0.04–0.28, p < 0.01 versus SMD all = 0.20, 95% CI 0.08–0.33, p < 0.01) and higher numbers of BCTs (SMD PEDro≥6 = 0.26, 95% CI 0.16–0.37, p < 0.001 versus SMD all = 0.29, 95% CI 0.19–0.40, p < 0.001), indicating that low-quality studies may tend to overestimate the efficacy ([7], cf. [31]).
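To make the pooling step transparent, the following sketch recomputes a fixed-effect (inverse-variance) pooled SMD, Cochran's Q and I² from the four pooled effects reported above (Eisele 2019, Levack 2015, McGrane 2015 and Nicolson 2017), with standard errors back-calculated from the reported 95% confidence intervals. This is an illustrative reconstruction only, not the authors' actual analysis code; because I² = 0%, a random-effects model would yield essentially the same pooled estimate.

```python
import math

# Pooled SMDs and 95% CIs of the four meta-analyses, as reported in the text
# (illustrative reconstruction; the published panoramic model may differ).
reviews = {
    "Eisele 2019":   (0.20,  0.08, 0.33),
    "Levack 2015":   (0.30, -0.07, 0.66),
    "McGrane 2015":  (0.33, -0.03, 0.68),
    "Nicolson 2017": (0.39,  0.05, 0.72),
}

effects, weights = [], []
for smd, low, high in reviews.values():
    se = (high - low) / (2 * 1.96)   # back-calculate the SE from the 95% CI
    effects.append(smd)
    weights.append(1 / se ** 2)      # inverse-variance weight

pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))
ci_low, ci_high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

# Cochran's Q and I² quantify heterogeneity between the four pooled effects.
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
i2 = max(0.0, (q - (len(effects) - 1)) / q) * 100 if q > 0 else 0.0

print(f"Pooled SMD = {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), I^2 = {i2:.0f}%")
# Reproduces approximately SMD 0.24 (95% CI 0.13 to 0.34) and I^2 = 0%, as reported above.
```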

Fig. 2 Forest plot of the panoramic meta-analysis: interventions aiming at improving adherence and adherence-related outcomes

Legend: Eisele 2019. Intervention: Interventions aiming at improving physical activity levels or adherence, containing at least one BCT. Comparison: Usual care, minimal intervention, placebo or no intervention. Outcome: Any measure of physical activity level or adherence to any kind of physical activity. Levack 2015. Intervention: Goal setting (with or without strategies to enhance goal pursuit). Comparison: No goal setting. Outcome: Engagement in rehabilitation. McGrane 2015. Intervention: Motivational interventions as part of a package, psychological strategies, theory-based instructional manuals, Internet-based behavioural programmes and relapse prevention, and reinforcement strategies. Comparison: Any comparison (not specified). Outcome: Attendance at physiotherapy sessions/exercise classes. Nicolson 2017. Intervention: Booster sessions to increase adherence to therapeutic exercise. Comparison: Contextually equivalent control treatments. Outcome: Self-rated adherence.

Fig. 3 Funnel plot of publication bias
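As an illustration of how such a funnel plot can be read, the sketch below plots each meta-analysis's pooled SMD against its standard error (back-calculated from the reported 95% CIs) together with a pseudo 95% confidence funnel around the pooled estimate; marked asymmetry of the points around the vertical line would suggest publication bias. The values and plotting choices are assumptions for illustration, not a reproduction of the published figure.

```python
import matplotlib.pyplot as plt

# Pooled SMDs and back-calculated standard errors of the four meta-analyses
# (illustrative values derived from the CIs reported in the text).
smds = [0.20, 0.30, 0.33, 0.39]
ses = [(0.33 - 0.08) / 3.92,   # Eisele 2019
       (0.66 + 0.07) / 3.92,   # Levack 2015
       (0.68 + 0.03) / 3.92,   # McGrane 2015
       (0.72 - 0.05) / 3.92]   # Nicolson 2017
pooled = 0.24                  # pooled SMD reported in the text

# Pseudo 95% confidence limits widen as the standard error increases.
se_grid = [i / 100 for i in range(1, 26)]
plt.plot([pooled - 1.96 * s for s in se_grid], se_grid, "k--", linewidth=0.8)
plt.plot([pooled + 1.96 * s for s in se_grid], se_grid, "k--", linewidth=0.8)
plt.axvline(pooled, color="grey", linewidth=0.8)
plt.scatter(smds, ses, color="black")

plt.gca().invert_yaxis()       # precise estimates (small SE) appear at the top
plt.xlabel("Standardised mean difference")
plt.ylabel("Standard error")
plt.title("Funnel plot (illustrative sketch)")
plt.show()
```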

Efficacy of informational and educational interventions

The results of five (partly overlapping) reviews [12, 30, 31, 34, 36] showed, with a very low evidence base, that interventions that primarily aimed at information provision and knowledge transfer to the patient had limited efficacy on adherence-related outcomes. There was conflicting evidence and inconsistent efficacy of video-assisted patient education [36] and individualised exercise videos [12, 30] in modifying behaviour or adherence. However, the authors identified the format in which the educational information is presented and the complexity of the addressed behaviour as crucial factors [36]. Videos that provide only spoken or graphically presented health information are inappropriate tools for changing patient behaviour. However, videos with a narrative format appear to be a powerful education tool [36]. Low evidence based on one study [12, 30] indicates that additional written information seems superior to verbal instructions alone (mean difference between groups 39.3%, p < 0.001). With a high overlap of studies, two reviews [30, 31] showed that there is limited evidence for long-term effects of patient education targeting knowledge acquisition. While the informative and instructive educational approach is an essential part of patient education, patient education often involves more than the transfer of knowledge [30, 31, 34]. Niedermann et al. [31] compared educational and psychoeducational interventions and provided arguments in favour of psychoeducational approaches that enrich patient education with motivational strategies and techniques (cf. [34]).

Efficacy of cognitive and behavioural motivational interventions

Several (though partly overlapping) reviews [12, 16, 28, 30, 33, 37] examined studies on additional motivational interventions that were based on social-cognitive or cognitive-behavioural theories. McGrane et al. [28] identified the heterogeneity of motivational interventions, outcomes and measurements as a potential cause of the conflicting evidence regarding effects on exercise attendance and PT adherence: they found no significant difference (p = 0.07) in exercise attendance between additional motivational intervention groups and their controls (pooled SMD 0.33, 95% CI -0.03 to 0.68, I² = 62%), but a significant (p < 0.01) medium-sized effect of additional motivational interventions on self-efficacy beliefs (pooled SMD 0.71, 95% CI 0.55 to 0.87, I² = 41%). In this meta-analysis, the heterogeneity hindered statistical subgroup analyses to determine and compare the efficacy of different components of and approaches to motivational interventions [28]. Another meta-analysis [16] found moderate-quality evidence that booster sessions with a physiotherapist helped people with hip/knee osteoarthritis to better adhere to therapeutic exercise (pooled SMD 0.39, 95% CI 0.05 to 0.72, p = 0.02, I² = 35%). Moderate evidence (2 studies, n = 193) was shown for the efficacy of supervision in favouring adherence [13, 33, 35].

In four reviews [16, 32, 33, 35], four unique high-quality trials supported the use of motivational strategies and behavioural graded exercise to improve adherence to exercise (effect sizes 0.26–1.23) [16]. Behavioural graded exercise includes a preset gradual increase of physical activity through facility-based interventions followed by booster sessions [45] and uses principles of operant conditioning and self-regulation [16].

While cognitive behavioural programmes seem superior to exercise alone for short-term adherence and clinical attendance [30], behavioural counselling focusing on readiness to change, action and coping plans and/or audio/video exercise cues do not seem to improve adherence significantly [16]. Holden et al. [34] conclude that the evidence for health coaching based on the transtheoretical model of change is inconsistent, with one RCT showing some efficacy on exercise compliance (SMD = 1.3). However, the frequently cited study by Göhner and Schlicht [46], who analysed a cognitive-behavioural intervention with a strong emphasis on action and coping planning [12], showed no difference between experimental and control groups in the first 11 weeks, but a significant difference in behaviour 5 months later (SMD = 0.83) as well as differences across all time points in self-efficacy (interaction effect of time by group, F(3, 43) = 10.36, p < 0.001, n = 47) favouring the intervention [46]. Motivational interventions, including positive reinforcement, increased (i) adherence to home exercise in one RCT [33], (ii) reported frequency of exercise in two RCTs [35] and (iii) self-efficacy beliefs in two RCTs, in the short term (SMD = 1.23) and in the long term (SMD = 0.44) ([16], cf. [30]). Self-efficacy beliefs refer to trust in one's own capacities/competencies to cope with daily demands [47] and are associated (moderate evidence) with adherence [13, 48].

Levack et al. [27] conclude that there is some evidence that goal planning/setting improves engagement in rehabilitation (motivation, involvement and adherence) over the duration of the programme (9 studies, 369 participants, SMD 0.30, 95% CI -0.07 to 0.66). Furthermore, they show low-quality evidence for effects on patient self-efficacy from more structured goal setting compared to usual care with or without goal setting (2 studies, 134 participants; SMD 0.37, 95% CI 0.02 to 0.71) and from goal setting compared to no goal setting (3 studies, 108 participants; SMD 1.07, 95% CI 0.64 to 1.49). The review did not detect differences in efficacy between the approaches taken to goal planning. However, similar to patient education [34], the review authors argue that the lack of clarity about the effects and the low evidence are due to the heterogeneity of the implementation of goal planning, the lack of detailed descriptions of the goal-setting process in both the intervention and the control groups, and methodological flaws ([27, 39], cf. [13]).

The BCTs from the cluster goals and planning showed various positive effects, although not fully consistently [6, 7, 40]. Eisele et al. [7] identified goal setting (behaviour), problem-solving, goal setting (outcome), action planning and reviewing behaviour goal(s) as often used in non-effective interventions but also in effective ones. A trial that showed negative effects also included problem-solving and goal setting (outcome). Room et al. [40] found one study on older people, and Thacker et al. [6] found two home-exercise-related studies, that used BCTs from the goals and planning cluster (i.e. problem-solving and action planning), but none of these studies found differences in favour of the intervention. Willett et al. [17] adjusted the BCT Taxonomy v1 to differentiate patient-led and therapist-led goal setting and showed that patient-led goal setting (behaviour) achieved among the highest efficacy ratios across time points.

Efficacy of relational and psychosocial interventions

The BCT Social Support (unspecified) refers to ‘advise on, arrange or provide social support (e.g. from friends, relatives, colleagues, ’buddies’ or staff) or non-contingent praise or reward for the performance of the behaviour. It includes encouragement and counselling, but only when it is directed at the behaviour’ [8, Supplementary Material]. Eisele et al. [7] identified this BCT in 19 interventions and 10 control conditions. They found this BCT in three trials supporting efficacy and in seven trials supporting inefficacy. In contrast, Thacker et al. [6] found this BCT in all effective interventions but not in the non-effective ones. Willett et al. [17] concluded from their review that this BCT has among the highest efficacy ratios across time points for promoting adherence to physical activity.

Social support may be accompanied by monitoring and feedback, which the therapist can present graphically or narratively. Willett et al. [17] recommend that self-monitoring (e.g. activity diaries), feedback on behaviour and social support should be used not only for monitoring but also for explicit intervention purposes (e.g. to foster self-efficacy beliefs). Feedback on behaviour alone does not seem to be efficacious [6], but feedback can be efficacious, for instance, in combination with social support or goal setting and planning [17, 40].

Patient-centred approaches were also included in the relational/psychosocial intervention type. Motivational interviewing, a collaborative, patient-centred communication style to promote behaviour change [49], was used in three studies, with positive effects on exercise compliance, physical activity and exercise at home in two trials but no effect in a pilot study [28]. There is low evidence from three RCTs for positive effects of the therapist-patient alliance on global assessments; however, the efficacy on adherence-related outcomes is unclear [36]. The terms working or therapeutic alliance refer to the social connection or bond between therapist and patient/client, including reciprocal positive feelings, (assertive) communication, empathy and mutual respect, as well as collaboration, shared decision-making and agreement on the treatment goals and tasks [36, 50]. The therapeutic alliance is thus a patient-centred approach as well. Patient-led goal setting was more often a component of efficacious interventions than therapist-led goal setting [17].

None of the included reviews focused specifically on affective interventions. However, some interventions relate to affective components: for example, patient-led goal setting or motivational interviewing may cover emotional needs [27]; health coaching, the therapeutic alliance or social support may include emotional support [13, 34, 35, 38]; monitoring may consider emotional consequences [6]; and messaging and information provision may include emotional components [36]. Room et al. [40] included one RCT [51] that compared emotionally meaningful messages against factual informational messages but found no significant differences between the groups.

Efficacy according to the theoretical underpinning

McGrane et al. [28] provide a narrative analysis of the efficacy of interventions according to their different theoretical underpinnings. In their review, cognitive-behavioural theory (CBT) was the most popular theory (4 primary studies) and was shown to be efficacious in improving self-efficacy and activity limitations, but not consistently regarding attendance and attrition [28]. Social-cognitive theory was used in three studies, showing improvements in self-efficacy, action and coping planning, and attendance, but conflicting results for exercising in the short and long term. One intervention [52] based on self-determination theory was shown to be efficacious in improving adherence to physical activity. In contrast to McGrane et al. [28], other reviews [12, 30, 35] point to moderate to conflicting evidence for no or inconsistent efficacy of CBT-based approaches to physiotherapy programmes (see Efficacy of cognitive and behavioural motivational interventions). Jordan et al. [35] concluded that the addition of transtheoretical model-based counselling to physiotherapy is no more effective than physiotherapy and a sham intervention (GRADE: High (high quality); Silver). Notably, the interventions may not be representative of the theory described, owing to diverse translations of the theory into practice and the overlap of the same BCTs among the theories.

Various theories (e.g. the transtheoretical model or the Health Action Process Approach [53]) and studies [54] distinguish the action or adoption phase from the maintenance phase at 6 months. Interestingly, Willett et al. [17] found overall higher short-term (< 3 months) and long-term (≥ 12 months) efficacy ratios than medium-term (around 6 months) ratios, pointing to the risk of drop-out in the transition from the (short-term) adoption phase to the (long-term) maintenance phase [17]. Eisele et al. [7] divided the time frames differently in their meta-analysis (short-term: < 3 months; medium-term: 3–6 months; long-term: 7–12 months post-intervention), showing a small medium-term overall effect (pooled SMD 0.20, 95% CI 0.08–0.33, p < 0.01) but no significant long-term effect of interventions comprising BCTs in enhancing physical activity adherence (pooled SMD 0.13, 95% CI 0.02–0.28, p = 0.09).

Efficacy according to the different types of exercise, physiotherapeutic settings and medical condition

In their Cochrane review, Jordan et al. [35] compared the evidence for the efficacy of different types of exercise and physiotherapy settings. Graded exercise is beneficial for adherence (moderate evidence), whereas the type of exercise does not appear to play an important role (moderate evidence). Whether water-based exercise favours adherence is unclear (low evidence and inconsistent results). Furthermore, supervision of exercising is beneficial for adherence (moderate evidence), and self-management programmes also improve exercise frequency compared with waiting-list or no-intervention control groups (moderate evidence). Exercising individually seems to improve attendance at exercise classes more than exercising in a group (moderate evidence), as individual sessions could be scheduled at more convenient times and missed sessions could be rescheduled, whereas group sessions were scheduled at relatively inflexible times and missed sessions could not be rescheduled [35]. However, adding group exercise to a home exercise programme can increase overall physical activity levels (moderate evidence) [35]. While the results of home- versus clinic-based interventions were conflicting and confounded by the intervention approaches, a combination of home- and clinic-based approaches may be promising [12] and aligns with the moderate-quality evidence that self-management programmes and refresher or booster sessions with a physiotherapist help people to better adhere to therapeutic exercise [16].

No study was identified in the reviews that compared other settings, such as privately versus publicly funded physiotherapy or primary care versus rehabilitation settings, regarding adherence outcomes. Nor was any review or study identified that compared the same educational, motivational or BCT-based intervention across different conditions.

Discussion

This overview of systematic reviews addresses adherence in the physiotherapy and therapeutic exercise domain, aiming to summarise the evidence on the efficacy of interventions, to explore heterogeneity and to identify research gaps. The overview-of-reviews design provided an adequate approach to answer the research questions. Nineteen reviews, covering 205 unique trials, were included and narratively synthesised. In addition, four meta-analyses were pooled in a panoramic meta-analysis. The findings provide an overview of the diverse interventions and techniques aiming to enhance adherence, ranging from informational/educational to cognitive/behavioural/motivational and relational/psychosocial intervention types, and synthesise the evidence on their efficacy in physiotherapy for adults.

Confidence in the reviews was rated moderate or high for four reviews [7, 16, 27, 35] but low or very low for the others (Table 3). The individual reviews rated the evidence levels as mostly low or very low (Table 4; see Risk of bias and evidence assessment). Table 5 summarizes the evidence on the efficacy of each intervention and technique according to (a) whether the evidence supports efficacy, (b) the evidence level based on the reports in the systematic reviews and (c) the confidence in the reviews as assessed with AMSTAR-2. It must be noted that the intervention components responsible for the efficacy were not always clear. Some interventions lacked detailed definitions and descriptions of the specific BCTs included [33]. A single technique or mechanism of action was not always identifiable; moreover, various techniques seem to influence each other in such a way that they achieve efficacy only jointly [17, 40].

No clear conclusion can be drawn on the efficacy of informational/educational interventions. Five reviews [12, 30, 31, 34, 36] showed low evidence for the efficacy of interventions on knowledge acquisition and low evidence for limited short-term efficacy on adherence. Providing knowledge alone does not seem to be enough; it should be complemented with supportive material (very low evidence) and combined with other interventions (low evidence). Patient education should also include social-cognitive or cognitive-behavioural approaches, psychoeducational interventions and collaborative processes, as included in the therapeutic alliance approach [31, 34, 36]. Patient education with a more constructive educational approach builds upon the patient's existing knowledge, supporting him/her in exploring and co-constructing knowledge, which research has shown to be very relevant in physiotherapy [55, 56].

The reviews on additional motivational, cognitive and behavioural interventions showed findings ranging from non-efficacy of behavioural counselling based on readiness to change (low to moderate evidence) to moderate efficacy of booster sessions and behavioural graded physical activity (moderate evidence) (see Table 5). Overall, the panoramic meta-analysis indicated a small pooled effect size (SMD 0.24) for motivational interventions. The four pooled meta-analyses [7, 16, 27, 28] included studies analysing interventions with considerable content overlap (e.g. goal setting and booster sessions are BCTs and often part of motivational interventions), and no statistical heterogeneity of the true effect was found. Nevertheless, the diversity of the included interventions and techniques constrains the explanatory power regarding which components are responsible for the efficacy on adherence. The sensitivity analyses in the meta-analysis of Eisele et al. [7] indicate that low-quality studies tend to overestimate the efficacy (cf. [31]). While some evidence exists on short- and medium-term effects of motivational programmes on adherence, no clear evidence for long-term effects can be concluded [7, 30]. Furthermore, there is moderate and low evidence that additional motivational interventions and goal planning/setting improve adherence and self-efficacy beliefs [27, 28, 39]. Since self-efficacy beliefs play an important role in motivation and adherence [13, 48], these results are relevant for physiotherapists seeking to promote motivation and adherence. Experiencing that one can reach the set goals and manage daily challenges, complemented with feedback and reinforcement from the therapist (or important others), may increase self-efficacy beliefs and human agency [48, 57, 58, 59].

A closer look at how and in which manner goals and actions are planned and reviewed seems crucial. The patient-led approach was reported in only 5 of the 26 interventions that incorporated the BCT goal setting (behaviour), although it is associated with greater engagement and achievement than goals set by the therapist [17]. Goal setting and action planning should be informed by the patient's motives, interests and values in order to promote intrinsic motivation, self-determination and, subsequently, better adherence ([17], cf. [27, 28, 60, 61]). The reviews on the BCTs displayed various positive effects relating to the BCT cluster goals and planning; however, they point out that the BCT goal setting is not used alone but in connection with several other BCTs. Feedback on outcomes of behaviour, behavioural contract and non-specific reward, as well as patient-led goal setting, self-monitoring of behaviour and social support (unspecified), were included in efficacious interventions [17]. Social support seems to have an important influence on adherence [6, 7, 17, 40], for example through regular phone calls or home visits, encouraging messaging, supervision or community-based group programmes (cf. [1, 2, 3], [37, 62]). Social support also relates to the promotion of self-efficacy beliefs if it endorses confidence in one's own abilities and competences [6].

Some BCTs seem inherent to standard physiotherapy practice [6], even though physiotherapists appear to use only a small number of BCTs [15]. Control groups also contained BCTs [6, 7]; in particular, instruction on how to perform a behaviour, generalisation of the target behaviour and social support (unspecified) were frequently coded [6]. Thus, it seems difficult to identify those BCTs that are (most) efficacious in promoting adherence ([7], cf. [50]). Unsurprisingly, the reviews revealed conflicting results and a high risk of bias in the individual studies. However, combining a greater number of BCTs (≥ 8) can be highly recommended, as this achieved a larger effect than interventions using fewer BCTs [7]. It is fairly unlikely that any single BCT changes adherence [6, 7, 17, 40]. In that regard, Ariie et al. [63] argue that not only the number of BCTs but also the quality, appropriateness and feasibility of their use are crucial.

Meaningful combinations of several BCTs are required. However, the combinations of BCTs may also differ among conditions, personal factors and therapeutic interventions ([7], cf. [63, 64], [64, 65, 66]), and over time. Two reviews consistently point to the same crucial time point (i.e. after 6 months) at which BCT efficacy seems to drop and more attention is required to maintain adherence [7, 17]. Action planning, feedback on behaviour and behavioural practice/rehearsal seem efficacious particularly in the short term. Patient-led goal setting, self-monitoring of behaviour and social support (unspecified) are among the BCTs that seem more efficacious in the long term [17]. These findings are also in line with findings in non-clinical adults [54] and with motivational theories (e.g. the Health Action Process Approach [53]).

Limitations

Conducting an overview of reviews is per se associated with methodological limitations. One limitation is that reviews, not the original RCTs, were analysed, which adds further risk-of-bias domains such as selection, analysis and reporting bias. A specific potential source of bias in overviews of reviews is the overlap of primary studies among the included reviews. The small overlap, caused by a few reviews with similar thematic scope, was controlled for in the data analysis. The substantial non-overlap of primary studies across the reviews reflects the clinical and methodological diversity of the included reviews and showcases the efforts to address motivation and (non-)adherence as complex phenomena from various perspectives.

Another methodological limitation originates from the search strategy. Differences among countries in health-care systems and in the delimitation of the physiotherapy profession, divergent definitions of terms, and the diverse approaches to physical therapy, physiotherapy or the therapeutic use of exercise and physical activity made a clear delimitation of the search strategy and the inclusion/exclusion criteria difficult. Therefore, we may have missed some relevant reviews by restricting our search to the two terms physiotherapy and physical therapy. Equally, we may have included some aspects that were not primarily investigated for physiotherapists or physical therapists. As only studies with adults were included, the findings may not be applicable to promoting adherence among children.

While we did not exclude reviews in other languages, the search was conducted only in English, which may have omitted important reviews in other languages. All included reviews (and, as far as reported, also the original RCTs) were conducted in economically developed countries; however, sociocultural and context-specific factors influence participation and adherence [67, 68, 69, 70, 71]. Furthermore, we are aware that our own cultural background and experiences may have influenced the analysis and synthesis of the results and that the conclusions drawn in this overview of reviews may not be suitable for every setting around the world. Therefore, we encourage readers to critically assess the applicability of the findings to their specific context.

Another gap in coverage is that interventions that were analysed in RCTs but not included in any systematic review are not considered in this overview. Thus, there may be new or alternative intervention approaches that proved efficacious but are not covered here. Furthermore, reviews that focused only on the use of digital apps or tools, e.g. virtual reality, gamification, exergames or tele-rehabilitation, were excluded from this overview. Several reviews in this field include adherence-related outcomes, showing potential efficacy as well as limitations of the use of digital tools [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83].

Research gaps, recommendations and measuring adherence

This overview of reviews highlighted some gaps in the existing knowledge. First, there is a lack of clear evidence on the efficacy of the interventions. The use of BCTs in the intervention as well as in the control groups may be one reason for inconsistent findings and conflicting evidence. Furthermore, the clinical and methodological heterogeneity constrains drawing clear conclusions on efficacy. Second (and related to the previous point), interventions are insufficiently described regarding their theoretical underpinning and active ingredients/techniques, which limits the comparison of interventions. Theoretical underpinnings were only partly used and were translated into practice in different ways. Difficulties concerning the derivation of concrete, practical techniques or strategies from the theories were reported. A broader use of the BCT taxonomies would make interventions more comparable. Recently, the BCT Ontology was published, which claims to provide a standard terminology and a comprehensive classification system for the content of behaviour change interventions, suitable for describing interventions [84]. Third, there is a need for studies on holistic approaches, complex interventions based on integrative theories and the combination of multiple BCTs. While many theories are based on cognitive and behavioural approaches, affective and psychosocial factors are hardly investigated, often overlooked and probably underestimated. Rhodes and Fiala [12] call for studying the influences of affective attitudes on adherence (e.g. enjoyment and pleasing behaviour), which may oppose the more cognitive, instrumental attitudes (e.g. the utility of behaviour). Jordan et al. [35] refer to a meta-analysis in another therapeutic regime [85] to highlight the potential efficacy of affective interventions (e.g. appealing to feelings, emotions or social relationships and social support) in combination with educational and behavioural interventions on patient adherence [35]. Fourth, more research on patient-led approaches to goal setting and action planning and on the relationship of patient-centeredness to adherence is promising [60, 61, 86, 87].

Fifth, the reviews reported many studies that failed to provide evidence for intervention efficacy on adherence, particularly on long-term adherence. There is a need for prolonged observation to investigate long-term effects on adherence. Interventions or follow-up interventions (e.g. booster sessions) probably also need to be prolonged or repeated to avoid drop-out by the medium-term follow-up (around 6 months) and to maintain participation. Sixth, studies should pay more attention to the actual efficacy of adherent behaviour on the desired therapeutic outcomes.

Seventh, another research gap lies in the analysis of the potential variation of intervention efficacy across medical conditions, physiotherapeutic settings, personal characteristics (e.g. age, gender, sociocultural background), dispositions (e.g. motives, affective attitudes, previous behaviour) and diverse context-related factors. Huynh et al. [79] showed for the case of multiple sclerosis that the efficacy of BCTs is not investigated in all disease stages or throughout the disease course; participants with mild-to-moderate disability were more frequently included in the studies (cf. [18]). Ariie et al. [73] stated that the response to BCTs may differ according to the condition (cf. [76]). On the one hand, studies analysing the use of the same intervention or the same combination of BCTs in different intervention groups (according to the categories mentioned above) could be beneficial for comparison purposes. On the other hand, studies should analyse how to find the ‘right’ (ideally, the most efficacious) adherence promotion intervention for the patient or target group. Qualitative studies may explore adequate combinations of BCTs and contribute to the understanding of complex intervention processes. The findings showcased that different interventions and BCTs may contribute to adherence and that the BCT Taxonomy defines a wide range of usable techniques; this overview may thus inspire and support physiotherapists to develop additional interventions and to enrich their current practice. Physiotherapists may use this knowledge to tailor interventions in a patient-centred manner to promote adherence and to adapt to the condition, characteristics, dispositions and context-related factors of the patient. Hence, experimental studies could compare the efficacy of tailored versus non-tailored interventions.

Finally, the outcome adherence should be better defined and assessed holistically. The common definition of adherence (the extent to which a person's behaviour corresponds with treatment goals or plans) and the calculation of adherence rates (reported exercise or attended sessions divided by the recommended or prescribed exercise or sessions) simplify a complex phenomenon. The average or percentage of attended or completed sessions does not capture interruptions, regularity or periods of more and less adherence. Attendance regularity can change over time, and different participation and fluctuation patterns can be identified [88, 89]. For example, an adherence rate of 50% can imply (a) that a person attended every second session regularly throughout the period of observation or (b) that a person attended all sessions in the first half of the observation period and then stopped attending. The underlying reasons and motivational factors may be quite different in these two cases. Besides assessing participation and fluctuation patterns, the three dimensions of the SIRAS scale [42], i.e. frequency, intensity and reciprocity, could be considered for a holistic account of adherence. The findings of this overview emphasise the importance of patient-led goal setting and planning, which includes a shared decision-making process and mutual agreement to adhere to the jointly established plan (cf. the WHO definition of adherence [5]). The measurement of adherence should be able to distinguish a patient-led approach from a therapist-led approach (cf. [17]) and to appraise the extent of shared decision-making. In conclusion, a holistic approach to measuring adherence in physiotherapy may include measures of the frequency of attendance/exercising (e.g. attended sessions out of the prescribed/recommended sessions), the regularity of participation and fluctuation (e.g. a timeline with pauses and interruptions, visualising more and less adherent periods), the intensity of attendance/exercising (e.g. the number or increment of exercises and repetitions performed in comparison to the plan), reciprocity and fidelity to the agreed goals and plan (e.g. the therapist's and patient's subjective appraisal of the degree of accomplishment of the agreed plan) and persistence/perseverance over time (e.g. measuring volition via questionnaires or rating persistence in participation in spite of experienced challenges and barriers).
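To illustrate why a single attendance percentage can conceal very different adherence patterns, the minimal sketch below computes the crude adherence rate for the two hypothetical cases described above and adds a simple regularity indicator (the longest run of consecutively missed sessions). The function names, the 12-session schedule and the gap measure are illustrative assumptions, not a validated instrument.

```python
def adherence_rate(attended, prescribed):
    """Crude adherence rate: attended sessions divided by prescribed sessions."""
    return len(attended) / prescribed

def longest_gap(attended, prescribed):
    """Longest run of consecutively missed sessions (illustrative regularity indicator)."""
    gap = longest = 0
    for session in range(1, prescribed + 1):
        gap = 0 if session in attended else gap + 1
        longest = max(longest, gap)
    return longest

PRESCRIBED = 12
patterns = {
    # Case (a): every second session attended throughout the observation period.
    "regular":  set(range(1, PRESCRIBED + 1, 2)),
    # Case (b): all sessions of the first half attended, then drop-out.
    "drop-out": set(range(1, PRESCRIBED // 2 + 1)),
}

for label, attended in patterns.items():
    print(f"{label}: rate = {adherence_rate(attended, PRESCRIBED):.0%}, "
          f"longest gap = {longest_gap(attended, PRESCRIBED)} sessions")
# Both cases yield a 50% adherence rate, but the gap structure differs markedly
# (1 vs. 6 consecutively missed sessions), pointing to different adherence patterns.
```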

Conclusions

We conclude that moderate certainty of evidence supports that (i) additional motivational interventions and behaviour change programmes can increase adherence and patients' self-efficacy beliefs; (ii) interventions applying BCTs increase adherence, particularly when a greater number of BCTs is used and various BCTs are combined, and particularly in the short to medium term, while the BCTs patient-led goal setting, self-monitoring of behaviour and social support seem promising for promoting maintenance; and (iii) graded activities, booster sessions with a physiotherapist and supervision foster adherence.

There is low certainty of evidence that (i) goal setting and planning improve adherence to treatment regimens, particularly if a patient-centred approach is taken; (ii) motivational interventions including various techniques, such as positive reinforcement, social support, monitoring or feedback, can foster adherence; (iii) social support plays an important role in promoting adherence, although the evidence is low because this BCT is frequently also found in control groups; (iv) information provision and transfer of knowledge to the patient may improve adherence-related outcomes when combined with motivational techniques, as in psychoeducational programmes, and additional written information is superior to verbal instructions alone; and (v) a combination of home-based exercise programmes with clinical supervision, refresher or booster sessions and/or self-management programmes seems promising to increase adherence.

Regarding implications for future research, a holistic approach to measuring adherence in physiotherapy and the investigation of clearly defined interventions combining multiple BCTs are recommended.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files.

Overview of reviews, umbrella review and reviews of reviews are considered as synonyms in this article (cf. [ 19 ]).

Abbreviations

BCT: Behaviour change technique

CB/CBT: Cognitive behavioural/cognitive behavioural theory

CG: Control/comparator group

GRADE: Grades of Recommendation, Assessment, Development and Evaluation

IG: Intervention/experimental group

PA: Physical activity

PRIOR: Preferred Reporting Items for Overviews of Reviews

PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analysis

PT: Physiotherapy

RCT: Randomised controlled trial

SMD: Standardised mean difference

SR: Systematic review

References

Essery R, Geraghty AW, Kirby S, Yardley L. Predictors of adherence to home-based physical therapies: a systematic review. Disabil Rehabil. 2017;39:519–34.


Jack K, McLean SM, Moffett JK, Gardiner E. Barriers to treatment adherence in physiotherapy outpatient clinics: a systematic review. Man Ther. 2010;15:220–8.


Peek K, Sanson-Fisher R, Mackenzie L, Carey M. Interventions to aid patient adherence to physiotherapist prescribed self-management strategies: a systematic review. Physiotherapy. 2016;102:127–35.

Bullard T, Ji M, An R, Trinh L, Mackenzie M, Mullen SP. A systematic review and meta-analysis of adherence to physical activity interventions among three chronic conditions: cancer, cardiovascular disease, and diabetes. BMC Public Health. 2019;19:636.

World Health Organization. Adherence to long-term therapies: evidence for action. World Health Organization; 2003. Available from: https://apps.who.int/iris/handle/10665/42682

Thacker J, Bosello F, Ridehalgh C. Do behaviour change techniques increase adherence to home exercises in those with upper extremity musculoskeletal disorders? A systematic review. Musculoskeletal Care. 2020;19(3):340–62.

Eisele A, Schagg D, Kramer L, Bengel J, Gohner W. Behaviour change techniques applied in interventions to enhance physical activity adherence in patients with chronic musculoskeletal conditions: a systematic review and meta-analysis. Patient Educ Couns. 2019;102:25–36.

Michie S, Richardson M, Johnston M, Abraham C, Francis J, Hardeman W, et al. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Ann Behav Med. 2013;46:81–95.

Davis R, Campbell R, Hildon Z, Hobbs L, Michie S. Theories of behaviour and behaviour change across the social and behavioural sciences: a scoping review. Health Psychol Rev. 2015;9:323–44.

Michie S, Johnston M. Theories and techniques of behaviour change: developing a cumulative science of behaviour change. Health Psychol Rev. 2012;6:1–6.


Rhodes RE, McEwan D, Rebar AL. Theories of physical activity behaviour change: a history and synthesis of approaches. Psychol Sport Exerc. 2019;42:100–9.

Rhodes RE, Fiala B. Building motivation and sustainability into the prescription and recommendations for physical activity and exercise therapy: the evidence. Physiother Theory Pract. 2009;25:424–41.

Areerak K, Waongenngarm P, Janwantanakul P. Factors associated with exercise adherence to prevent or treat neck and low back pain: a systematic review. Musculoskeletal Science and Practice. 2021;52.

Husebø AML, Dyrstad SM, Søreide JA, Bru E. Predicting exercise adherence in cancer patients and survivors: a systematic review and meta-analysis of motivational and behavioural factors. J Clin Nurs. 2013;22:4–21.

Kunstler BE, Cook JL, Freene N, Finch CF, Kemp JL, O’Halloran PD, et al. Physiotherapists use a small number of behaviour change techniques when promoting physical activity: a systematic review comparing experimental and observational studies. J Sci Med Sport. 2018;21:609–15.

Nicolson PJA, Bennell KL, Dobson FL, Van Ginckel A, Holden MA, Hinman RS. Interventions to increase adherence to therapeutic exercise in older adults with low back pain and/or hip/knee osteoarthritis: a systematic review and meta-analysis. Br J Sports Med. 2017;51:791–9.

Willett M, Duda J, Fenton S, Gautrey C, Greig C, Rushton A. Effectiveness of behaviour change techniques in physiotherapy interventions to promote physical activity adherence in lower limb osteoarthritis patients: a systematic review. Regnaux J-P, editor. PLoS ONE. 2019;14:e0219482.

Kim Y, Mehta T, Lai B, Motl RW. Immediate and sustained effects of interventions for changing physical activity in people with multiple sclerosis: meta-analysis of randomized controlled trials. Arch Phys Med Rehabil. 2020;101:1414–36.

Pollock M, Fernandes R, Becker L, Pieper D, Hartling L. Chapter V: overviews of reviews. In: Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions version 63 (updated February 2022). Cochrane; 2022 [cited 2022 May 19]. Available from: https://training.cochrane.org/handbook/current/chapter-v

Aromataris E, Fernandez R, Godfrey C, Holly C, Khalil H, Tungpunkom P. Chapter 10: umbrella reviews. In: Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020 [cited 2021 Apr 19]. Available from: https://jbi-global-wiki.refined.site/space/MANUAL/4687363/Chapter+10%3A+Umbrella+reviews

Ballard M, Montgomery P. Risk of bias in overviews of reviews: a scoping review of methodological guidance and four-item checklist. Res Synth Methods. 2017;8:92–108.

Centre for Reviews and Dissemination. Undertaking systematic reviews of research on effectiveness: CRD’s guidance for carrying out or commissioning reviews. York, UK: NHSCentre for Reviews and Dissemination, University of York; 2001 [cited 2023 Feb 20]. Available from: http://www.york.ac.uk/inst/crd/crdreports.htm

Higgins J, Thomas J, Chandler J, Cumpston M, Li T, Page M, et al., editors. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane; 2022 [cited 2022 May 19]. Available from: www.training.cochrane.org/handbook

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372: n71.

Furlan AD, Malmivaara A, Chou R, Maher CG, Deyo RA, Schoene M, et al. 2015 Updated Method Guideline for Systematic Reviews in the Cochrane Back and Neck Group. Spine (Phila Pa 1976). 2015;40:1660–73.

Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.

Levack WMM, Weatherall M, Hay-Smith EJC, Dean SG, Mcpherson K, Siegert RJ. Goal setting and strategies to enhance goal pursuit for adults with acquired disability participating in rehabilitation. Cochrane Database of Systematic Reviews. 2015;2015.

McGrane N, Galvin R, Cusack T, Stokes E. Addition of motivational interventions to exercise and traditional Physiotherapy: a review and meta-analysis. Physiotherapy. 2015;101:1–12.


Gates M, Gates A, Pieper D, Fernandes RM, Tricco AC, Moher D, et al. Reporting guideline for overviews of reviews of healthcare interventions: development of the PRIOR statement. BMJ. 2022;378: e070849.

McLean SM, Burton M, Bradley L, Littlewood C. Interventions for enhancing adherence with physiotherapy: a systematic review. Man Ther. 2010;15:514–21.

Niedermann K, Fransen J, Knols R, Uebelhart D. Gap between short- and long-term effects of patient education in rheumatoid arthritis patients: a systematic review. Arthritis Care Res. 2004;51:388–98.

Cinthuja P, Krishnamoorthy N, Shivapatham G. Effective interventions to improve long-term physiotherapy exercise adherence among patients with lower limb osteoarthritis: a systematic review. BMC Musculoskelet Disord. 2022;23:147.

Beinart NA, Goodchild CE, Weinman JA, Ayis S, Godfrey EL. Individual and intervention-related factors associated with adherence to home exercise in chronic low back pain: a systematic review. The Spine Journal. 2013;13:1940–50.

Holden J, Davidson M, O’Halloran PD. Health coaching for low back pain: a systematic review of the literature. Int J Clin Pract. 2014;68:950–62.

Jordan JL, Holden MA, Mason EE, Foster NE. Interventions to improve adherence to exercise for chronic musculoskeletal pain in adults. Cochrane Database Syst Rev. 2010;CD005956.

Abu Abed M, Himmel W, Vormfelde S, Koschack J. Video-assisted patient education to modify behavior: a systematic review. Patient Educ Couns. 2014;97:16–22.

Bachmann C, Oesch P, Bachmann S. Recommendations for improving adherence to home-based exercise: a systematic review. Phys Med Rehab Kuror. 2018;28:20–31.

Hall AM, Ferreira PH, Maher CG, Latimer J, Ferreira ML. The influence of the therapist-patient relationship on treatment outcome in physical rehabilitation: a systematic review. Phys Ther. 2010;90:1099–110.

Levack WMM, Taylor K, Siegert RJ, Dean SG, McPherson KM, Weatherall M. Is goal planning in rehabilitation effective? A systematic review. Clin Rehabil. 2006;20:739–55.

Room J, Hannink E, Dawes H, Barker K. What interventions are used to improve exercise adherence in older people and what behavioural techniques are they based on? A systematic review. BMJ Open. 2017;7:e019221.

Dalager T, Bredahl TGV, Pedersen MT, Boyle E, Andersen LL, Sjøgaard G. Does training frequency and supervision affect compliance, performance and muscular health? A cluster randomized controlled trial. Man Ther. 2015;20:657–65.

Kolt GS, Brewer BW, Pizzari T, Schoo AMM, Garrett N. The Sport Injury Rehabilitation Adherence Scale: a reliable scale for use in clinical physiotherapy. Physiotherapy. 2007;93:17–22.

Green LW. Determining the impact and effectiveness of health education as it relates to federal policy. Health Educ Monogr. 1978;6:28–66.


Michie S, van Stralen MM, West R. The behaviour change wheel: a new method for characterising and designing behaviour change interventions. Implement Sci. 2011;6:42.

Pisters MF, Veenhof C, de Bakker DH, Schellevis FG, Dekker J. Behavioural graded activity results in better exercise adherence and more physical activity than usual care in people with osteoarthritis: a cluster-randomised trial. J Physiother. 2010;56:41–7.

Göhner W, Schlicht W. Preventing chronic back pain: evaluation of a theory-based cognitive-behavioural training programme for patients with subacute back pain. Patient Educ Couns. 2006;64:87–95.

Bandura A. Toward a psychology of human agency: pathways and reflections. Perspect Psychol Sci. 2018;13:130–6.

Ashford S, Edmunds J, French DP. What is the best way to change self-efficacy to promote lifestyle and recreational physical activity? A systematic review with meta-analysis. Br J Health Psychol. 2010;15:265–88.

Frost H, Campbell P, Maxwell M, O’Carroll RE, Dombrowski SU, Williams B, et al. Effectiveness of motivational interviewing on adult behaviour change in health and social care settings: a systematic review of reviews. PLoS ONE. 2018;13: e0204890.

Michie S, West R, Sheals K, Godinho CA. Evaluating the effectiveness of behavior change techniques in health-related behavior: a scoping review of methods used. Translational Behavioral Medicine. 2018;8:212–24.

Gallagher KM. Helping older adults sustain their physical therapy gains: a theory-based intervention to promote adherence to home exercise following rehabilitation. Journal of Geriatric Physical Therapy. 2016;39:20–9.

Silva MN, Vieira PN, Coutinho SR, Minderico CS, Matos MG, Sardinha LB, et al. Using self-determination theory to promote physical activity and weight control: a randomized controlled trial in women. J Behav Med. 2010;33:110–22.

Schwarzer R, Lippke S, Luszczynska A. Mechanisms of health behavior change in persons with chronic illness or disability: the Health Action Process Approach (HAPA). Rehabil Psychol. 2011;56:161–70.

Murray JM, Brennan SF, French DP, Patterson CC, Kee F, Hunter RF. Effectiveness of physical activity interventions in achieving behaviour change maintenance in young and middle aged adults: a systematic review and meta-analysis. Soc Sci Med. 2017;192:125–33.

Areskoug Josefsson K, Andersson A-C. The co-constructive processes in physiotherapy. Lee A, editor. Cogent Medicine. 2017;4:1290308.

Qasem M. Constructivist learning theory in physiotherapy education: a critical evaluation of research. Journal of Novel Physiotherapies. 2015;5.

Brinkman C, Baez SE, Genoese F, Hoch JM. Use of goal setting to enhance self-efficacy after sports-related injury: a critically appraised topic. J Sport Rehabil. 2019;29:498–502.

Fillipas S, Oldmeadow LB, Bailey MJ, Cherry CL. A six-month, supervised, aerobic and resistance exercise program improves self-efficacy in people with human immunodeficiency virus: a randomised controlled trial. Australian Journal of Physiotherapy. 2006;52:185–90.

Ley C, Karus F, Wiesbauer L, Rato Barrio M, Spaaij R. Health, integration and agency: sport participation experiences of asylum seekers. J Refug Stud. 2021;34:4140–60.

Melin J, Nordin Å, Feldthusen C, Danielsson L. Goal-setting in physiotherapy: exploring a person-centered perspective. Physiother Theory Pract. 2021;37:863–80.

Wijma AJ, Bletterman AN, Clark JR, Vervoort SC, Beetsma A, Keizer D, et al. Patient-centeredness in physiotherapy: what does it entail? A systematic review of qualitative studies. Physiother Theory Pract. 2017;33:825–40.

Meade LB, Bearne LM, Sweeney LH, Alageel SH, Godfrey EL. Behaviour change techniques associated with adherence to prescribed exercise in patients with persistent musculoskeletal pain: systematic review. Br J Health Psychol. 2019;24:10–30.

Ariie T, Takasaki H, Okoba R, Chiba H, Handa Y, Miki T, et al. The effectiveness of exercise with behavior change techniques in people with knee osteoarthritis: a systematic review with meta-analysis. PM R. 2022;

Demmelmaier I, Iversen MD. How are behavioral theories used in interventions to promote physical activity in rheumatoid arthritis? A systematic review. Arthritis Care Res. 2018;70:185–96.

Larkin L, Gallagher S, Cramp F, Brand C, Fraser A, Kennedy N. Behaviour change interventions to promote physical activity in rheumatoid arthritis: a systematic review. Rheumatol Int. 2015;35:1631–40.

Rausch Osthoff A-K, Juhl CB, Knittle K, Dagfinrud H, Hurkmans E, Braun J, et al. Effects of exercise and physical activity promotion: meta-analysis informing the 2018 EULAR recommendations for physical activity in people with rheumatoid arthritis, spondyloarthritis and hip/knee osteoarthritis. RMD Open. 2018;4: e000713.

Armstrong TL, Swartzman LC. 3 - cross-cultural differences in illness models and expectations for the health care provider-client/patient interaction. In: Shané S. Kazarian, David R. Evans, editors. Handbook of Cultural Health Psychology. San Diego: Academic Press; 2001 [cited 2013 Aug 20]. p. 63–84. Available from: http://www.sciencedirect.com/science/article/pii/B9780124027718500052

Brady B, Veljanova I, Chipchase L. Culturally informed practice and physiotherapy. J Physiother. 2016;62:121–3.

Jimenez DE, Burrows K, Aschbrenner K, Barre LK, Pratt SI, Alegria M, et al. Health behavior change benefits: perspectives of Latinos with serious mental illness. Transcult Psychiatry. 2016;53:313–29.

Jorgensen P. Concepts of body and health in physiotherapy: the meaning of the social/cultural aspects of life. Physiother Theory Pract. 2000;16:105–15.

Teng B, Rosbergen ICM, Gomersall S, Hatton A, Brauer SG. Physiotherapists’ experiences and views of older peoples’ exercise adherence with respect to falls prevention in Singapore: a qualitative study. Disabil Rehabil. 2022;44:5530–8.

Alfieri FM, da Silva DC, de Oliveira NC, Battistella LR. Gamification in musculoskeletal rehabilitation. Curr Rev Musculoskelet Med. 2022;15:629–36.

Cox NS, Dal Corso S, Hansen H, McDonald CF, Hill CJ, Zanaboni P, et al. Telerehabilitation for chronic respiratory disease. Cochrane Database Syst Rev. 2021;1:CD013040.

Cruz-Cobo C, Bernal-Jiménez MÁ, Vázquez-García R, Santi-Cano MJ. Effectiveness of mHealth interventions in the control of lifestyle and cardiovascular risk factors in patients after a coronary event: systematic review and meta-analysis. JMIR Mhealth Uhealth. 2022;10: e39593.

Darabseh MZ, Aburub A, Davies S. The effects of virtual reality physiotherapy interventions on cardiopulmonary function and breathing control in cystic fibrosis: a systematic review. Games Health J. 2023;12:13–24.

Fernandes CS, Magalhães B, Gomes JA, Santos C. Exergames to improve rehabilitation for shoulder injury: Systematic Review and GRADE Evidence Synthesis. REHABIL NURS. 2022;47:147–59.

García-Bravo S, Cuesta-Gómez A, Campuzano-Ruiz R, López-Navas MJ, Domínguez-Paniagua J, Araújo-Narváez A, et al. Virtual reality and video games in cardiac rehabilitation programs: a systematic review. Disabil Rehabil. 2021;43:448–57.

Hawley-Hague H, Lasrado R, Martinez E, Stanmore E, Tyson S. A scoping review of the feasibility, acceptability, and effects of physiotherapy delivered remotely. Disability and Rehabilitation. 2022;

Melillo A, Chirico A, De Pietro G, Gallo L, Caggianese G, Barone D, et al. Virtual reality rehabilitation systems for cancer survivors: a narrative review of the literature. Cancers. 2022;14.

Moulaei K, Sheikhtaheri A, Nezhad MS, Haghdoost A, Gheysari M, Bahaadinbeigy K. Telerehabilitation for upper limb disabilities: a scoping review on functions, outcomes, and evaluation methods. Arch Public Health. 2022;80:196.

Patsaki I, Dimitriadi N, Despoti A, Tzoumi D, Leventakis N, Roussou G, et al. The effectiveness of immersive virtual reality in physical recovery of stroke patients: a systematic review. Frontiers in Systems Neuroscience. 2022;16.

Skov Schacksen C, Henneberg NC, Muthulingam JA, Morimoto Y, Sawa R, Saitoh M, et al. Effects of telerehabilitation interventions on heart failure management (2015–2020): scoping review. JMIR Rehabil Assist Technol. 2021;8: e29714.

Thompson D, Rattu S, Tower J, Egerton T, Francis J, Merolli M. Mobile app use to support therapeutic exercise for musculoskeletal pain conditions may help improve pain intensity and self-reported physical function: a systematic review. J Physiother. 2023;69:23–34.

Marques MM, Wright AJ, Corker E, Johnston M, West R, Hastings J, et al. The behaviour change technique ontology: transforming the behaviour change technique taxonomy v1. Wellcome Open Res. 2023;8:308.

Roter DL, Hall JA, Merisca R, Nordstrom B, Cretin D, Svarstad B. Effectiveness of interventions to improve patient compliance: a meta-analysis. Med Care. 1998;36:1138–61.

Hansen LS, Præstegaard J, Lehn-Christiansen S. Patient-centeredness in physiotherapy–a literature mapping review. Physiotherapy theory and practice. 2022;38(12):1843-56.

Robinson JH, Callister LC, Berry JA, Dearing KA. Patient-centered care and adherence: definitions and applications to improve outcomes. J Am Acad Nurse Pract. 2008;20:600–7.

Seelig H, Fuchs R. Physical exercise participation: a continuous or categorical phenomenon? Psychol Sport Exerc. 2011;12:115–23.

Shang B, Duan Y, Huang WY, Brehm W. Fluctuation – a common but neglected pattern of physical activity behaviour: an exploratory review of studies in recent 20 years. European Journal of Sport Science. 2018;18(2):266-78.


Acknowledgements

Not applicable.

Funding

No funding was received.

Author information

Authors and affiliations

Department Health Sciences, Physiotherapy, FH Campus Wien University of Applied Sciences, Favoritenstrasse 226, 1100, Vienna, Austria

Clemens Ley

Department Health Sciences, Competence Center INDICATION, FH Campus Wien, University of Applied Sciences, Favoritenstrasse 226, 1100, Vienna, Austria


Contributions

CL and PP conceived and designed the review. CL did the database search and data extraction. CL and PP did screening and quality assessment. CL did the narrative synthesis and drafted the manuscript. PP conducted the panoramic meta-analysis and critically revised and substantially contributed throughout the writing of the manuscript. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Clemens Ley.

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Search details.

Additional file 2:

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Ley, C., Putz, P. Efficacy of interventions and techniques on adherence to physiotherapy in adults: an overview of systematic reviews and panoramic meta-analysis. Syst Rev 13, 137 (2024). https://doi.org/10.1186/s13643-024-02538-9


Received : 29 November 2023

Accepted : 17 April 2024

Published : 21 May 2024

DOI : https://doi.org/10.1186/s13643-024-02538-9


Keywords

  • Umbrella review
  • Physical therapy
  • Rehabilitation
  • Behaviour change techniques
  • Effectiveness



Intersections of Intimate Partner Violence and Natural Disasters: A Systematic Review of the Quantitative Evidence

Affiliations

  • 1 Griffith University, QLD, Australia.
  • 2 Climate Action Beacon, Griffith University, QLD, Australia.
  • 3 Disrupting Violence Beacon, Griffith University, QLD, Australia.
  • PMID: 38770897
  • DOI: 10.1177/15248380241249145

Natural disasters and extreme weather events are increasing in both intensity and frequency. Emerging evidence suggests a relationship between intimate partner violence (IPV) and natural disasters, yet methodologically sound research in this area is scarce and no systematic review has been conducted to date. To address this gap, this paper systematically assesses the quantitative evidence on the association between IPV and natural disasters published between 1990 and March 2023. Twenty-seven articles met the inclusion criteria for data extraction. A quantitative critical appraisal tool was used to assess the quality of each study, and a narrative synthesis approach was used to explore the findings. The review found an association between IPV and disasters across disaster types and countries, but more research is needed to explore the nuances and gaps within the existing knowledge base. It was unclear whether the relationship is causal or whether natural disasters heighten existing risk factors, and it remains inconclusive whether disasters create new cases of IPV or exacerbate existing violence. The majority of studies focused on hurricanes and earthquakes, with a dearth of research on "slow onset" disasters. These gaps highlight the need for further research, which could provide a more thorough understanding of IPV and natural disasters and strengthen stakeholders' ability to build community capacity and reduce IPV when natural disasters occur.

Keywords: intimate partner violence; natural disasters; systematic review.



COMMENTS

  1. Quantitative Synthesis—An Update

    Quantitative synthesis is a key method for Comparative Effectiveness Reviews, but it can be challenging to apply it consistently and transparently. This guide offers practical recommendations and examples for conducting synthesis based on the experience of AHRQ Evidence-based Practice Centers. Learn how to choose and report synthesis methods for different types of evidence and questions.

  2. Synthesising quantitative evidence in systematic reviews of complex health interventions

    Quantitative syntheses of studies on the effects of complex health interventions face high diversity across studies and limitations in the data available. Statistical and non-statistical approaches are available for tackling intervention complexity in a synthesis of quantitative data in the context of a systematic review.

  3. Synthesising quantitative evidence in systematic reviews of complex health interventions

    To help systematic review and guideline development teams decide how to address this complexity in syntheses of quantitative evidence, we summarise considerations and methods in tables 1 and 2. We close with the important remark that quantitative synthesis is not always a desirable feature of a systematic review.

  4. Systematic Reviews & Evidence Synthesis Methods

    You may have a qualitative synthesis, a quantitative synthesis, or both. ... Whether or not your Systematic Review includes a full meta-analysis, there is typically some element of data analysis. The quantitative synthesis combines and analyzes the evidence using statistical techniques. This includes comparing methodological similarities and ...

  5. Improving Conduct and Reporting of Narrative Synthesis of Quantitative

    Introduction Reliable evidence syntheses, based on rigorous systematic reviews, provide essential support for evidence-informed clinical practice and health policy. Systematic reviews should use reproducible and transparent methods to draw conclusions from the available body of evidence. Narrative synthesis of quantitative data (NS) is a method commonly used in systematic reviews where it may ...

  6. A review of the quantitative effectiveness evidence synthesis methods

    A systematic review is a separate synthesis within a guideline that systematically collates all evidence on a specific research question of interest in the literature. Systematic reviews of quantitative effectiveness, cost-effectiveness evidence and decision modelling reports were all included as relevant. Qualitative reviews, field reports ...

  7. Meta-analysis and the science of research synthesis

    Meta-analysis is the quantitative, scientific synthesis of research results. Since the term and modern approaches to research synthesis were first introduced in the 1970s, meta-analysis has had a ...

  8. Systematic Reviews & Evidence Synthesis Methods

    We use the term evidence synthesis to better reflect the breadth of methodologies that we support, including systematic reviews, scoping reviews, evidence gap maps, umbrella reviews, meta-analyses and others. Note: Librarians at UC Irvine Libraries have supported systematic reviews and related methodologies in STEM fields for several years.

  9. Quantitative evidence synthesis: a practical guide on meta-analysis

    Evidence synthesis is an essential part of science. The method of systematic review provides the most trusted and unbiased way to achieve the synthesis of evidence [1,2,3]. Systematic reviews often include a quantitative summary of studies on the topic of interest, referred to as a meta-analysis (for discussion on the definitions of 'meta-analysis', see []).

  10. How to Do a Systematic Review: A Best Practice Guide for Conducting and

    The best reviews synthesize studies to draw broad theoretical conclusions about what a literature means, linking theory to evidence and evidence to theory. This guide describes how to plan, conduct, organize, and present a systematic review of quantitative (meta-analysis) or qualitative (narrative review, meta-synthesis) information.

  11. PDF STARTING A NARRATIVE SYNTHESIS

    We often describe these methods as 'narrative' analysis or synthesis. Additionally, even where meta‐analysis is used, the results need to be described and integrated in the text of the review. A narrative synthesis can provide a first step in looking systematically at, and organising, the data.

  12. Synthesise

    The synthesis part of a systematic review will determine the outcomes of the review. There are two commonly accepted methods of synthesis in systematic reviews: quantitative data synthesis and qualitative data synthesis. The way the data is extracted from your studies and synthesised and presented depends on the type of data being handled.

  13. Evidence Synthesis/Systematic Reviews

    (From the University of California, San Francisco Library Research Guide, Systematic Reviews) Systematic review - A systematic review synthesizes data from articles into a summary review which has the potential to make conclusions more certain. Systematic reviews are considered the highest level of evidence in the evidence-based medicine (EBM) evidence pyramid.

  14. Systematic review and meta-analysis of ex-post evaluations on the

    A number of reviews describe the literature and summarise the findings of the primary studies but do not attempt a quantitative synthesis of the findings [15,25,26,27].

  15. Guidelines for writing a systematic review

    A Systematic Review (SR) is a synthesis of evidence that is identified and critically appraised to understand a specific topic. ... A Meta-analysis, for example, is a review whereby the results of quantitative studies are combined to provide a more comprehensive result; a Meta-synthesis is the qualitative equivalent (Grant and Booth, ...

  16. An overview of methodological approaches in systematic reviews

    Evidence synthesis is a prerequisite for knowledge translation.[1] A well conducted systematic review (SR), often in conjunction with meta‐analyses ...[5,6,7] Second, some SRs evaluated the impact of methods on the results of quantitative synthesis and MA conclusions. Future research studies must also focus on the interpretations of SR ...

  17. "Narrative synthesis" of quantitative effect data in Cochrane reviews

    Mhairi Campbell has broad experience of conducting complex systematic reviews, including: qualitative evidence of policy interventions, review of theories linking income and health, and research investigating the reporting of narrative synthesis methods of quantitative data in public health systematic reviews.

  18. Convergent and sequential synthesis designs ...

    Reviews were included if they were systematic reviews combining qualitative and quantitative evidence. The included reviews were analyzed according to three concepts of synthesis processes: (a) synthesis methods, (b) sequence of data synthesis, and (c) integration of data and synthesis results. A total of 459 reviews were included.

  19. PDF Synthesising quantitative evidence in systematic reviews of complex health interventions

    To help systematic review and guideline development teams decide how to address this complexity in syntheses of quantitative evidence, we summarise considerations and methods in tables 1 and 2. We close with the important remark that quantitative synthesis is not always a desirable feature of a systematic review.

  20. Systematic Reviews: Synthesis & Meta-Analysis

    Synthesis involves pooling the extracted data from the included studies and summarizing the findings based on the overall strength of the evidence and consistency of observed effects. All reviews should include a qualitative synthesis and may also include a quantitative synthesis (i.e. meta-analysis). Data from sufficiently comparable and ...

  21. LibGuides: Systematic Reviews: 8. Synthesize Your Results

    A quantitative synthesis, or meta-analysis, uses statistical techniques to combine and analyze the results of multiple studies. The feasibility and sensibility of including a meta-analysis as part of your systematic review will depend on the data available. Requirements for quantitative synthesis:

  22. Adherence to PRISMA 2020 statement assessed through the expanded

    In recent years, the number of biomedical articles has grown exponentially, making systematic reviews (SRs) crucial in EBM, synthesizing all updated and available evidence on a specific topic to help healthcare decision-makers and beyond. This is due to the rigor of SR methods, which provide a critical, comprehensive and unbiased synthesis ...

  23. Chapter 12: Synthesizing and presenting findings using other methods

    When writing the review, details of the synthesis methods should be described. Synthesis methods that involve vote counting based on statistical significance have serious limitations and are unacceptable (a small sketch after this list illustrates the direction-of-effect alternative tested with a sign test). Cite this chapter as: McKenzie JE, Brennan SE. Chapter 12: Synthesizing and presenting findings using other methods.

  24. Quantitative synthesis in systematic reviews

    The final common pathway for most systematic reviews is a statistical summary of the data, or meta-analysis. The complex methods used in meta-analyses should always be complemented by clinical acumen and common sense in designing the protocol of a systematic review, deciding which data can be combined, and determining whether data should be combined. A minimal worked sketch of this pooling arithmetic appears after this list.

  25. Qualitative and mixed methods in systematic reviews

    Mixed methods reviews. As one reason for the growth in qualitative syntheses is what they can add to quantitative reviews, it is not surprising that there is also growing interest in mixed methods reviews. This reflects similar developments in primary research in mixing methods to examine the relationship between theory and empirical data which ...

  26. What Synthesis Methodology Should I Use? A Review and Analysis of

    In quantitative synthesis, the units of analysis range from specific statistics for systematic reviews to effect size of the intervention for meta-analysis. More recently, some systematic reviews focus on theories; therefore, it depends on the research question. Similarly, within conventional literature synthesis the units of analysis also ...

  27. Systematic Reviews & Evidence Synthesis Methods

    Note that PROSPERO only accepts protocols for systematic reviews, rapid reviews, and umbrella reviews, so if you plan to do another type of review, search OSF and the JBI EBP Database instead. If you plan to include qualitative evidence or topics related to nursing and allied health, you should also search CINAHL and any relevant subject ...

  28. Efficacy of interventions and techniques on adherence to physiotherapy

    Background Adherence to physiotherapeutic treatment and recommendations is crucial to achieving planned goals and desired health outcomes. This overview of systematic reviews synthesises the wide range of additional interventions and behaviour change techniques used in physiotherapy, exercise therapy and physical therapy to promote adherence and summarises the evidence of their efficacy ...

  29. Nutrients

    Understanding the relationship between the intake of sugars and diet quality can inform public health recommendations. This systematic review synthesized recent literature on associations between sugar intake and diet quality in generally healthy populations aged 2 years or older. We searched databases from 2010 to 2022 for studies of any design examining associations between quantified sugar ...

  30. Intersections of Intimate Partner Violence and Natural ...

    There were 27 articles that met the inclusion criteria for the data extraction process. A quantitative critical appraisal tool was used to assess the quality of each study, and a narrative synthesis approach was used to explore the findings. The review found an association between IPV and disasters across disaster types and countries.
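
Several of the entries above describe quantitative synthesis as the statistical pooling of study results (see, for example, items 21 and 24). As a purely illustrative aid, and not the method of any particular review listed here, the following minimal Python sketch shows the inverse-variance arithmetic behind a fixed-effect pooled estimate and a DerSimonian–Laird random-effects estimate. The effect estimates and standard errors are made-up numbers chosen only to demonstrate the calculation.

import math

# Hypothetical per-study effect estimates (e.g. mean differences) and their
# standard errors -- made-up numbers, not taken from any review cited above.
effects = [0.60, 0.05, 0.45, 0.20]
std_errors = [0.12, 0.15, 0.20, 0.10]

# Fixed-effect model: weight each study by the inverse of its variance.
weights = [1 / se ** 2 for se in std_errors]
pooled_fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
se_fixed = math.sqrt(1 / sum(weights))

# Cochran's Q and the DerSimonian-Laird estimate of the between-study
# variance tau^2, which feeds the random-effects weights.
q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
tau2 = max(0.0, (q - (len(effects) - 1)) / c)

# Random-effects model: add tau^2 to each study's variance before weighting.
re_weights = [1 / (se ** 2 + tau2) for se in std_errors]
pooled_random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
se_random = math.sqrt(1 / sum(re_weights))

print(f"Fixed effect:   {pooled_fixed:.3f} (SE {se_fixed:.3f})")
print(f"Random effects: {pooled_random:.3f} (SE {se_random:.3f}, tau^2 = {tau2:.3f})")

In practice, dedicated meta-analysis software adds confidence intervals, heterogeneity diagnostics and forest plots on top of this arithmetic, but the weighting shown here is what the sources above mean by combining results statistically.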
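
Item 23 (Cochrane Handbook, Chapter 12) cautions that vote counting based on statistical significance is unacceptable. Where effect estimates cannot be pooled, counting the direction of effect across studies and testing it with an exact sign test is one commonly described fallback; the sketch below is a minimal illustration with hypothetical study directions, not an analysis of any review listed here.

from math import comb

# 1 = study favoured the intervention, 0 = study favoured the comparator;
# hypothetical directions for eight studies, for illustration only.
directions = [1, 1, 1, 0, 1, 1, 0, 1]
k = len(directions)
favouring = sum(directions)

# Exact two-sided sign (binomial) test against p = 0.5, i.e. no preferred
# direction of effect across studies.
def pmf(x, n, p=0.5):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

observed = pmf(favouring, k)
p_value = sum(pmf(x, k) for x in range(k + 1) if pmf(x, k) <= observed + 1e-12)

print(f"{favouring}/{k} studies favoured the intervention; "
      f"two-sided sign test p = {p_value:.3f}")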