Draft data | Study A | Study B | Study C
---|---|---|---
Title | | |
Author/Publication year | | |
Country of origin | | |
Type of source/study design | | |
Aim | | |
Theater setting | | |
Type of theater method used (eg, ethnodrama, non-theatrical performance) | | |
Description of theater method | | |
Content focus (ie, what aspect of health is addressed) | | |
Source of content (ie, what research is informing the theater) | | |
Target audience | | |
Evaluation method (if applicable) | | |
Outcome measures (if applicable) | | |
Facilitators (if applicable) | | |
Challenges (if applicable) | | |
health research; knowledge translation; protocol; scoping review; theater
Rachel Rahman, Caitlin Reid, Philip Kloer, Anna Henchie, Andrew Thomas, Reyer Zwiggelaar, A systematic review of literature examining the application of a social model of health and wellbeing, European Journal of Public Health, Volume 34, Issue 3, June 2024, Pages 467–472, https://doi.org/10.1093/eurpub/ckae008
Following years of sustained pressure on the UK health service, there is recognition amongst health professionals and stakeholders that current models of healthcare are likely to be inadequate going forward. A fundamental review of existing social models of healthcare is therefore needed to ascertain current thinking in this area and whether a change of perspective is required.
Through a systematic review, this paper addresses how previous literature has conceptualized a social model of healthcare and how implementation of such models has been evaluated. Data were extracted from 222 publications and analysed to explore the country of origin, methodological approach, and the health and social care contexts in which the studies were set.
The publications, predominantly drawn from the USA, UK, Australia, Canada and Europe, identified five themes: the lack of a clear and unified definition of a social model of health and wellbeing; the need to understand context; the need for cultural change; improved integration and collaboration towards a holistic and person-centred approach; and measuring and evaluating the performance of a social model of health.
The review identified a need for a clear definition of a social model of health and wellbeing. Furthermore, consideration is needed of how such a model integrates with current models and whether it will act as a descriptive framework or be developed into an operational model. The review highlights the importance of engagement with users and partner organizations in the co-creation of a model of healthcare.
Following years of sustained and increasing pressure brought about through inadequate planning and chronic under-resourcing, including the unprecedented challenges of the Covid-19 pandemic, the UK NHS is at crisis point.1 The incidence of chronic disease continues to increase alongside an ageing population with more complex health and wellbeing needs, whilst recruitment and retention of staff remain insufficient to meet these increased demands.1 Furthermore, the Covid-19 pandemic has only served to exacerbate pressures, resulting in delays in patient presentation,2 poor public mental health3 and strain and burnout amongst the workforce.4 However, even before the pandemic there was recognition of a need to change the current biomedical model of care to better prevent and treat the needs of the population.5
While it is recognized that demands on the healthcare system are increasing rapidly, the biomedical model used to deal with these issues, which remains the current model of healthcare provision in the UK, has largely remained unchanged over the years. The biomedical model takes the perspective that ill-health stems from biological factors and operates on the theory that good health and wellbeing are merely the absence of illness. Application of the model therefore focuses treatment on the management of symptoms and the cure of disease from a biological perspective. The biomedical approach is thus mainly reactive in nature: whilst rapid advancements in technology such as diagnostics and robotics have significantly improved patient outcomes and the identification of early onset of disease, the model does not fully extend into managing the social determinants that can play an important role in the prevention of disease.

Therefore, despite its contribution to advancing many areas of biological and health research, the biomedical model has come under increasing scrutiny.6 This is in part due to the growing recognition of the impact of wider social determinants on health, ill-health and wellbeing, including physical, mental and social wellbeing, which moves the focus beyond individual physical abilities or dysfunction.7–9 Addressing these determinants requires policy action in a range of non-medical areas, such as social, economic and environmental policy, so that commercial and corporate determinants are regulated. In this sense, the traditional biomedical model quickly becomes inadequate: under the current model, healthcare and clinical staff can do little to affect these determinants and as such can do little to assist the individual patient or society. The efficiency and effectiveness of clinical work would undoubtedly improve if staff were able to observe and understand the wider social determinants and consequences of the individual patient's condition. Therefore, in order to provide a basis for understanding the determinants of disease and arriving at rational treatments and patterns of health care, a medical model must also take into account the patient, the social context in which they live, and the system devised by society to deal with the disruptive effects of illness, that is, the physician's role and that of the healthcare system. Models such as Engel's biopsychosocial model,9,10 the social model of disability, and social–ecological models of health,10,11 including the World Health Organisation's framework for action on social determinants of health,8,9 have all been proposed as attempts to integrate these wider social determinants.
However, the ability of health systems to effectively transition away from a dominant biomedical model to the adoption of a social model of health and care has yet to be fully developed. Responsibility for taking action on these social determinants will need to come from other sectors and policy areas, so future health policy will need to evolve into a more comprehensive and holistic social model of health and wellbeing. Wales' flagship Wellbeing of Future Generations Act,12 for instance, outlines ways of working towards sustainable development and includes the need to collaborate with society and communities in developing and achieving wellbeing goals. However, developing and implementing an effective operational model that allows multi-stakeholder integration will prove far more difficult than creating the policies. Furthermore, even if the implementation of a robust social model of health is achievable, its efficiency, effectiveness and ability to deliver have yet to be proven. Any future model will therefore need to extend past its conceptual development and provide the ability to manage the complex interactions that will exist between stakeholders and policies.
The use of the term 'model' poses its own challenges and debates. Different disciplines attribute differing parameters to what constitutes a model, and this in turn may influence the interpretations or expectations surrounding what a model should comprise or deliver.13 According to numerous authors, a model has no ontological category, and as such anything from physical entities, theoretical concepts and descriptive frameworks to equations can feasibly be considered a model.14 It appears, therefore, that much discussion has focussed on the move towards a 'descriptive' social model of health and wellbeing in an attempt to view health more holistically and identify a wider range of determinants that can impact on the health of the population. However, in defining an operational social model of health that can facilitate organizational change, there may be a need to consider a more systems- or process-based approach.
As a result, this review seeks to systematically explore the academic literature in order to better understand how a social model of health and wellbeing is conceptualized, implemented, operationalized and evaluated in health and social care.
The review seeks to address the research questions:
How is ‘a social model of health and wellbeing’ conceptualized?
How have social models of health and wellbeing been implemented and evaluated?
A systematic search of the literature was carried out between 6 January 2022 and 20 January 2022. Using the search terms shown in table 1, a systematic search was conducted across the online databases PsycINFO, ASSIA, IBSS, Medline, Web of Science, CINAHL and SCOPUS. English language and peer-reviewed journals were selected as limiters.
Search terms
The search strategy considered research that explicitly included, framed, or adopted a 'social model of health and wellbeing'. Each paper was checked for relevance and screened. The authors reviewed the literature using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method, following the updated 2020 guidelines.15 Figure 1 represents the process followed.
PRISMA flow chart.
The systematic search identified 222 eligible papers for inclusion in the final review. A data extraction table was used to extract information regarding the location of the research, type of paper (e.g. review, empirical), service of interest and key findings. Quantitative studies were explored with a view to conducting a quantitative meta-analysis; however, given the disparate nature of the outcome measures and research designs, this was deemed unfeasible. All included papers were coded using NVivo software with the identified research questions in mind and re-analysed using thematic analysis16 to explore common themes of relevance.
The majority of papers were from the USA (34%), with the UK (28%), Australia (16%), Canada (6%) and wider Europe (10%) also contributing to the field. The 'other' category (6%) was made up of single papers from other countries. Papers ranged in date from 1983 to 2021, with no noticeable temporal patterns in country of origin, health context or model definition. However, the volume of papers relating to a social model of healthcare increased markedly decade on decade, suggesting growing research interest in the area. Table 2 shows the number of publications per decade identified in this study.
Publications identifying social models of healthcare.
Year of publication | Number of publications identifying social models of healthcare
---|---
1980s | 5
1990s | 11
2000s | 70
2010s | 87
2020–22 | 49
Most of the papers were narrative reviews (n = 90), with smaller numbers of systematic reviews (n = 9) and empirical research studies, including qualitative (n = 47), quantitative (n = 39) and mixed methods (n = 14) research. The remaining papers (n = 23) comprised small numbers of other formats, for example clinical commentaries, cost-effectiveness analyses, discussion papers and impact assessment development papers. The qualitative meta-synthesis identified five overarching themes in relation to the research questions, some with underlying sub-themes, which are outlined in figure 2.
Overview of meta-synthesis themes.
There was common recognition amongst the papers that a key aim of applying a social model of health and wellbeing was to better address the social determinants of health. Papers identified and reviewed relevant frameworks and models, which they later used to conceptualize or frame their approach when attempting to apply a social model of health. Amongst the most commonly referenced were the WHO's framework17 and Engel's biopsychosocial model,9 which was referred to as a seminal framework by many of the researchers. However, one criticism of the biopsychosocial model was its inability to fully address social needs. As a result, a number of papers reported the development of new or enhanced models that used the biopsychosocial model as their underpinning 'social model'18,19 but extended it by including a wider set of social elements.20 The social–ecological model,11 the Society-Behaviour-Biology Nexus21 and the Environmental Affordances Model22 are such examples. Further examples of 'social models' included the Model of Social Determinants of Health,23 which framed specific determinants of interest (namely social gradient, stress, early life, social exclusion, work, unemployment, social support, addiction, food and transport). Similarly, Dahlgren and Whitehead's 'social model'10 illustrates social determinants via a range of influential factors, from the individual to wider cultural and socioeconomic influences. However, none of these papers formally developed a working 'definition' of a social model of health and wellbeing, instead applying guiding principles and philosophies associated with a social model to their discussions or interventions.24,25
Numerous articles highlighted that, in order to move towards a social model of health and wellbeing, it is important to understand the context of the environment in which the model will need to operate. This includes balancing the needs of the individual with the need for the resulting model to be co-created, developed and implemented within the community, whilst ensuring that the complex interactions between the social determinants of health and their influence on health and wellbeing outcomes are managed effectively and efficiently.
The literature identified the complex multi-disciplinary nature of a variety of conditions and care situations, including, but not exclusively, chronic pain,26 cancer,27 older adult care28 and dementia,29 indicating the complex range of medical issues that a model will need to address; many authors acknowledged that the frequently used biomedical models failed to fully capture the holistic nature and needs of patients. Papers outlined some of the key social determinants of health affecting the specific population of interest in their own context, highlighting the interactions between wider socioeconomic and cultural factors, such as poverty, housing, isolation and transport, and health and wellbeing outcomes. Interventions that had successfully addressed individual needs and successfully embedded services in communities reported improved outcomes for end users and staff in the form of empowerment, agency, education and belonging.30 There was also recognition that the transition to more community-based care could be challenging for health and social care providers, who were having to work outside of their traditional models of care and accept a certain level of risk.
A number of papers referred to the need for a ‘culture change’ or ‘cultural shift’ in order to move towards a social model of health and wellbeing. Papers identified how ‘culture change models’ were implemented as a way of adapting to a social model. It was recognized that for culture change models to be effective, staff and the general public needed to be fully engaged with the entire move towards a social model, informing and shaping the mechanisms for the cultural shift as well as the application of the model itself.
The importance of integration and collaboration between health professionals (including public, private and third sector organizations), service users and patients was emphasized in the ambition to achieve best practice when applying a social model of health and wellbeing. Papers identified the reported benefits of improved collaboration between, and integration of, services, which included improved continuity of care throughout complex pathways,31 improved return to home or another setting on discharge,25 and social connectedness.32 Numerous papers discussed the importance of multi-disciplinary teams able to support individuals beyond the medicalized model.
A number of papers suggested specific professional roles or structures that would be well placed to act as champions or integrators of collaborative services and communities.25,33 These could act as a link between secondary, primary and community-level care, helping to identify patient needs and supporting the integration of relevant services.
Individual papers applying and evaluating interventions based on a social model used a variety of methods to evaluate success. Amongst the most common outcome measures were general self-report measures such as mental health and perceptions of safety,34 wellbeing,35 life satisfaction and health, and social networks and support.19 Some included condition-specific self-report outcomes relevant to the condition in question (e.g. pregnancy, anxiety) and pain inventories.36 Other papers considered the in-depth experiences of users or service implementers through qualitative techniques such as in-person interviews.37,38
However, the complexity of developing effective methods to evaluate social models of health was recognized. The need to consider the complex interactions between social determinants and health, wellbeing, economic and societal outcomes posed particular challenges in developing consistency across evaluations that would enable a conclusive assessment of the benefits of social models to wider health systems and societal health. Some papers criticized an over-reliance on quantitative and evidence-based practice methods of evaluation, highlighting how these could fail to fully capture the complexity of human behaviour and the ways in which people's lives could be affected.
The aim of this systematic review was to better understand how a social model of health and wellbeing is conceptualized, implemented and evaluated in health and social care. The review sought to address the research questions identified in the ‘Introduction’ section of this paper.
With regards to the conceptualization of a social model of health and wellbeing, analysis of the literature suggests that, whilst the ethos, values and aspirations of achieving a unified model appear to command consensus, a fundamental weakness exists: there is no single unified definition or operational model of a social model of health and wellbeing applied to the health and social care sector. The decision about how best to conceptualize a 'social model' is important both in terms of its operational value and in terms of the implications of the associated semantics. Without a single unified definition, implementation, let alone operationalization, of any model will be almost impossible. Furthermore, use of the term 'social model' arguably loses sight of the biological factors that are clearly relevant in many elements of clinical medicine. There is also no clarification in the literature about what would 'not' be considered a social model of health and wellbeing, potentially leading to confusion within health and social care sectors when addressing their wider social remit. This raises questions, and requires decisions, about whether implementation of a social model of health and wellbeing would work alongside or replace the existing biomedical approach.
Authors have advocated that a social model provides a way of 'thinking' or of articulating an organization's values and culture.24 Common elements of the values associated with a social model amongst the papers reviewed included recognition and awareness of the social determinants of health, an increased focus on preventative rather than reactive care, and the importance of quality of 'life' as opposed to a focus on quality of 'care'. However, whilst this approach enables individual services to consider how well their own practices align with a social model, the authors suggest that it does not provide large organizations such as the NHS, with their multifaceted services and complex internal and external connections and networks, with sufficient guidance to enable large-scale evaluation or transition to a widespread operational social model of health and wellbeing. This raises questions about what the model should be: whether its function is to support communication of a complex ethos, encouraging reflection and engagement amongst staff and end users, or to develop the current illustrative framework into a predictive model that can be utilized as an evaluative tool to inform and measure the success of widespread systems change.
Regarding the potential implementation of a future social model of health and wellbeing, none of the papers evaluated complex widespread organizational implementation of a social model, focusing instead on specific organizational contexts of services, such as long-term care in care homes. Despite this, common elements of successful implementation did emerge from the synthesis. These included the need to wholeheartedly engage and include end users in policy and practice change, to fully understand the complexity of their social worlds and to ensure that changes to practice and policy were 'developed with', as opposed to 'created for', the wider public. This also involved ensuring that health, social care and wider multi-disciplinary teams were actively included in the process of culture change from an early stage.
The analysis identifies that a significant change of mindset, and the removal of perceived and actual hierarchical structures historically embedded in health and social care, is needed amongst both staff and the public, although eradicating socially embedded hierarchies will pose significant challenges in practice. Furthermore, the study revealed that many of the models proposed were conceptually underdeveloped and lacked the capability to be operationalized, which in turn compromised their ability to be empirically tested. Therefore, in order that a future 'implementable and operational' model of social care and wellbeing can be created, further research is needed into organizational behaviour, organizational learning and stakeholder theory (amongst others) as applied to the social care and health environment.
In attempting to conceptualize a definition for a social model of health and wellbeing, it is important to note that the model needs to be sufficiently broad in scope to include the prevailing biomedical model while also drawing in the social determinants that provide a view of, and a future trajectory towards, social health and wellbeing. The authors therefore suggest that the 'preventative' approach brought by improvements in the social determinants of health (social, cultural, political, environmental) needs to be balanced effectively with the 'remedial/preventative' focus of the biomedical model (and the associated advancements in diagnostics, technology, vaccines, etc.), ensuring that a future model drives cultural change and improved integration and collaboration towards a holistic and person-centred approach, whilst ensuring engagement with citizens, users, multi-disciplinary teams and partner organizations throughout the transition towards a social model of health and wellbeing.
Through a comprehensive literature analysis, this paper has provided evidence that advocates a move towards a social model of health and wellbeing. However, the study has predominantly considered literature from the USA, UK, Canada and Australia and is therefore limited in scope at this stage. The authors are aware of the need to consider research undertaken in non-English-speaking countries, where a considerable body of knowledge also exists and which will add to further discussion about how that work dovetails with this body of literature and how it aligns with the biomedical perspective. There is a need for complex organizations such as the NHS and allied organizations to agree a working definition of their model of health and wellbeing, whether that be a social model of health and wellbeing, a biopsychosocial model, a combined model, or indeed a new or revised perspective.39
One limitation of the models within this study is that, at a systems level, most were conceptual models that characterized current systems or conditions, together with interventions to the current system that result in localized improvements in system performance. However, for meaningful change to occur, a 'future state' model may need to take a behavioural systems approach, allowing the complete system to be modelled in order to understand how the elements within the model40 behave under different external conditions and how these behaviours affect overall system performance.
Furthermore, considerable work will be required to engage on a more equal footing with the public, health and social care staff, and wider supporting organizations in developing workable principles and processes that fully embrace the equality of a social model and challenge the 'power' imbalances of the current biomedical model.
Supplementary data are available at EURPUB online.
This research was funded/commissioned by Hywel Dda University Health Board. The research was funded in two phases.
Conflicts of interest: None declared.
The datasets generated and/or analysed during the current study are available in the Data Archive at Aberystwyth University and have been included in the supplementary file attached to this submission. A full table of references for studies included in the review will be provided as a supplementary document. The references below refer to citations in the report which are in addition to the included studies of the synthesis.
The review identified five themes: the lack of a clear definition of a social model of health and wellbeing; the need to understand context; the need for cultural change; improved integration and collaboration towards a holistic and person-centred approach; and measuring and evaluating the performance of a social model of health.
The review identified a need for organizations to decide on how a social model is to be defined especially at the interfaces between partner organizations and communities.
The implications for public policy highlighted in this paper centre on the importance of engagement with citizens, users, multi-disciplinary teams and partner organizations to ensure that the transition towards a social model of health and wellbeing is undertaken with holistic needs as a central value.
British Medical Association (n.d.). An NHS under pressure. https://www.bma.org.uk/advice-and-support/nhs-delivery-and-workforce/pressures/an-nhs-under-pressure (26 June 2023, date last accessed).
Nuffield Trust (2022). NHS performance summary. https://www.nuffieldtrust.org.uk/news-item/nhs-performance-summary-january-february-2022 (26 June 2023, date last accessed).
NHS Confederation (2022). Running hot: the impact of the Covid-19 pandemic on mental health services. https://www.nhsconfed.org/publications/running-hot (26 June 2023, date last accessed).
Gemine R, Davies GR, Tarrant S, et al. Factors associated with work-related burnout in NHS staff during COVID-19: a cross-sectional mixed methods study. BMJ Open 2021;11:e042591.
Iacobucci G. Medical models of care needs updating say experts. BMJ 2018;360:k1034.
Podgorski CA, Anderson SD, Parmar J. A biopsychosocial-ecological framework for family-framed dementia care. Front Psychiatry 2021;12:744806.
Marmot M. Social determinants of health inequalities. Lancet 2005;365:1099–104.
World Health Organisation (1946). Preamble to the Constitution of the World Health Organization as adopted by the International Health Conference, New York, 19–22 June 1946.
World Health Organisation (2010). A conceptual framework for action on the social determinants of health. Available via who.int (26 June 2023, date last accessed).
Engel G. The need for a new medical model: a challenge for biomedicine. Science 1977;196:129–36.
Dahlgren G, Whitehead M (2006). European strategies for tackling social inequities in health: levelling up part 2. Studies on Social Economic Determinants of Population Health, 1–105. Available at: http://www.euro.who.int/__data/assets/pdf_file/0018/103824/E89384.pdf (12 October 2023, date last accessed).
McLeroy KR, Bibeau D, Steckler A, Glanz K. An ecological perspective on health promotion programs. Health Educ Q 1988;15:351–77.
Welsh Government. Wellbeing of Future Generations Act 2015. Available at: https://www.gov.wales/sites/default/files/publications/2021-10/well-being-future-generations-wales-act-2015-the-essentials-2021.pdf (12 October 2023, date last accessed).
Stanford Encyclopaedia of Philosophy (2006, 2020). Models in science. Available at: https://plato.stanford.edu/entries/models-science/ (26 June 2023, date last accessed).
Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71.
Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol 2006;3:77–101.
Thomas J, Harden A. Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Med Res Methodol 2008;8:45.
Solar O, Irwin A. A conceptual framework for action on the social determinants of health. Geneva: WHO; 2010 (Social Determinants of Health Discussion Paper 2 (Policy and Practice)). Available at: http://www.who.int/sdhconference/resources/ConceptualframeworkforactiononSDH_eng.pdf (12 October 2023, date last accessed).
Farre A, Rapley T. The new old (and old new) medical model: four decades navigating the biomedical and psychosocial understandings of health and illness. Healthcare 2017;5:88.
Smedema SM. Evaluation of a concentric biopsychosocial model of well-being in persons with spinal cord injuries. Rehabil Psychol 2017;62:186–97.
Robles B, Kuo T, Thomas Tobin CS. What are the relationships between psychosocial community characteristics and dietary behaviors in a racially/ethnically diverse urban population in Los Angeles county? Int J Environ Res Public Health 2021;18:9868.
Glass TA, McAtee MJ. Behavioral science at the crossroads in public health: extending horizons, envisioning the future. Soc Sci Med 2006;62:1650–71.
Mezuk B, Abdou CM, Hudson D, et al. "White Box" epidemiology and the social neuroscience of health behaviors: the environmental affordances model. Soc Ment Health 2013;3. doi:10.1177/2156869313480892.
Wilkinson RG, Marmot M, editors. Social Determinants of Health: The Solid Facts. Copenhagen, Denmark: World Health Organization, 2003.
Mannion R, Davies H. Understanding organisational culture for healthcare quality improvement. BMJ 2018;363:k4907.
Blount A, Bayona J. Toward a system of integrated primary care. Fam Syst Health 1994;12:171–82.
Berger MY, Gieteling MJ, Benninga MA. Chronic abdominal pain in children. BMJ 2007;334:997–1002.
Berríos-Rivera R, Rivero-Vergne A, Romero I. The pediatric cancer hospitalization experience: reality co-constructed. J Pediatr Oncol Nurs 2008;25:340–53.
Doty MM, Koren MJ, Sturla EL (2008). Culture change in nursing homes: how far have we come? Findings from the Commonwealth Fund 2007 National Survey. The Commonwealth Fund. Available at: http://www.commonwealthfund.org/Content/Publications/Fund-Reports/2008/May/Culture-Change-in-NursingHomes-How-Far-Have-We-Come-Findings-FromThe-Commonwealth-Fund-2007-Nati.aspx (16 October 2023, date last accessed).
Robinson L, Tang E, Taylor J. Dementia: timely diagnosis and early intervention. BMJ 2015;350:h3029.
Baxter S, Johnson M, Chambers D, et al. Understanding new models of integrated care in developed countries: a systematic review. Health Serv Deliv Res 2018;6:1.
Seys D, Panella M, VanZelm R, et al. Care pathways are complex interventions in complex systems: new European Pathway Association framework. Int J Care Coord 2019;22:5–9.
Agarwal G, Brydges M. Effects of a community health promotion program on social factors in a vulnerable older adult population residing in social housing. BMC Geriatr 2018;18:95.
Franklin CM, Bernhardt JM, Lopez RP, et al. Interprofessional teamwork and collaboration between community health workers and healthcare teams: an integrative review. Health Serv Res Manag Epidemiol 2015;2:2333392815573312.
Gagné T, Henderson C, McMunn A. Is the self-reporting of mental health problems sensitive to public stigma towards mental illness? A comparison of time trends across English regions (2009–19). Soc Psychiatry Psychiatr Epidemiol 2023;58:671–80.
Geyh S, Nick E, Stirnimann D, et al. Biopsychosocial outcomes in individuals with and without spinal cord injury: a Swiss comparative study. Spinal Cord 2012;50:614–22.
Davies C, Knuiman M, Rosenberg M. The art of being mentally healthy: a study to quantify the relationship between recreational arts engagement and mental well-being in the general population. BMC Public Health 2016;16:15.
Duberstein Z, Brunner J, Panisch L, et al. The biopsychosocial model and perinatal health care: determinants of perinatal care in a community sample. Front Psychiatry 2021;12:746803.
The King's Fund (2021). Health inequalities in a nutshell. https://www.kingsfund.org.uk/projects/nhs-in-a-nutshell/health-inequalities (23 October 2023, date last accessed).
Blount A. Integrated primary care: organizing the evidence. Fam Syst Health 2003;21:121–33.
Introduction Due to a change in diagnostic prerequisites and the inclusion of novel diagnostic entities, the implementation of the 11th revision of the International Classification of Diseases (ICD-11) will presumably change prevalence rates of specific mental, behavioural or neurodevelopmental disorders and result in an altered prevalence rate for this grouping overall. This scoping review aims to summarise the characteristics of primary studies examining the prevalence of mental, behavioural or neurodevelopmental disorders based on ICD-11 criteria. The knowledge attained through this review will primarily characterise the methodological approaches of this research field and additionally assist in deciding which psychiatric diagnoses are—given the current literature—most relevant for subsequent systematic reviews and meta-analyses intended to approximate the magnitude of prevalence rates while providing a first glimpse of the range of expected (differences in) prevalence rates in these conditions.
Methods and analysis MEDLINE, Embase, Web of Science and PsycINFO will be searched from 2011 to present without any language filters. This scoping review will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Review guidelines.
We will consider (a) cross-sectional and longitudinal studies (b) focusing on the prevalence rates of mental, behavioural or neurodevelopmental disorders (c) using ICD-11 criteria for inclusion. The omission of (a) case numbers and sample size, (b) study period and period of data collection or (c) diagnostic procedures on full-text level is considered an exclusion criterion.
This screening will be conducted by two reviewers independently from one another and a third reviewer will be consulted with disagreements. Data extraction and synthesis will focus on outlining methodological aspects.
Ethics and dissemination We intend to publish our review in a scientific journal. As the primary data are publicly available, we do not require research ethics approval.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .
https://doi.org/10.1136/bmjopen-2023-081082
This scoping review will be the first to summarise the characteristics of the literature assessing prevalence rates of mental, behavioural or neurodevelopmental disorders (MBND) according to the 11th revision of the International Classification of Diseases (ICD-11). Additionally, it will identify research gaps and inform subsequent systematic reviews and meta-analyses on the prevalence of the mentioned disorders.
Our search strategy consists of four electronic databases targeting peer-reviewed literature as well as grey literature sources to reduce publication bias; it will be conducted with no language restrictions.
We will adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for the conduct of Scoping Reviews to ensure transparent reporting.
In the interest of a timely review, this scoping review covers the vast majority but not the entirety of diagnostic entities located within the MBND chapter of ICD-11.
In 2019, mental health conditions were among the 10 primary contributors to disease burden worldwide—an increase in burden being observable since 1990.1 The current Global Burden of Disease study estimates roughly 970 million cases of mental health disorders worldwide to be responsible for more than 125 million disability-adjusted life years and for 15% of all years lived with disability1: numbers which highlight the relevance of mental health conditions as a global public health concern.
Reliable and standardised measurements of health issues—relying on proper categorisation of diseases and associated processes—are necessary to understand, prevent and treat diseases while guaranteeing efficient resource utilisation.2
Through several revisions,3 the International Classification of Diseases (ICD) has evolved from a limited catalogue of causes of death4 into the 'essential infrastructure for health information'2 and as such should serve the aforementioned functions.2
The product of the 11th revision process, the ICD-11, was accepted by the World Health Assembly of the WHO in May 2019.2 Notable differences in its mental, behavioural or neurodevelopmental disorders (MBND) chapter were described by Gaebel et al and are summarised as follows5:
Altered subchapter structure: with 21 subchapters, the MBND chapter encompasses almost twice as many as chapter V of the ICD-10.5 This change resulted from the removal of a rule limiting the number of subchapters to 10 at every level of the ICD-10.6 Cross-links within chapter VI refer to the new sleep-wake disorders and conditions related to sexual health chapters, and in an effort to emphasise the continuous nature of development, the subchapter on mental or behavioural disorders with onset during childhood and adolescence was dissolved, with the respective diagnoses located elsewhere.5 7
New diagnostic entities: the revision resulted in the elimination of diagnostic groupings and the introduction of new diagnostic entities such as body dysmorphic disorder, prolonged grief disorder and complex post-traumatic stress disorder (complex PTSD).5
Changes regarding diagnostic criteria: examples comprise a higher diagnostic threshold for PTSD5 and schizoaffective disorders6 and a new conceptualization of personality disorders, which removes the ICD-10's established classification of categorical personality disorder types.5 6 8
As observed in the context of other revision processes, changes in diagnostic criteria can lead to a change in prevalence rates of diagnoses.9 The introduction of a reduced diagnostic threshold for attention deficit hyperactivity disorder in older adolescents and adults by the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), for instance, led to an increase of 65% in reported prevalence rates within these populations.9 Considering this, the publication of the ICD-11 alpha browser in May 201110 initiated a growing body of work pertaining to the prevalence rates of new diagnostic entities and the difference in prevalence rates of MBND assessed according to ICD-11 and ICD-10 criteria.11–13
As accurate estimates of prevalence rates are of key importance for public health planning, healthcare resource allocation as well as identifying risk factors or health disparities, this scoping review seeks to provide an overview of primary studies which examine the prevalence of mental disorders based on ICD-11 criteria. It aims to analyse the methodologies used to determine prevalence rates, including data sources, sampling methods, diagnostic tools and population characteristics. As such it will also support the decision on which diagnoses are most suitable for subsequent systematic reviews and meta-analyses, which can provide more accurate estimates on how the ICD-11 will impact prevalence rates of specific MBND and disorders of this grouping in general.
Rationale 1: the rationale of this review is to outline how prevalence rates of MBND of ICD-11 have been assessed so far and thereby summarise the approaches of currently available primary studies.
Associated review questions are:
What are the sample characteristics of primary studies?
Where were the primary studies based?
What was the timeframe for data collection within primary studies?
Study period.
Year of data collection.
What are the study designs of primary studies?
What were research aims of the primary studies?
Which MBND are most frequently assessed?
How were diagnoses assessed?
What measurement tools were used?
How was data collected?
What prevalence was estimated for the diagnoses?
Additionally, research gaps will be identified:
Which mental, behavioural or neurodevelopmental disorders are least frequently assessed within prevalence studies?
Rationale 2: identify mental, behavioural or neurodevelopmental disorders most suitable for subsequent systematic reviews and meta-analyses: here we are interested in:
Disorders, where multiple (≥2) primary studies exist which assess the prevalence of the disorders listed below (table 1) according to ICD-11 criteria and ICD-10 criteria within one cohort.
Newly introduced disorders, where multiple (≥2) primary studies exist which assess the prevalence of the disorders listed below (table 1) according to ICD-11 criteria.
As is reflected within these rationales, the main outcome of our project is a summary of the study characteristics of a body of work. A scoping review is therefore the most appropriate method of evidence synthesis.
A preliminary search of MEDLINE, Embase and PsycINFO for existing scoping and systematic reviews on the topic was performed on 6 October 2023. We did not identify reviews pertaining to a similar topic.
This scoping review in its final form will be reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension tool for Scoping Reviews.14 This protocol has been developed in accordance with the JBI methodological guidelines.15 We will describe protocol modifications with their respective dates.
We will include
Cross-sectional and longitudinal studies
assessing the prevalence of MBND listed (table 1)
as per ICD-11 criteria
In view of the feasibility of our project, provision of insufficient data at full-text level of primary studies constitutes an exclusion criterion: we will exclude primary studies which fail to provide
Case numbers and sample size
Study period and period of data collection
Details of diagnostic procedures
We will conduct a search for the peer-reviewed literature from 2011 to present in the following databases: MEDLINE, Embase, PsycINFO and Web of Science. No language restriction will be applied. Sources identified in other languages which require translation for the full-text screening will be translated by state-certified translators.
Our search strategy for the peer-reviewed databases will consist of a search string of the general pattern (an illustrative sketch follows below):
String element for retrieving articles on each of the specific diagnoses as listed in table 1
String element for retrieving diagnoses according to ICD-11 criteria
Search filter identifying cross-sectional and longitudinal studies
Relevant mental, behavioural or neurodevelopmental disorders adapted from the WHO ICD-11 browser
The MBND listed in table 1 will be searched for.
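For illustration only, an Ovid-style query following this three-element pattern might look as sketched below. The terms shown are hypothetical examples drawn from the diagnoses mentioned above; they are not our actual strategy, which is provided in full in the online supplemental material.

```
1. ("prolonged grief disorder" or "body dysmorphic disorder" or "complex post-traumatic stress disorder").ti,ab.
2. ("ICD-11" or "ICD 11" or "eleventh revision").ti,ab.
3. (cross-sectional or longitudinal or cohort or prevalence).ti,ab.
4. 1 and 2 and 3
```

Line 1 corresponds to the diagnosis element, line 2 to the ICD-11 element and line 3 to the study-design filter; line 4 combines them.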
The reference list of all included studies will be searched for additional sources. Sources of grey literature will also be identified and searched.
The search strategy was developed in consultation with an information specialist. The search string will be modified for the grey literature sources. We will repeat the search before the final analysis. The exact search strategy for MEDLINE via Ovid can be found in the online supplemental material . The planned start and end dates for this study are May 2024 and May 2026, respectively.
Data management and study selection process.
After performing searches across the databases, the title and abstract of each article will be exported to EndNote. Any duplicates will be removed at this stage. The titles and abstracts of all articles will be reviewed by two reviewers (KN and SG) according to the inclusion/exclusion criteria. Disagreements at this screening stage will be resolved by consensus of a third reviewer (SF) and studies will be retrieved for full-text review, if not excluded at this stage. Similarly, the full-text review will be conducted by two reviewers and disagreements will be resolved by consulting a third reviewer.
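The de-duplication step itself is mechanical; as a minimal, purely illustrative sketch (the protocol specifies EndNote, not a script), the following Python code assumes the combined search results have been exported to a hypothetical CSV file with `doi` and `title` columns:

```python
import csv

def dedupe(records):
    """Keep the first record per DOI; fall back to a normalised title when no DOI exists."""
    seen, unique = set(), []
    for rec in records:
        key = (rec.get("doi") or "").strip().lower() or rec.get("title", "").strip().lower()
        if key and key not in seen:  # first occurrence of this record: keep it
            seen.add(key)
            unique.append(rec)
    return unique

# "search_export.csv" is a hypothetical export of the combined database searches.
with open("search_export.csv", newline="", encoding="utf-8") as f:
    records = list(csv.DictReader(f))

print(f"{len(records)} records retrieved, {len(dedupe(records))} after de-duplication")
```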
Following the review of titles and abstracts, an Excel spreadsheet will be created for the full-text review, in which the reviewers will document (a) whether the article is to be included or excluded, (b) the reason for exclusion for excluded sources and (c) key information extracted from each included paper. Data will be extracted by two reviewers, and discrepancies will be resolved by a third reviewer.
The data extraction form will be piloted on a sample of the included studies and possibly modified.
To enable the inclusion of a primary source despite missing information, we intend to contact authors for further information when necessary.
Concerning the data extraction—in alignment with the aims of this project—our current data extraction form contains the following items (a machine-readable sketch of the form follows the list):
Bibliographic information
Last name of the first author
Year of publication
Peer-review status (peer reviewed: eg, yes, no as in preprint)
Journal/source
Study location
Study period/year of data collection
Study design
Scope of the investigation/research aims
Investigating the prevalence
Investigating predictors
Investigating consequences
Investigating psychosocial correlates
Study sample
Study sample (as in sampling process)
Sample size
Age range of the study population
Sex/gender ratio
Psychiatric disorders assessed
Diagnostic tool
Measurement tools used
Method(s) of data collection
Prevalence of psychiatric disorders
Analysis performed
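Purely as an illustration, the items above could be encoded as a machine-readable template along the following lines; the field names are our paraphrases of the listed items, not a published schema:

```python
# Hypothetical extraction-form template mirroring the items listed above.
extraction_form_template = {
    "first_author": "",
    "year_of_publication": None,
    "peer_reviewed": None,            # True, or False (e.g. a preprint)
    "journal_or_source": "",
    "study_location": "",
    "study_period_or_data_collection_year": "",
    "study_design": "",               # e.g. "cross-sectional", "longitudinal"
    "research_aims": [],              # prevalence, predictors, consequences, psychosocial correlates
    "sampling_process": "",
    "sample_size": None,
    "age_range": "",
    "sex_gender_ratio": "",
    "disorders_assessed": [],
    "diagnostic_tool": "",
    "measurement_tools": [],
    "data_collection_methods": [],
    "prevalence_estimates": {},       # disorder -> estimated prevalence
    "analysis_performed": "",
}
```

One such record per included study would then populate the Excel spreadsheet described above.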
As these data points provide the basis for an appropriate description of the methodology of this body of work, we cannot distinguish between main and additional outcomes.
Due to the aim of our work (ie, to give an overview of prevalence data available and methodological approaches used to obtain these estimates), we will use the JBI prevalence critical appraisal tool (possibly with minor modifications) to assess the methodological limitations or risk of bias of the evidence of primary studies included.
For all studies meeting the inclusion criteria of the scoping review, we will use a descriptive synthesis approach. Our summary will focus on the extracted data. The results will be presented as charts, maps or tables. We will choose those visualisation and summary approaches that best fit the extracted content.
This project aims to analyse an existing body of research studies, and we include an expert by experience (peer-to-peer trainer) and a representative of relatives in our research group. The expert by experience (AJ) was involved in the development of this protocol and will be consulted during data synthesis and the discussion of our results, as will the representative of relatives.
Regarding the dissemination of our work, the scoping review will be provided to scientific journals for consideration for publication, and its results may be presented as conference posters and presentations. No ethics approval is required as the analysed data originates from publicly available material.
Patient consent for publication.
Not applicable.
Contributors SG and KN conceptualised this scoping review. KN is the author of the first draft of this protocol. SF, AH, SG, SS, KD, SK and AJ critically reviewed the manuscript and provided amendments. The search strategy was developed by KN with input from information scientists, SG and SK. All authors read and approved the final manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient and public involvement Patients and/or the public were involved in the design, conduct, reporting or dissemination plans of this research. Refer to the Methods section for further details.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
BMC Medical Research Methodology, volume 20, Article number: 259 (2020)
Data extraction forms link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and use a crucial part of the systematic reviews process. Several studies have shown that data extraction errors are frequent in systematic reviews, especially regarding outcome data.
We reviewed guidance on the development and pilot testing of data extraction forms and the data extraction process. We reviewed four types of sources: 1) methodological handbooks of systematic review organisations (SRO); 2) textbooks on conducting systematic reviews; 3) method documents from health technology assessment (HTA) agencies and 4) journal articles. HTA documents were retrieved in February 2019 and database searches conducted in December 2019. One author extracted the recommendations and a second author checked them for accuracy. Results are presented descriptively.
Our analysis includes recommendations from 25 documents: 4 SRO handbooks, 11 textbooks, 5 HTA method documents and 5 journal articles. Across these sources the most common recommendations on form development are to use customized or adapted standardised extraction forms (14/25); provide detailed instructions on their use (10/25); ensure clear and consistent coding and response options (9/25); plan in advance which data are needed (9/25); obtain additional data if required (8/25); and link multiple reports of the same study (8/25). The most frequent recommendations on piloting extractions forms are that forms should be piloted on a sample of studies (18/25); and that data extractors should be trained in the use of the forms (7/25). The most frequent recommendations on data extraction are that extraction should be conducted by at least two people (17/25); that independent parallel extraction should be used (11/25); and that procedures to resolve disagreements between data extractors should be in place (14/25).
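As a concrete, minimal sketch of what independent parallel extraction with disagreement resolution can look like in practice (our illustration, not a procedure prescribed by the reviewed guidance), two completed forms can be compared field by field and every mismatch routed to a third party for resolution:

```python
def flag_disagreements(form_a, form_b):
    """Return the fields on which two independent extractors disagree."""
    fields = set(form_a) | set(form_b)
    return {field: (form_a.get(field), form_b.get(field))
            for field in fields if form_a.get(field) != form_b.get(field)}

# Hypothetical extractions of the same study by two independent reviewers.
extractor_1 = {"sample_size": 120, "design": "RCT", "outcome_mean": 4.2}
extractor_2 = {"sample_size": 120, "design": "RCT", "outcome_mean": 4.5}

print(flag_disagreements(extractor_1, extractor_2))  # {'outcome_mean': (4.2, 4.5)}
```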
Overall, our results suggest a lack of comprehensiveness of recommendations. This may be particularly problematic for less experienced reviewers. Limitations of our method are the scoping nature of the review and that we did not analyse internal documents of health technology agencies.
Evidence-based medicine has been defined as the integration of the best-available evidence and individual clinical expertise [1]. Its practice rests on three fundamental principles: 1) that knowledge of the evidence should ideally come from systematic reviews, 2) that the trustworthiness of the evidence should be taken into account and 3) that the evidence does not speak for itself and appropriate decision making requires trade-offs and consideration of context [2]. While the first principle directly speaks to the importance of systematic reviews, the second and third have important implications for their conduct. The second principle implies that systematic reviews should be based on rigorous, bias-reducing methods. The third principle implies that decision makers require sufficient information on the primary evidence to make sense of a review's findings and apply them to their context.
Broadly speaking, a systematic review consists of five steps: 1) formulating a clear question, 2) searching for studies able to answer this question, 3) assessing and extracting data from the studies, 4) synthesizing the data and 5) interpreting the findings [3]. At a minimum, steps two to five rely on appropriate and thorough data collection methods. In order to collate data from primary studies, standardised data collection forms are used [4]. These link systematic reviews with primary research and provide the foundation for appraising, analysing, summarising and interpreting a body of evidence. This makes their development, pilot testing and application a crucial part of the systematic reviews process.
Studies on the prevalence and impact of data extraction errors have recently been summarised by Mathes and colleagues [5]. They identified four studies that looked at the frequency of data extraction errors in systematic reviews. The error rate for outcome data ranged from 8 to 63%. The impact of the errors on summary results and review conclusions varied. In one of the studies the effect size from the meta-analytic point estimates changed by more than 0.1 in 70% of cases (measured as standardised differences in means) [6]. Considering that most interventions have small to moderate effects, this can have a large impact on conclusions and decisions. Little research has been conducted on extraction errors relating to non-outcome data.
The importance of a rigorous data extraction process is not restricted to outcome data. As previously mentioned, users of systematic reviews need sufficient information on non-outcome data to make sense of the underlying primary studies and assess their applicability. Despite this, many systematic reviews do not sufficiently report this information. In one study, almost 90% of systematic reviews of interventions did not provide the information required for treatments to be replicated in practice – compared to 35% of clinical trials [ 7 ]. While there are several possible reasons for this – including the quality of reporting – insufficient data collection forms or procedures may contribute to the problem.
Against this background, we sought to review the guidance that is available to systematic reviewers for the development and pilot testing of data extraction forms and the data extraction process, these being central elements in systematic reviews.
This project was conducted as part of a dissertation, for which an exposé is available in German. We did not publish a protocol for this descriptive analysis, however. As there are no specific reporting guidelines for this type of methodological review, we reported our methods in accordance with the PRISMA statement as applicable [ 8 ].
Systematic reviews are conducted in a variety of different contexts – most notably as part of dissertations or academic research projects, as standalone projects, by health technology assessment (HTA) agencies and by systematic review organisations (SROs). Thus, we looked at a broad group of sources to identify recommendations:
Methodological handbooks from major SROs
Textbooks aimed at students and researchers endeavouring to conduct a systematic review
Method documents from HTA agencies
Published journal articles making recommendations on how to conduct a systematic review or how to develop data extraction forms
While the sources that we searched mainly focus on medicine and health, we did not exclude other health-related areas such as the social sciences or psychology.
Regarding the methodological handbooks from SROs, we considered the following to be the most relevant to our analysis:
The Centre for Reviews and Dissemination’s guidance for undertaking reviews in health care (CRD guidance)
The Cochrane Handbook of Systematic Reviews of Interventions (Cochrane Handbook)
The Institute of Medicine’s Finding What Works in Health Care: Standards for Systematic Reviews (IoM Standards)
The Joanna Briggs Institute’s Reviewer Manual (JBI Manual)
The list of textbooks was based on a recently published article that reviewed systematic review definitions used in textbooks and other sources [ 9 ]. The authors did not carry out a systematic search for textbooks, but included textbooks from a broad range of disciplines including medicine, nursing, education, health library specialties and the social sciences published between 1998 and 2017. These textbooks included information on data extraction in systematic reviews, but none of them focussed on this topic exclusively.
Regarding the HTA agencies, we compiled a list of all member organisations of the European Network for Health Technology Assessment (EUnetHTA), the International Network of Agencies for Health Technology Assessment (INAHTA), Health Technology Assessment international (HTAi) and the Health Technology Assessment Network of the Americas (Red de Evaluación de Tecnologías en Salud de las Américas – RedETSA). The reference month for the compilation of this list was January 2019; the list is included in additional file 1. We searched the agencies’ websites for potentially relevant documents and downloaded them. We then reviewed the full texts of all documents for eligibility and included those that fulfilled our inclusion criteria. The website searches and the full text screening of the documents were conducted by two authors independently (RBB and AW). Disagreements were resolved by discussion. We also planned to include the newly founded Asia-Pacific HTA network (HTAsiaLink), but its webpage had not yet been launched during our research period.
To identify relevant journal articles, we first searched the Scientific Resource Center’s Methods Library (SRCML). This is a bibliography of publications relevant to evidence synthesis methods which was maintained until the third quarter of 2017 and has been archived as a RefWorks library. Because the SRCML is no longer updated, we conducted a supplementary search of Medline from the 1st of October 2017 to the 12th of December 2019. Finally, we searched the Cochrane Methodology Register (CMR), a reference database of publications relevant to the conduct of systematic reviews that was curated by the Cochrane Methods Group. The CMR was discontinued on the 31st of May 2012 and has been archived. Due to the limited search and export functions of the archived SRCML and CMR, we used pragmatic search methods for these sources. The search terms used for the database searches are included in additional file 2. The titles and abstracts from the database searches and the full texts of potentially relevant articles were screened for eligibility by two authors independently (RBB and AW). Disagreements were resolved by discussion or, if this was unsuccessful, arbitration with DP.
To be eligible for inclusion in our review, documents had to fulfil the following criteria:
Published method document (e.g. handbook, guidance, standard operating procedure, manual), academic textbook or journal article
Include recommendations on the development or piloting of data extraction forms or the data extraction process in systematic reviews
Available in English or German
We excluded empirical research on different data extraction methods as well as papers on technical aspects, because these have been reviewed elsewhere [ 10 , 11 , 12 ]. This includes, for example, publications on the merits and downsides of different types of software (word processors, spreadsheets, database or specialised software) or the use of pencil and paper versus electronic extraction forms. We also excluded conference abstracts and other documents not published in full.
For journal articles we specified the inclusion and exclusion criteria more narrowly as this group includes a much broader variety of sources (for example we excluded “primers”, i.e. articles that provide an introduction to reading or appraising a systematic review for practitioners). The full list of inclusion and exclusion criteria for journal articles is published in additional file 2 .
We looked at a variety of items relevant to three categories of interest:
the development of data extraction forms,
the piloting of data extraction forms and
the data extraction process.
To our knowledge, no comprehensive list of potentially relevant items exists. We therefore developed a list of potentially relevant items based on iterative reading of the most influential method handbooks from SROs (see above) and our personal experience. The full list of items included in our extraction form is reported in additional file 3 together with a proposed rationale for each item.
We did not examine recommendations regarding the specific information that should be extracted from studies, because this depends on a review’s question. For example, reviewers might choose to include information on surrogate outcomes in order to aid interpretation of effects, or they might choose not to, because surrogate outcomes often correlate poorly with clinical endpoints and the researchers are interested in patient-relevant outcomes [ 13 , 14 ]. Furthermore, the specific information that is extracted for a review depends on the area of interest, with special requirements for complex intervention or adverse effects reviews, for example [ 15 ]. For the same reason, we did not examine recommendations regarding specific methodological or statistical aspects. For instance, when a generic inverse variance meta-analysis is conducted, standard errors are of interest, whereas in other cases standard deviations may be the preferred extraction.
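As a side note for readers unfamiliar with the distinction: for a single group mean, the two quantities are interconvertible whenever the sample size n is reported, via the standard relation SE = SD/√n. This relation is basic statistics rather than something stated in the sources reviewed here, but it illustrates why an extraction form that captures SD and n loses no information relative to one that captures SE directly.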
One author developed the first draft of the data extraction form to gather information on the items of interest. This was reviewed by DP and complemented and revised after discussion. We collected bibliographic data, direct quotations on recommendations from the source text and page numbers.
Each item was coded using a coding scheme of five possible attributes (a minimal sketch of the scheme follows the list):
recommendation for the use of this method
recommendation against the use of this method
optional use of this method
a general statement on this method without a recommendation
method not mentioned
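To make the scheme concrete, the following sketch shows one possible way to represent it in code. This is purely illustrative and not the authors’ actual tooling; the class and field names are our own invention.

```python
from enum import Enum
from dataclasses import dataclass

class Attribute(Enum):
    """The five-attribute coding scheme described above."""
    RECOMMENDED_FOR = "recommendation for the use of this method"
    RECOMMENDED_AGAINST = "recommendation against the use of this method"
    OPTIONAL = "optional use of this method"
    GENERAL_STATEMENT = "general statement without a recommendation"
    NOT_MENTIONED = "method not mentioned"

@dataclass
class ExtractedItem:
    source: str          # bibliographic identifier of the guidance document
    item: str            # e.g. "pilot form on a sample of studies"
    code: Attribute      # one of the five attributes
    quotation: str = ""  # direct quotation from the source text
    page: str = ""       # page number(s)
    comment: str = ""    # open field for additional observations

# Hypothetical example entry:
record = ExtractedItem(
    source="Cochrane Handbook (2019)",
    item="dual data extraction",
    code=Attribute.RECOMMENDED_FOR,
    quotation="...",
)
```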
For some items descriptive information was of additional interest. This included specific recommendations on the sample of studies that should be used to pilot the data extraction form or the experience or expertise of the reviewers that should be involved. Descriptive information was copied and pasted into the form. The form also included an open field for comments in case any additional items of interest were identified.
One author (RBB) extracted the information of interest from the included documents using the final version of the extraction form. A second author double-checked the information for each of the extracted items (AW). Discrepancies were resolved by discussion or by arbitration with DP.
During extraction, one major change was required to the form. Initially, we considered quantifying agreement only during the piloting phase of an extraction form, but later realised that some sources recommended this for the extraction phase of a review. We thus added items on quantifying agreement to this category.
We separately analysed and reported the four groups of documents (handbooks from SROs, documents from HTA agencies, textbooks and journal articles) and the three categories of interest (development, piloting and extraction). We summarised the results of our findings descriptively. We also aggregated the results across sources for each item using frequencies. Additional information is presented descriptively in the text.
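The frequency aggregation can be pictured as a simple tally per item across sources. The sketch below, with invented example judgements rather than the study's actual data, shows the idea:

```python
from collections import Counter

# One tuple per (source, item, code) judgement; the codes follow the
# five-attribute scheme above. These values are illustrative only.
judgements = [
    ("Cochrane Handbook", "dual extraction", "recommended_for"),
    ("CRD guidance",      "dual extraction", "recommended_for"),
    ("JBI Manual",        "dual extraction", "not_mentioned"),
]

def tally(judgements, item):
    """Frequency of each code for one item across all sources."""
    counts = Counter(code for _, it, code in judgements if it == item)
    return counts, sum(counts.values())

counts, n = tally(judgements, "dual extraction")
print(f"{counts['recommended_for']}/{n} sources recommend dual extraction")
```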
In our primary analysis we only included documents that made recommendations for interventional reviews or generic recommendations. We did this because almost all included documents focussed on these types of reviews and, more importantly, to avoid inclusion of multiple recommendations from one institution. This was particularly relevant for the Joanna Briggs Institute’s Reviewer Manual which at the time of our analysis had 10 separate chapters on a variety of different systematic review types. The decision to restrict the primary analysis to documents focussing on interventional reviews and generic documents was made post hoc. Results for other types of reviews (e.g. scoping reviews, umbrella reviews, economic reviews) are presented as a secondary analysis.
We identified and searched 158 webpages of HTA agencies via the member lists of EUnetHTA, INAHTA, HTAi and RedETSA (see additional file 1 ). This resulted in 155 potentially relevant method documents from 67 agencies. After full text screening, 6 documents remained that fulfilled our inclusion criteria. The database searches resulted in 2982 records. After title and abstract screening, 15 potentially relevant full texts remained. Of these 5 fulfilled our inclusion criteria. A PRISMA flow chart depicting the screening process for the database searches is provided in additional file 2 and for the HTA method documents in additional file 1 .
In total, we collected data from 14 chapters in 4 handbooks of SROs [ 16 , 17 , 18 , 19 ], 11 textbooks [ 3 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 ], 6 method documents from HTA agencies [ 30 , 31 , 32 , 33 , 34 , 35 ] and 5 journal articles [ 36 , 37 , 38 , 39 , 40 ]. Additional file 4 lists all documents that fulfilled our inclusion criteria. In our primary analysis we describe recommendations from a total of 25 sources: 4 chapters from 4 SRO handbooks, 11 textbooks, 5 method documents from HTA agencies and 5 journal articles. Our secondary analysis on recommendations for non-interventional systematic reviews is included in Additional file 5 and the detailed results for the primary analysis in Additional file 6 .
In sum, we analysed recommendations from 25 sources in our primary analysis. The most frequent recommendations on the development of extraction forms are to use customised or adapted standardised extraction forms (14/25); provide detailed instructions on their use (10/25); ensure clear and consistent coding and response options (9/25); plan in advance which data are needed (9/25); obtain additional data if required (8/25); and link multiple reports of the same study (8/25).
The most frequent recommendations on piloting extraction forms are that forms should be piloted on a sample of studies (18/25) and that data extractors should be trained in the use of the forms (7/25).
The most frequent recommendations on data extraction are that data extraction should be conducted by at least two people (17/25); that independent parallel extraction should be used (11/25); and that procedures to resolve disagreements between data extractors should be in place (14/25).
To provide a more comprehensible overview and illustrate areas where guidance is sparse, we have aggregated the results for definite recommendations (excluding optional recommendations or general statements) in Tables 1 , 2 and 3 . To avoid any misconceptions, we emphasise that by aggregating these results we by no means suggest that all items are of equal importance. Some are in fact mutually exclusive or interconnected.
The following sections provide details for each group of documents, sorted by the three categories of interest.
Category: development of extraction forms.
Three handbooks recommend that reviewers should plan in advance which data to extract [ 16 , 17 , 18 ]. Furthermore, three recommend that reviewers develop a customised data extraction form or adapt an existing form to meet the specific review needs [ 17 , 18 , 19 ]. In contrast, the JBI recommends use of its own standardised data extraction form but allows reviewers to use others if this is justified and the forms are described [ 16 ]. All four handbooks recommend that reviewers link multiple reports of the same study to avoid multiple inclusions of the same data [ 16 , 17 , 18 , 19 ]. Three handbooks make statements on strategies for obtaining unpublished data [ 16 , 17 , 18 ]. The Cochrane Handbook recommends contacting authors to obtain additional data, while the CRD guidance makes a general statement in light of the chances of success and the resources available. The JBI Manual makes this optional but requires systematic reviewers to report in the review protocol whether authors of included studies are contacted.
Two handbooks recommend that the data collection form includes consistent and clear coding instructions and response options and that data extractors are provided with detailed instructions on how to complete the form [ 17 , 18 ]. The Cochrane Handbook also recommends that the entire review team should be involved in the development of the data extraction form, including content area experts, review methodologists, statisticians and data extractors. It further recommends that reviewers check the compatibility of electronic forms or data systems with analytical software and ensure that methods are in place to record, assess and correct data entry errors.
Three handbooks recommend that authors pilot test their data extraction form [ 17 , 18 , 19 ]. The Cochrane Handbook recommends that “several people” are involved and “at least a few articles” used. The CRD guidance states that “a sample of included studies” should be used for piloting. The Cochrane Handbook also recommends that data extractors are trained; that piloting may need to be repeated if major changes to the extraction form are made during the review process; and that reports that have already been extracted should be re-checked in this case. None of the handbooks makes an explicit recommendation on who should be involved in piloting the data extraction form or on their expertise. Furthermore, none of the handbooks makes a recommendation on quantifying agreement during the piloting process or on using a quantified reliability threshold that should be reached before beginning the extraction process.
All handbooks recommend that data should be extracted by at least two reviewers (dual data extraction) [ 16 , 17 , 18 , 19 ]. Three handbooks recommend that data are extracted by two reviewers independently (parallel extraction) [ 16 , 18 , 19 ], one also considers it acceptable that one reviewer extracts the data and a second reviewer checks it for accuracy and completeness (double-checking) [ 17 ]. Furthermore, two of the handbooks make an optional recommendation that independent parallel extraction could be done only for critical data such as risk of bias and outcome data, while non-critical data is extracted by a single reviewer and double-checked by a second reviewer [ 18 , 19 ]. The Cochrane Handbook also recommends that data extractors have a basic understanding of the review topic and knowledge of study design, data analysis and statistics [ 18 ].
All handbooks recommend that reviewers should have procedures in place to resolve disagreements arising from dual data extraction [ 16 , 17 , 18 , 19 ]. In all cases, discussion between extractors or arbitration by a third person is suggested. The Cochrane Handbook recommends hierarchical use of these strategies, while the other sources do not specify this [ 18 ]. Of note, the IoM Standards highlight the need for a fair procedure that ensures both reviewers’ judgements are considered in case of a power or experience asymmetry [ 19 ]. The Cochrane Handbook also recommends that disagreements that remain unresolved after discussion, arbitration or contact with study authors should be reported in the systematic review [ 18 ].
Two handbooks recommend that reviewers informally consider the reliability of coding throughout the review process [ 17 , 18 ]. These handbooks also mention the possibility of quantifying agreement of the extracted data. The Cochrane Handbook considers this optional and, where it is done, recommends limiting it to critical data such as risk of bias assessments or key outcome data [ 18 ]. The CRD guidance mentions this possibility without making a recommendation [ 17 ]. Two handbooks recommend that reviewers document disagreements and how they were resolved [ 17 , 18 ] and two recommend reporting who was involved in data extraction [ 18 , 19 ]. The IoM Standards specify that the number of individual data extractors and their qualifications should be reported in the methods section of the review [ 19 ].
Regarding the development of data extraction forms, the most frequent recommendation in the analysed textbooks is that reviewers should develop a customized extraction form or adapt an existing one to suit the needs of their review (6/11) [ 20 , 21 , 23 , 24 , 26 , 29 ]. Two textbooks consider the choice between customized and generic or pre-existing extraction forms optional [ 3 , 25 ].
Many of the textbooks also make statements on unpublished data (7/11). Most of them recommend that reviewers develop a strategy for obtaining unpublished data (4/11) [ 24 , 25 , 26 , 29 ]. One textbook makes an optional recommendation on obtaining unpublished data and mentions the alternative of conducting sensitivity analysis to account for missing data [ 3 ]. Two textbooks make general statements regarding missing data without a compulsory or optional recommendation [ 22 , 23 ].
Four textbooks recommend that reviewers ensure consistent and easy coding rules and response options in their data collection form [ 3 , 22 , 25 , 29 ]; three to provide detailed instruction on how to complete the data collection form [ 22 , 24 , 25 ]; and three to link multiple reports of the same study [ 3 , 24 , 26 ]. One textbook discusses the impact of including multiple study reports but makes no specific recommendation [ 23 ].
Two textbooks recommend reviewers to plan in advance which data they will need to extract for their review [ 24 , 28 ]. One textbook makes an optional recommendation, depending on the number of included studies [ 22 ]. For reviews with a small number of studies it considers an iterative process appropriate; for large data sets it recommends a thoroughly developed and overinclusive extraction form to avoid the need to go back to study reports later in the review process.
One textbook recommends that clinical experts or methodologists are consulted in developing the extraction form to ensure important study aspects are included [ 26 ]. None includes statements on the recording and handling of extraction errors.
For this category, the most frequently made recommendation in the analysed textbooks is that reviewers should pilot test their data extraction form (8/11) [ 3 , 20 , 22 , 23 , 24 , 25 , 26 , 29 ]. One textbook makes a general statement on piloting, but no specific recommendation [ 27 ].
Three textbooks recommend that data extractors are trained [ 22 , 24 , 25 ]. One textbook states that extraction should not begin before satisfactory agreement is achieved but does not define how this should be assessed [ 22 ]. No recommendations were identified for any of the other items regarding piloting of extraction forms in the analysed textbooks.
Six textbooks recommend data extraction by at least two reviewers [ 22 , 23 , 24 , 25 , 26 , 29 ]. Four of these recommend parallel extraction [ 23 , 24 , 25 , 26 ], while two do not specify the exact procedure [ 22 , 29 ]. One textbook explains the different types of dual extraction modes but makes no recommendation on their use [ 27 ].
One textbook recommends that reviewer agreement for extracted data is quantified using a reliability measure [ 25 ], while two mention this possibility without making a clear recommendation [ 22 , 26 ]. Two of these mention Cohen’s kappa as possible measures for quantifying agreement [ 22 , 26 ], one also mentions raw agreement [ 22 ].
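For readers who want to compute these measures, the sketch below implements raw agreement and Cohen’s kappa for two extractors coding the same items. It is a generic illustration with invented data, not code from any of the textbooks discussed here:

```python
from collections import Counter

def raw_agreement(a, b):
    """Proportion of items on which two extractors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = raw_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Two extractors coding ten items (illustrative data only):
ext1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
ext2 = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(raw_agreement(ext1, ext2))  # 0.8
print(cohens_kappa(ext1, ext2))   # ~0.58
```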
Five textbooks recommend that reviewers develop explicit procedures for resolving disagreements, either by discussion or consultation of a third person [ 22 , 24 , 25 , 26 , 29 ]. Two textbooks suggest a hierarchical approach using discussion and, if this is unsuccessful, arbitration with a third person [ 25 , 29 ]. One textbook also suggests the possibility of including the entire review team in discussions [ 24 ]. One textbook emphasizes that educated discussions should be preferred over voting procedures [ 26 ]. One textbook also recommends that reviewers document disagreements and how they were resolved [ 26 ].
One textbook makes recommendations on the expertise of the data extractors [ 24 ]. It suggests that data extraction is conducted by statisticians, data managers and methods experts with the possible involvement of content experts, when required.
In two documents from HTA agencies it is recommended that a customised extraction form is developed [ 31 , 35 ]. One of these roughly outlines the contents of extraction forms that can be used as a starting point [ 31 ]. Three documents recommend that detailed instructions on using the extraction form should be provided [ 30 , 31 , 34 ]. Two documents recommend that reviewers develop a strategy for obtaining unpublished data [ 30 , 31 ].
The following recommendations are only included in one method document each: planning in advance which data will be required for the synthesis [ 30 ]; ensuring consistent coding and response options in the data collection form [ 31 ] and linking multiple reports of the same study to avoid including data from the same study more than once [ 31 ].
For this category the only recommendation we found in HTA documents is that data collection forms should be piloted before use (3/5) [ 30 , 31 , 33 ]. None of the documents specifies how this may be done, for example regarding the number or types of studies involved. One of the documents makes a vague suggestion that all reviewers ought to be involved in pilot testing.
In most documents it is recommended that data extraction should be conducted by two reviewers (4/5) [ 30 , 31 , 34 , 35 ]. Two make an optional recommendation for either parallel extraction or a double-checking procedure [ 30 , 31 ], one recommends parallel extraction [ 34 ] and one reports use of double-checking [ 35 ]. Three method documents recommend that reviewers resolve disagreements by discussion [ 30 , 31 , 35 ]. One method document recommends that reviewers report who was involved in data extraction [ 34 ].
We identified 5 journal articles that fulfilled our inclusion criteria. These included a journal article specifying the methods used by the Cochrane Back and Neck Group [ 36 ], an article describing the data extraction and synthesis methods used in JBI systematic reviews [ 38 ], a paper on guidelines for systematic review in the environmental research field [ 39 ] and two in-depth papers on data extraction and coding methods within systematic reviews [ 37 , 40 ]. One of these used the Systematic Review Data Repository (SRDR) as an example, but the recommendations made were not exclusive to this system [ 37 ].
Three journal articles recommend that authors plan in advance which data they require for the review [ 37 , 39 , 40 ]. A recommendation to develop a customised extraction form (or adapt an existing one) for the specific purpose of the review was also made in three journal articles [ 36 , 37 , 40 ]. Two articles recommend that consistent and clear coding and response options should be ensured and detailed instructions provided to data extractors [ 37 , 40 ]. Furthermore, two articles recommend that mechanisms should be in place for recording, assessing and correcting data entry errors [ 36 , 37 ]. Both refer to plausibility or logic checks of the data and/or statistics.
One article recommends that reviewers try to obtain further data from the included studies, where required [ 39 ], while one makes an optional recommendation [ 36 ] and another a general statement without a specific recommendation [ 37 ]. One of the articles also makes recommendations on the expertise of the reviewers that should be involved in the development of the extraction form. It recommends that all members of the team are involved including data extractors, content area experts, statisticians and reviewers with formal training in form design such as epidemiologists [ 37 ].
Four articles recommend that reviewers should pilot test their extraction form [ 36 , 37 , 38 , 40 ]. Three articles recommend training of data extractors [ 37 , 38 , 40 ]. One recommends that reviewers informally assess the reliability of coding during the piloting process [ 37 ]. One article mentions the possibility of quantifying agreement during the piloting process, without making a specific recommendation or specifying any thresholds [ 40 ].
Three articles recommend that data are extracted by two reviewers, in each case using independent parallel extraction [ 36 , 37 , 38 ]. Citing the IoM Standards, one article also mentions the possibility of using independent parallel extraction for critical data and a double-checking procedure for non-critical data [ 37 ]. One article recommends that the principal reviewer runs regular logic checks to validate the extracted data [ 37 ]. One article also mentions that the reliability of extraction may need to be reviewed throughout the extraction process in case of extended coding periods [ 40 ].
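The cited article does not specify what such logic checks look like in practice; a plausible minimal version, with entirely hypothetical field names, might flag internally inconsistent records like this:

```python
def logic_checks(row):
    """Simple plausibility checks of the kind a principal reviewer might run
    on one extracted study record. All field names are hypothetical."""
    problems = []
    if row["sd"] is not None and row["sd"] <= 0:
        problems.append("standard deviation must be positive")
    if row["n_analysed"] > row["n_randomised"]:
        problems.append("more participants analysed than randomised")
    if not 0 <= row["dropout_rate"] <= 1:
        problems.append("dropout rate outside [0, 1]")
    return problems

study = {"sd": 4.2, "n_randomised": 120, "n_analysed": 118, "dropout_rate": 0.02}
print(logic_checks(study))  # [] -> nothing flagged
```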
Two articles mention the need to have a procedure in place for resolving disagreements, either a hierarchical procedure using discussion and arbitration with a third person [ 36 ] or discussion and review of the source document [ 37 ]. One article recommends that disagreements and consensus results are documented for future reference [ 37 ]. Finally, one article mentions the advantages of having data extractors with complementary expertise, such as a content expert and a methods expert, but does not make a clear recommendation on this [ 37 ].
We reviewed current recommendations on data extraction methods in systematic reviews across a diverse range of sources. Our results suggest that current recommendations are fragmented. Very few documents made comprehensive recommendations. This may be detrimental to the quality of systematic reviews and makes it difficult for aspiring reviewers to prepare high quality data extraction forms and ensure reliable and valid extraction procedures. While our review cannot show that improved recommendations will truly have an impact on the quality of systematic reviews, it seems reasonable to assume that clear and comprehensive recommendations are a prerequisite for high quality data extraction, especially for less experienced reviewers.
There were some notable exceptions to our findings. Among the most comprehensive documents were the Cochrane Handbook for Systematic Reviews, the textbook by Foster and colleagues and the journal article by Li and colleagues [ 18 , 24 , 37 ]. We believe that these are among the most helpful resources for systematic reviewers from the pool of documents that we analysed – not only because they provide in-depth information, but also for being among the most current sources.
We were particularly surprised by the lack of information provided by HTA agencies. Only a few HTA agencies had documents with relevant recommendations at all. Since many HTA agencies publish detailed documents on other methodological aspects such as search and screening methods, risk of bias assessments or evidence grading methods, it would seem reasonable for them to provide more information on data extraction methods as well.
We believe there would be many practical benefits to developing clearer recommendations for the development and testing of extraction forms and the data extraction process. One reason is that data extraction is one of the most resource intensive parts of a systematic review – especially when the review includes a large number of studies and/or outcomes. Having a good extraction form can also save time at later stages of the review. For example, a poorly developed extraction form may lead to extensive revisions during the review process and may require reviewers to go back to the original sources or repeat extraction on some included studies. Furthermore, some methodological standards such as independent parallel extraction could be modified to save resources. This is not reflected in most of the sources included in our review. In addition, it would be helpful to specify recommendations further to accommodate systematic reviews of different sizes, both in terms of the number of included studies and the review team. While the general quality standards should remain the same, a mega-review with several tens or even hundreds of studies, a large, heterogeneous or international review team and several data extractors may differ in some requirements from a small review with few studies and a small, local team [ 12 , 37 ]. For example, training and piloting may need more time to achieve agreement. We therefore encourage developers of guideline documents for systematic reviews to provide more comprehensive recommendations on developing and piloting data extraction forms and the data extraction process. Our review can be used as a starting point. Formal development of structured guidance or a set of minimum standards on data extraction methods in systematic reviews may also be useful. Moher and colleagues have developed a framework to support the development of guidance to improve reporting, which includes literature reviews and a Delphi study and provides a helpful starting point [ 41 ]. Lastly, authors of reporting guidelines for systematic reviews of various types can use our results to consider elements worth including.
To some extent the results reflect the empirical evidence from comparative methods research. For example, among the most frequent recommendations was that data extraction should be conducted by two reviewers to reduce the risk of errors, which is supported by some evidence [ 11 ]. This is also true for the recommendation that additional data should be retrieved if necessary, which reflects the problem of selective outcome reporting [ 42 ]. At the same time, we found few recommendations on reviewer expertise, for which empirical studies have produced inconsistent results [ 11 ]. Arguably, some items in our analysis have theoretical rather than empirical foundations. For instance, we would consider the inclusion of content experts in the development of extraction forms important to enhance clinical relevance and applicability. Even this is a somewhat contested issue, however. Gøtzsche and Ioannidis, for instance, have questioned the value of involving content experts in systematic reviews [ 43 ]. In their analysis, they highlight the lack of evidence on the effects of involving them and, in addition to the possible benefits, raise potential downsides of expert involvement – notably that experts often have conflicts of interest and strong prior opinions that may introduce bias. While we do not argue against involvement of content experts, since conflicts of interest can be managed, the controversy shows that this may in fact be an issue worth exploring empirically [ 44 ]. Thus, in addition to providing more in-depth recommendations for systematic reviewers, empirical evaluations of extraction methods should be encouraged. Such method studies should be based on a systematic review of the current evidence and overcome some of the limitations of previous investigations, including the use of convenience samples and small sets of reviewers [ 11 ].
As a final note, some parts of systematic reviews can now be assisted by automation methods. Examples include enhanced study selection using learning algorithms (e.g. implemented in Rayyan) and assisted risk of bias assessments using RobotReviewer [ 45 , 46 ]. However, not all of these software solutions are free, and some are still in early development or have not yet been validated. Furthermore, some of them are restricted to specific review types [ 47 ]. To the best of our knowledge, comprehensive tools to assist with data extraction, including for example extraction of outcome data, are not yet available [ 48 ]. For example, a recent systematic review conducted with currently available automation tools used traditional spreadsheet-based data extraction forms and piloting methods [ 49 ]. The authors identified two issues regarding data extraction that could be assisted by automation methods: contacting authors of included studies for additional information using metadata, and better integration of software tools to automatically exchange data between different software. Thus, much work is still to be done in this area. Furthermore, when automation tools for data extraction become available, they will need to be readily accessible, usability tested, accepted by systematic reviewers and validated before widespread use (validation is especially important for technically complex or critical tasks) [ 50 ]. It is also likely that they will complement current data extraction methods rather than replace them, as is currently the case for automated risk of bias assessments of randomised trials [ 46 ]. For these reasons we believe that traditional data extraction methods will still be required and used in the future.
There are some limitations to our methods. Firstly, our review is not exhaustive. The list of handbooks from SROs was compiled based on previous research and discussions between the authors, but no formal search was conducted to identify other potentially relevant organisations [ 51 , 52 ]. The list of textbooks was also based on a previous study not intended to cover the literature in full. It does, however, include textbooks from a range of disciplines including medicine, nursing, education and the social sciences, which arguably increases the generalisability of the findings. The search strategy for our database search was pragmatic for reasons stated in the methods and may have missed some relevant articles. Furthermore, the databases searched focus on the field of medicine and health, so other areas may be underrepresented.
Secondly, searching the websites of HTA agencies proved difficult in some instances, as some websites have quite intricate site structures. Furthermore, we did not contact the HTA agencies to retrieve unpublished documents. It is likely that at least some HTA agencies have internal documents that provide more specific recommendations. However, our focus was the usefulness of the HTA method documents as guidance for systematic reviewers outside of HTA institutions. For this purpose, we believe it is appropriate to assume that most reviewers will depend on the information directly accessible to them.
Thirdly, it was difficult to classify some of the recommendations using our coding scheme. For example, recommendations in the new Cochrane Handbook are based on Cochrane’s Methodological Expectations for Cochrane Intervention Reviews Standards (MECIR) which make a subtle differentiation between mandatory and highly desirable recommendations. In this case we considered both these types of recommendations as positive in our classification scheme. To use a more difficult example, one HTA method document did not make a statement on the number of reviewers involved in data extraction but stated that a third investigator may check a random sample of extracted data for additional quality assurance. This would imply that data extraction is conducted by two reviewers independently, but since this method was not stated, it was classified as “method not mentioned”. While some judgements were required, we have described notable cases in the results section and do not believe that different decisions in these cases would affect our overall results or conclusions.
Lastly, we note that some of the included sources referenced more comprehensive guidance such as the Cochrane Handbook. We have not formally extracted information on cross-referencing between documents, however.
Many current methodological guidance documents for systematic reviewers lack comprehensiveness and clarity regarding the development and piloting of data extraction forms and the data extraction process. In the future, developers of learning resources should consider providing more information and guidance on this important part of the systematic review process. Our review and list of items may be a helpful starting point. HTA agencies may consider describing in more detail their published methods on data extraction procedures to increase transparency.
The datasets used and analysed for the current study are available from the corresponding author on reasonable request.
CMR: Cochrane Methodology Register
CRD: Centre for Reviews and Dissemination
EUnetHTA: European Network for Health Technology Assessment
HTA: Health Technology Assessment
HTAi: Health Technology Assessment international
HTAsiaLink: The collaborative research network of Health Technology Assessment agencies in the Asia-Pacific region
INAHTA: International Network of Agencies for Health Technology Assessment
IoM: Institute of Medicine
JBI: Joanna Briggs Institute
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RedETSA: Red de Evaluación de Tecnologías en Salud de las Américas (Health Technology Assessment Network of the Americas)
SRCML: Scientific Resource Center’s Methods Library
SRO: Systematic Review Organisations
Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312:71–2.
Guyatt G, Rennie D, Meade MO, Cook DJ, editors. Users’ guides to the medical literature: a manual for evidence-based clinical practice. 3rd ed. New York: McGraw-Hill Education Ltd; 2015.
Khan KS, Kunz R, Kleijnen J, Antes G. Five steps to conducting a systematic review. J R Soc Med. 2003;96:118–21.
Montori VM, Swiontkowski MF, Cook DJ. Methodologic issues in systematic reviews and meta-analyses. Clin Orthop Relat Res. 2003;413:43–54.
Mathes T, Klaßen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: a methodological review. BMC Med Res Methodol. 2017;17:152.
Gøtzsche PC, Hróbjartsson A, Maric K, Tendal B. Data extraction errors in meta-analyses that use standardized mean differences. JAMA. 2007;298:430–7.
Glasziou P, Meats E, Heneghan C, Shepperd S. What is missing from descriptions of treatment in trials and reviews? BMJ. 2008;336:1472–4.
Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
Krnic Martinic M, Pieper D, Glatt A, Puljak L. Definition of a systematic review used in overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med Res Methodol. 2019;19:203.
Van der Mierden S, Tsaioun K, Bleich A, Leenaars CHC. Software tools for literature screening in systematic reviews in biomedical research. ALTEX. 2019;36:508–17.
Robson RC, Pham B, Hwee J, Thomas SM, Rios P, Page MJ, et al. Few studies exist examining methods for selecting studies, abstracting data, and appraising quality in a systematic review. J Clin Epidemiol. 2019;106:121–35.
Elamin MB, Flynn DN, Bassler D, Briel M, Alonso-Coello P, Karanicolas PJ, et al. Choice of data extraction tools for systematic reviews depends on resources and review complexity. J Clin Epidemiol. 2009;62:506–10.
Ciani O, Buyse M, Garside R, Pavey T, Stein K, Sterne JAC, et al. Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study. BMJ. 2013;346:f457.
Haslam A, Hey SP, Gill J, Prasad V. A systematic review of trial-level meta-analyses measuring the strength of association between surrogate end-points and overall survival in oncology. Eur J Cancer. 2019;106:196–211.
Pfadenhauer LM, Gerhardus A, Mozygemba K, Lysdahl KB, Booth A, Hofmann B, et al. Making sense of complexity in context and implementation: the Context and Implementation of Complex Interventions (CICI) framework. Implement Sci. 2017;12:21.
Aromataris E, Munn Z, editors. Joanna Briggs Institute reviewer's manual: The Joanna Briggs Institute; 2017. https://reviewersmanual.joannabriggs.org/ . Accessed 04 June 2020.
Centre for Reviews and Dissemination. CRD’s guidance for undertaking reviews in health care. York: York Publishing Services Ltd; 2009.
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al. Cochrane Handbook for Systematic Reviews of Interventions version 6.0: Cochrane; 2019. www.training.cochrane.org/handbook . Accessed 04 June 2020.
Institute of Medicine. Finding what works in health care: standards for systematic reviews. Washington, DC: The National Academies Press; 2011.
Bettany-Saltikov J. How to do a systematic literature review in nursing: a step-by-step guide. Berkshire: McGraw-Hill Education; 2012.
Booth A, Papaioannou D, Sutton A. Systematic approaches to a successful literature review. London: Sage Publications Ltd; 2012.
Cooper HM. Synthesizing research: a guide for literature reviews. Thousand Oaks: Sage Publications Inc; 1998.
Egger M, Smith GD, Altman DG. Systematic reviews in health care: meta-analysis in context. 2nd ed. London: BMJ Publishing Group; 2001.
Foster MJ, Jewell ST. Assembling the pieces of a systematic review: a guide for librarians. Lanham: Rowman & Littlefield; 2017.
Holly C, Salmond S, Saimbert M. Comprehensive systematic review for advanced nursing practice. New York: Springer Publishing Company; 2012.
Mulrow C, Cook D. Systematic reviews: synthesis of best evidence for health care decisions. Philadelphia: ACP Press; 1998.
Petticrew M, Roberts H. Systematic Reviews in the Social Sciences: A Practical Guide. Malden: Blackwell Publishing; 2008.
Pope C, Mays N, Popay J. Synthesizing Qualitative and Quantitative Health Evidence. Maidenhead: McGraw Hill; 2007.
Sharma R, Gordon M, Dharamsi S, Gibbs T. Systematic reviews in medical education: A practical approach: AMEE Guide 94. Dundee: Association for Medical Education in Europe; 2015.
Fröschl B, Bornschein B, Brunner-Ziegler S, Conrads-Frank A, Eisenmann A, Gartlehner G, et al. Methodenhandbuch für health technology assessment: Gesundheit Österreich GmbH; 2012. https://jasmin.goeg.at/121/ . Accessed 19 Feb 2019.
Gartlehner G. (Internes) Manual Abläufe und Methoden: Ludwig Boltzmann Institut für Health Technology Assessment (LBI-HTA); 2007. http://eprints.aihta.at/713/ . Accessed 19 Feb 2019.
Health Information and Quality Authority (HIQA). Guidelines for the retrieval and interpretation of economic evaluations of health technologies in Ireland: HIQA; 2014. https://www.hiqa.ie/reports-and-publications/health-technology-assessments/guidelines-interpretation-economic . Accessed 19 Feb 2019.
Institute for Clinical and Economic Review (ICER). A guide to ICER’s methods for health technology assessment: ICER; 2018. https://icer-review.org/methodology/icers-methods/icer-hta-guide_082018/ . Accessed 19 Feb 2019.
International Network of Agencies for Health Technology Assessment (INAHTA). A checklist for health technology assessment reports: INAHTA; 2007. http://www.inahta.org/hta-tools-resources/briefs/ . Accessed 19 Feb 2019.
Malaysian Health Technology Assessment Section (MaHTAS). Manual on health technology assessment. 2015. https://www.moh.gov.my/moh/resources/HTA_MANUAL_MAHTAS.pdf?mid=636 .
Furlan AD, Malmivaara A, Chou R, Maher CG, Deyo RA, Schoene M, et al. 2015 Updated Method Guideline for Systematic Reviews in the Cochrane Back and Neck Group. Spine. 2015;40:1660–73.
Li T, Vedula SS, Hadar N, Parkin C, Lau J, Dickersin K. Innovations in data collection, management, and archiving for systematic reviews. Ann Intern Med. 2015;162:287–94.
Munn Z, Tufanaru C, Aromataris E. JBI’s systematic reviews: data extraction and synthesis. Am J Nurs. 2014;114:49–54.
Pullin AS, Stewart GB. Guidelines for systematic review in conservation and environmental management. Conserv Biol. 2006;20:1647–56.
Stock WA, Goméz Benito J, Balluerka LN. Research synthesis. Coding and conjectures. Eval Health Prof. 1996;19:104–17.
Moher D, Schulz KF, Simera I, Altman DG. Guidance for developers of health research reporting guidelines. PLoS Med. 2010;7:e1000217.
Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, et al. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ. 2010;340:c365.
Gøtzsche PC, Ioannidis JPA. Content area experts as authors: helpful or harmful for systematic reviews and meta-analyses? BMJ. 2012;345:e7031.
Agoritsas T, Neumann I, Mendoza C, Guyatt GH. Guideline conflict of interest management and methodology heavily impacts on the strength of recommendations: comparison between two iterations of the American College of Chest Physicians Antithrombotic Guidelines. J Clin Epidemiol. 2017;81:141–3.
Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan—a web and mobile app for systematic reviews. Syst Rev. 2016;5:210.
Marshall IJ, Kuiper J, Wallace BC. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J Am Med Informatics Assoc. 2016;23:193–201.
Beller E, Clark J, Tsafnat G, Adams C, Diehl H, Lund H, et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018;7:77.
O’Connor AM, Glasziou P, Taylor M, Thomas J, Spijker R, Wolfe MS. A focus on cross-purpose tools, automated recognition of study design in multiple disciplines, and evaluation of automation tools: a summary of significant discussions at the fourth meeting of the International Collaboration for Automation of Systematic Reviews. Syst Rev. 2020;9:100.
Clark J, Glasziou P, Del Mar C, Bannach-Brown A, Stehlik P, Scott AM. A full systematic review was completed in 2 weeks using automation tools: a case study. J Clin Epidemiol. 2020;121:81–90.
O’Connor AM, Tsafnat G, Thomas J, Glasziou P, Gilbert SB, Hutton B. A question of trust: can we build an evidence base to gain trust in systematic review automation technologies? Syst Rev. 2019;8:143.
Cooper C, Booth A, Britten N, Garside R. A comparison of results of empirical studies of supplementary search techniques and recommendations in review methodology handbooks: a methodological review. Syst Rev. 2017;6:234.
Cooper C, Booth A, Varley-Campbell J, Britten N, Garside R. Defining the process to literature searching in systematic reviews: a literature review of guidance and supporting studies. BMC Med Res Methodol. 2018;18:85.
We thank information specialist Simone Hass for peer reviewing the search strategy and conducting searches.
No funding was received. Open Access funding enabled and organized by Projekt DEAL.
Authors and affiliations.
Institute for Research in Operative Medicine (IFOM), Faculty of Health - School of Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, 51109, Cologne, Germany
Roland Brian Büchter, Alina Weise & Dawid Pieper
Study design: RBB, DP. Data extraction: RBB, AW. Data analysis and interpretation: RBB, DP, AW. Writing the first draft of the manuscript: RBB. Revisions of the manuscript for important intellectual content: RBB, DP, AW. Final approval of the manuscript: RBB, DP, AW. Agree to be accountable for all aspects of the work: RBB, DP, AW. Guarantor: RBB.
Correspondence to Roland Brian Büchter .
Ethics approval and consent to participate.
Not applicable.
Competing interests.
The authors declare that they have no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1. List of HTA websites searched.
Additional file 2. Information on database searches.
Additional file 3. List of items and rationale.
Additional file 4. List of included documents.
Additional file 5. Recommendations for non-interventional reviews.
Additional file 6. Primary analysis.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Büchter, R.B., Weise, A. & Pieper, D. Development, testing and use of data extraction forms in systematic reviews: a review of methodological guidance. BMC Med Res Methodol 20 , 259 (2020). https://doi.org/10.1186/s12874-020-01143-3
Received : 11 June 2020
Accepted : 07 October 2020
Published : 19 October 2020
BMC Oral Health volume 24 , Article number: 702 ( 2024 )
Knowledge about patient safety in orthodontics is scarce. Lack of standardisation and a common terminology hinders research and limits our understanding of the discipline. This study aims to 1) summarise current knowledge about patient safety incidents (PSI) in orthodontic care by conducting a systematic literature search, 2) propose a new standardisation of PSI terminology and 3) propose a future research agenda on patient safety in the field of orthodontics.
A systematic literature search was performed in the main online sources of PubMed, Web of Science, Scopus and OpenGrey from their inception to 1 July 2023. Inclusion criteria were based on the World Health Organization´s (WHO) research cycle on patient safety. Studies providing information about the cycle’s steps related to orthodontics were included. Study selection and data extraction were performed by two of the authors.
A total of 3,923 articles were retrieved. After review of titles and abstracts, 41 articles were selected for full-text review and 25 articles were eligible for inclusion. Seven provided information on the WHO’s research cycle step 1 (“measuring harm”), twenty-one on “understanding causes” (step 2) and twelve on “identifying solutions” (step 3). No study provided information on Steps 4 and 5 (“evaluating impact” or “translating evidence into safer care”).
Current evidence on patient safety in orthodontics is scarce due to a lack of standardised reporting and probably also under-reporting of PSIs. Current literature on orthodontic patient safety deals primarily with “measuring harms” and “understanding causes of patient safety”, whereas less attention has been devoted to initiatives “identifying solutions”, “evaluating impact” and “translating evidence into safer care”. The present project holds a proposal for a new categorisation, terminology and future research agenda that may serve as a framework to support future research and clinical initiatives to improve patient safety in orthodontic care.
PROSPERO (CRD42022371982).
For decades, patient safety has been recognised as a healthcare discipline. However, the awareness-raising publication of “To Err Is Human” by the Institute of Medicine Committee on Quality of Health Care in the US drew considerable attention to this important aspect of healthcare [ 1 , 2 ]. In this publication, experts estimated that in the US in any given year as many as 98,000 people die from medical errors that occur in hospitals [ 1 ]. The definition of patient safety by the World Health Organization (WHO) from 2009 is: “the freedom for a patient from unnecessary harm or potential harm related to healthcare” [ 2 ]. Similarly, in their report, Kohn et al. recognised safety as “freedom from accidental injury” [ 1 ]. In this context, a patient safety incident (PSI) is an event or circumstance that could have resulted or did result in unnecessary harm to a patient [ 2 ].
Patient safety is a crucial aspect of healthcare that seeks to minimise preventable harm, accidents, complications and adverse events (AEs). AEs are defined as injuries resulting from poor management practices that could have been prevented but are not attributed to an underlying disease process [ 2 , 3 ]. The WHO classifies certain AEs as "never events", which are serious incidents that should not occur given the presence of strong systemic safety measures [ 4 ]. Never events can have a profound impact on patients, and their prevention is a key objective of healthcare organisations. In this context, patient safety aims to limit the impact of AEs and promote the avoidance of preventable harm.
Patient safety is a priority from the patient’s perspective, and for care providers it aligns with the Hippocratic Oath ("primum non nocere"), an important element of modern healthcare. Patient safety initiatives analyse characteristics and features of healthcare systems that may lead to the occurrence of AEs. These features are latent risks that may be of any nature, from a soft tissue laceration or a loose wire to inhalation of an orthodontic appliance [ 5 ]. Throughout most healthcare treatment courses, multiple latent risks exist, which makes patient safety multifactorial and complex. When an AE occurs, patient safety does not aim to punish but rather to investigate how and why the protective barriers failed [ 6 , 7 ].
Improving the quality of care goes hand in hand with improving patient safety, which also carries psychosocial and financial benefits. Dealing with the consequences of an adverse event imposes an economic cost on the practitioner, the patient and society. By improving patient safety, dental practitioners increase their quality of care, which is associated with safer and better treatment outcomes [ 8 , 9 , 10 ]. In addition, it affords increased legal security by minimising the risk of legal claims [ 6 ].
Knowledge about patient safety in dental care, and in orthodontics in particular, is scarce. The absence of patient safety guidelines in orthodontics is a major concern. This issue is further complicated by the absence of standardised terminology in the field, which challenges the development of consistent safety protocols. Additionally, there is a noticeable lack of research and publications in this area, which hinders progress in developing effective, evidence-based strategies to ensure patient safety in orthodontic care [ 11 ]. Therefore, an urgent need exists for studies in the field of orthodontics in particular [ 2 , 3 , 12 ]. Among other factors, the lack of a common language among orthodontic caregivers ultimately hinders research and limits our understanding of the discipline [ 13 , 14 ]. The aims of this study were to 1) summarise current knowledge about PSIs in orthodontic care by performing a systematic literature search; 2) propose a new standardisation of PSI terminology; and 3) propose a research agenda on patient safety in the field of orthodontics that may serve to further develop and provide direction for future research on the subject.
Protocol and registration.
Prior to the initiation of the project, the study protocol was registered with PROSPERO (reg. no. CRD42022371982). No ethical approval was deemed necessary.
A systematic literature search was performed in the main online sources of MEDLINE (through PubMed), Web of Science and Scopus, as well as the System for Information on Grey Literature in Europe (OpenGrey), from their inception to 1 July 2023. No language limitation was set in the search, and all types of eligible human studies were included.
The inclusion criteria for articles were based on the WHO research cycle on patient safety [ 15 , 16 ]. The steps of the cycle aim to measure harm, understand its causes and identify solutions to improve patient safety, with the ultimate goal of translating evidence into safer care (Fig. 1 ). Only studies that provided relevant information in at least one of the following categories were eligible for inclusion in this systematic review:
Measuring harm: Studies characterising and/or reporting on the occurrence of AEs or orthodontic-related patient harm.
Understanding causes: Reports focusing on understanding causes leading to patient harm and AEs from orthodontic care.
Identifying solutions: Studies identifying solutions that are effective in reducing the occurrence of AEs and patient harm.
Evaluating impact: Studies evaluating the effectiveness of solutions in terms of impact, affordability and acceptability.
The World Health Organization’s research cycle on patient safety, consisting of five steps with the main goal of measuring harm and its causes while identifying solutions and their impact. Ultimately, this evidence should lead to safer care through a set of actions and preventive measures
Only full-text articles were included. In addition, studies dealing with patient safety from a general dental-care perspective were included only if they were directly relevant to orthodontic care and the WHO research cycle. For example, although studies on oral surgery were excluded, studies on wrong-tooth extraction and articles investigating the safety of light curing in patients were included owing to their relevance to orthodontics.
The following MeSH terms were used for the systematic search:
(((orthodontic*) OR (dental)) AND (patient safety)) AND ((((((((((((((((((((((((((harm) OR (risk*)) OR (malpractice)) OR (adverse event*)) OR (adverse effect*)) OR (never event*)) OR (iatrogenic)) OR (damage)) OR (incident*)) OR (accident*)) OR (delay* diagnos*)) OR (misdiagnosis)) OR (complication*)) OR (allerg*)) OR (infection)) OR (failure)) OR (error*)) OR (white spot lesion*)) OR (root resorption)) OR (relapse)) OR (decalcification)) OR (caries)) OR (periodontal disease)) OR (nerve damage)) OR (injury)) OR (temporomandibular joint dysfunction)).
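For illustration, a search of this kind can also be submitted programmatically. The short Python sketch below is not part of the authors' protocol; it sends a trimmed, illustrative subset of the Boolean query to PubMed through the NCBI E-utilities esearch endpoint, and in practice the full search string above would be passed as the term parameter.

```python
# Minimal sketch: running a Boolean PubMed search via the NCBI E-utilities
# "esearch" endpoint. The query below is a shortened, illustrative subset of
# the full search string given in the text.
import requests

query = ('(orthodontic* OR dental) AND "patient safety" '
         'AND (harm OR risk* OR "adverse event")')

resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={
        "db": "pubmed",
        "term": query,
        "retmode": "json",
        "retmax": 200,            # number of PMIDs to return
        "datetype": "pdat",       # filter on publication date
        "mindate": "1900/01/01",  # effectively "from inception"
        "maxdate": "2023/07/01",  # end of the search window, per the protocol
    },
    timeout=30,
)
result = resp.json()["esearchresult"]
print(result["count"], "records; first PMIDs:", result["idlist"][:5])
```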
After removal of duplicates, all results returned from the systematic literature search were initially screened by title to establish their relevance. The second filtering level decided relevance for inclusion based on the content of the abstract. Finally, the third filtering level was applied to the main text, and the remaining studies were then included in the review. All screening was performed independently by one of the authors (NF) and was later re-checked by another author (PS). Any disputes in study selection were addressed and resolved through discussion between the reviewing authors. For all included studies, the main outcome/result was recorded; these were studies investigating prevalence (“measuring harm”, step 1) or assessing contributing factors (“understanding causes”, step 2). For all studies providing information on the cycle’s step 3 (“identifying solutions”), all recommended solutions to prevent harm were also noted. Due to the nature of the data in the included studies, no risk-of-bias assessment was possible. For the same reason, no quantitative synthesis or meta-analysis was performed. Based on these findings, the intention to conduct a systematic review was revised to a scoping literature review instead [ 17 ].
A total of 3,923 studies were identified from the systematic search and imported into Excel (Microsoft®, USA) (PubMed n = 2,049, Web of Science n = 663, Scopus n = 1,203 and OpenGrey n = 8). Among the 3,923 articles, 237 were deemed relevant according to the inclusion criteria after screening their titles. Filtering by abstract left 41 articles for inclusion after removal of duplicates. In one case, the full text of an article was unavailable and it was therefore excluded [ 18 ]. Three relevant articles found in the reference lists were also added [ 4 , 14 , 19 ]. Finally, 25 studies were included, as they were found to provide information within any of the categories of the WHO’s research cycle on patient safety related to the orthodontic field (flowchart presented in Fig. 2 ).
PRISMA flowchart diagram of the systematic literature search and inclusion procedure
Study characteristics are shown in Table 1 . Nine of the included papers were retrospective studies of AEs, covering: eye-wear protection and ocular trauma in orthodontic practice [ 19 ], clinical evaluation of a locking orthodontic facebow [ 20 ], adverse reactions to dental materials [ 3 ], case reports of latex allergy [ 21 ], wrong-tooth-extraction claims [ 4 ], dental and orthodontic PSIs in a UK register [ 7 ] and a Finnish register [ 8 ], adverse reactions to dental devices reported to the US Food and Drug Administration [ 9 ] and an investigation of monomer release from orthodontic adhesives [ 22 ].
The remaining sixteen studies reported risk assessments of orthodontic procedures or materials. These included safety assessment of dental radiography [ 23 ], bonding of brackets under general anaesthesia [ 24 ], orthodontic facebows [ 10 ], mini-implants [ 12 , 25 , 26 ], soft-tissue lasers in orthodontics [ 13 ], effect of orthodontic treatment on patients’ diet [ 14 ], eye safety of curing lights [ 27 ], safety of metal fixed appliance during magnetic resonance imaging (MRI) [ 28 ], pulp safety of various types of curing lights [ 29 ], wrong tooth extraction in orthodontics [ 30 , 31 , 32 ], orthodontic treatment by identifying orthodontic never events [ 33 ] and complications after orthognathic surgery [ 34 ]. These studies identified risks in orthodontic procedures or materials and proposed solutions to manage and minimise these risks.
Measuring harm.
Seven of the included studies provided information in the first category of the WHO’s research cycle on patient safety, “measuring harm” [ 4 , 7 , 8 , 9 , 19 , 22 , 34 ]. Sims et al. conducted a postal survey on eye protection in the UK and found that ocular injuries involving orthodontists, assistants and patients were reported by 37.7% of all respondents [ 19 ]. Peleg et al. conducted a root-cause analysis of wrong-tooth extraction in 54 insurance claims in Israel and reported that in two-thirds of all claims an identification error was the cause of the incorrect tooth extraction [ 4 ]. Also, a cross-sectional study on PSIs in the UK found that orthodontic PSIs accounted for 8.9% of all reported dental PSIs in the country [ 7 ]. Hebballi et al. investigated the frequency and types of AEs associated with dental devices as reported to the Food and Drug Administration Manufacturer and User Facility Device Experience (MAUDE) database in the US [ 9 ]. They reported that orthodontic appliances and accessories accounted for 1% of all AEs involving dental devices. In a similar investigation in hospital and private settings in Finland, Hiivala et al. reported that orthodontic PSIs accounted for 3.6% of all dental PSIs [ 8 ]. Finally, a multi-centre retrospective review of orthognathic surgeries assessing complications and risk factors studied a population of 674 patients [ 34 ]. The authors reported that adverse events were rare (4.3%), with superficial incisional infection being the most common, and identified the setting, the type of surgery and the patients’ ethnicity as risk factors for some types of complications.
Twenty-one of the included studies identified the underlying causes of AEs that caused patient harm (WHO category 2, “understanding the causes”) [ 3 , 4 , 7 , 10 , 12 , 13 , 14 , 19 , 20 , 21 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 33 , 34 ]. In addition, twelve studies identified possible solutions that may be effective in reducing the occurrence of AEs (WHO category 3, “identifying solutions”) [ 4 , 10 , 12 , 13 , 19 , 20 , 21 , 23 , 24 , 25 , 31 , 32 ]. These solutions included: health and safety instructions for eye-protection goggles to prevent ocular trauma [ 19 ]; use of non-latex materials [ 21 ]; clear instructions to the clinician, including a brief description of the tooth to be extracted and two different identification methods, together with a computerised checklist, to prevent wrong-site extraction [ 4 , 31 , 32 ]; use of facebows with a locking mechanism and self-releasing head strap to prevent injuries from headgear [ 10 , 20 ]; suggestions to improve safety in dental radiography [ 23 ]; use of a rubber dam during bonding of brackets under general anaesthesia [ 24 ]; recommendations to overcome failures and risks during placement, loading and removal of mini-implants [ 12 , 25 ]; and, finally, instructions for the safe use of soft-tissue lasers in orthodontics, recommending that the clinician obtain appropriate training and certification, that proper eye wear be used by all involved parties, and that informed consent and proper post-operative instructions be provided [ 13 ].
None of the included studies provided information on how to evaluate the impact of such solutions or on how to translate evidence into safer care in terms of affordability and acceptability. Data synthesis and meta-analysis was not possible due to the heterogeneity of the different studies and the nature of the data.
To our knowledge, this is the first systematic investigation of patient safety in orthodontics. The lack of evidence in the field is apparent in our results: the 25 included studies were often only peripherally related to orthodontics while providing some information relevant to the WHO’s research cycle. This cycle describes a process to identify solutions for enhancing patient safety and reducing patient harm. It consists of five steps representing the natural progression of patient-safety initiatives. It seems that dentistry in general, and orthodontics in particular, have yet to take even the initial steps of the cycle (steps 1 and 2), which are to measure the harm and understand the causes of harm [ 16 ]. This is evident from the results, as the included studies were either reviews of risks associated with specific orthodontic procedures (such as mini-implant insertion, soft-tissue laser use and facebow use) or retrospective reviews of AEs peripherally related to orthodontics (incidence of ocular trauma, adverse reactions to materials, etc.).
The results of this review document that current evidence relating to orthodontics is scarce. Without a basic understanding of PSIs and harms, we cannot begin to understand the causes and identify solutions that will subsequently translate into safer care for our patients [ 16 ]. A major limitation is the likely under-reporting of PSIs in our field. In fact, a review of the National Patient Safety Agency (NPSA) database in the UK revealed that orthodontics is among the lowest-reporting specialties, along with dental surgery and paediatric dentistry [ 35 ]. A contributing factor may be the lesser severity of some PSIs in orthodontics, which may be smaller injuries such as soft-tissue lacerations from loose wires [ 16 ]. One way to overcome the under-reporting issue may be effective keeping of patient records and clinical notes, which may prove an essential tool in clinical audits and will also underpin the reporting of more AEs [ 36 ]. Also, the lack of standardisation in the terminology and reporting of AEs makes it challenging, if not impossible, to summarise and categorise all PSIs in orthodontics, let alone analyse the data in depth.
Additionally, we hypothesise that a reporting bias may exist between dental specialties. Dental implants are more expensive, and dentists and/or patients may therefore report them more often when asking for replacements [ 9 ]. This leads, for example, to many more reported PSIs for implants than for burs. Finally, another contributing factor to the lack of evidence on patient safety is the overlap between some areas within dentistry, which makes it more challenging to precisely measure AEs in any one field. A clear example is the AE of wrong-tooth extraction for orthodontic reasons, which may fall in both the orthodontic and the surgical category.
The lack of standardised terminology and reporting of PSIs in orthodontics seems to hinder any effort to summarise and categorise PSIs, which would be a reasonable first research step to enhance our knowledge in this field. For future work, we therefore suggest that PSIs related to orthodontics be summarised into two main categories: local and systemic. The categorisation, with subcategories and examples, is shown in Table 2 . Terminology according to the WHO is proposed in Table 3 .
Local PSIs refer to any harm to dental tissues (root resorption, white spot lesions, pulp necrosis, caries) and soft tissues. This may be avoidable damage to periodontal and surrounding soft tissues (gingival recessions, soft-tissue lacerations, local allergic reaction/contact dermatitis). In addition, local PSIs include treatment injuries with a negative effect on orofacial function, for example development of a lip catch as a result of orthodontic treatment. Finally, any harm related to unwanted tooth movement, such as that caused by an active retainer, is also included in this category.
Systemic PSIs refer to harm at a systemic level. This may be excessive pain and discomfort as a result of orthodontic treatment due to a defective appliance, or hypersensitivity due to excessive interproximal reduction. In addition, systemic PSIs include potential emotional damage to patients, for example development of general discomfort, odontophobia or mistrust towards the clinician or the healthcare system, or deterioration of oral health-related quality of life (OHRQoL). Systemic PSIs may also result from delayed treatment initiation due to delayed or inadequate diagnosis. Finally, harm caused by poor cross-infection control, inhalation of orthodontic parts and extraction of a wrong tooth are also considered systemic PSIs.
A proposal for a future research agenda in orthodontic patient safety is shown in Table 4 . The agenda is intended as inspiration to promote future research and development in patient safety in orthodontics. It should not be considered absolute, as topics other than those listed may be of interest for future patient safety initiatives. Two main categories of studies are presented in Table 4 : retrospective and prospective studies dealing with patient safety [ 26 ].
Retrospective studies are reactive in nature and focus on the incidence, characteristics and severity of PSIs using an acknowledged methodology such as patient-record audit and root cause analysis (RCA) [ 26 , 27 ]. They investigate PSIs that have already occurred with the intention of generating knowledge to promote learning and guidance for future patient safety initiatives. RCA allows us to focus on individual PSIs and investigate, through a comprehensive analysis, all the contributing factors that led to the occurrence of an AE.
Conversely, prospective studies assess potential risks associated with a treatment, appliance or material. The methodology in these studies is failure mode and effects analysis (FMEA) [ 27 , 28 ]: the analysis of a method, treatment, material or procedure by first creating a risk map and then implementing measures to reduce the likelihood or impact of a PSI [ 27 – 30 ].
Both intrinsic and extrinsic motivation are key factors in the establishment of safer future orthodontic care. Intrinsic motivation is shaped by professional ethics, norms and patient-reported outcomes and expectations [ 1 , 37 , 38 ]. The articles included in our review, however, mainly focused on the extrinsic motivation, which refers to the environment, policies and strategies that we may develop with the ultimate goal of improving patient safety in orthodontics.
In orthodontic patient safety research, a need exists to increase our focus on this aspect and on clinical routines and the administrative, organisational and legal contexts. One strategy that may help us move in this direction is to establish excellent records and clinical notes through periodic audits [ 30 ]. This will help clinicians and/or patients report more AEs in the future. Honest exchange of such information between health professionals is a necessary first step and a cornerstone of safer care and further research. To achieve this, it is important to establish a non-blame culture with psychological safety and a feeling of partnership, enthusiasm and commitment to improving patient safety in orthodontics [ 36 ].
Research on patient safety is more advanced in other parts of healthcare than in orthodontics. Even other fields of dentistry have taken steps in this direction with the creation of checklists, e.g. in endodontics, orofacial function and oral surgery [ 39 , 40 , 41 , 42 , 43 ]. Checklists seem to have a positive effect on patient safety [ 44 , 45 , 46 ]. Most of these checklists are adaptations of the WHO’s surgical checklist, which is now used in a wide range of surgical specialties in medicine [ 47 ]. Adjusting it to fit orthodontic needs and implementing it in daily practice may be an important step towards improving safety in orthodontics [ 48 ]. In the past decade, the WHO has published several guidelines and educational curricula to enhance the level of patient safety in healthcare in general [ 49 , 50 ]. These publications may provide a starting point for the spread of local patient safety initiatives and the introduction of educational and organisational measures to further patient safety.
Some orthodontic societies seem to have taken steps towards patient safety; however, societies in all countries need to follow and implement policies for safer care. At its core, patient safety is the purpose of audit and clinical governance, and research is a vital element in this process. Nevertheless, a limitation here is that clinical governance may differ from one country to another.
Traditionally, patient safety focused on rare types of incidents with a significant degree of harm, referred to in the literature as “never events” [ 51 ]. In recent years, however, more effort has been devoted to understanding the frequency and causes of PSIs, which we assume occur more frequently than is reported today [ 51 ]. The threshold determining what is considered a PSI may often be vague, and the border is not absolute, particularly as we come to understand patient safety better. It is important to emphasise that common side effects (e.g., root resorption) are not considered PSIs, as these side effects may also occur when a patient has undergone an optimally performed course of treatment; unless, of course, the side effects were avoidable and appropriate measures had not been adopted [ 52 ]. The extent of such side effects can vary and probably depends on a wide range of factors (force magnitude, treatment duration) [ 53 ]. Excessive root resorption, however, may be considered a PSI if the risk factors were not assessed before initiating treatment and if precautionary measures were not taken in advance.
A step towards safer orthodontics may be to incorporate such “risk maps” routinely in systematic reviews. For example, when a systematic review compares A to B, reporting only which of the two is more efficient or faster may be insufficient; the burden and the risk of harm to the patient should also be reported. This reporting may include anything that may be considered a PSI, from excessive root resorption to increased exposure to radiation, cytotoxicity, effects on patients’ OHRQoL, late diagnosis, overtreatment, gingival recessions or bone dehiscence. A cultural change in the way we approach these “side effects”, together with further patient-centred research, will improve patient safety in our field. In addition, in today's rapidly evolving technological landscape, where new advancements outpace research capabilities, emphasising the safety of orthodontic materials is crucial, and treatment decisions need to be patient-centred, based on the patient's perspective [ 54 ].
The strengths of this systematic review include an extensive literature search, a predefined protocol, a priori registration with PROSPERO and the adoption of a strict methodology at all study stages [ 55 ]. Also, the fact that there was no date or language limitation in the search provided us with data that likely reflect the current understanding and knowledge about PSIs in orthodontics. In addition, the proposed categorisation of PSIs in orthodontics and the future-agenda proposals may spark interest and lead to further research in the field of orthodontic patient safety.
Certain limitations need further consideration, mainly the inability to assess the precise prevalence of orthodontic PSIs and categorise them accordingly. This inability is due to the poor current evidence, the lack of standardisation and terminology, and the fact that many PSIs are probably under-reported. It may also reflect the fact that patient safety is a topic of increasing complexity, especially with the new risks arising directly from the use of new technologies [ 51 ]. Also, there is an inherent risk of bias due to the nature of the included studies, which were mostly retrospective [ 56 ]. Furthermore, in this study, the final selection of the included studies was consensus-based instead of the suitability of the articles being assessed individually during the review process. Finally, despite thorough searching, studies could have been overlooked during the process, possibly originating from databases not encompassed in the search.
Current evidence on patient safety in orthodontics is scarce due to a lack of standardisation and potential under-reporting of PSIs. The current literature on orthodontic patient safety deals mostly with “measuring harms” and “understanding causes of patient safety”, whereas less attention has been devoted to initiatives “identifying solutions”, “evaluating impact” and “translating evidence into safer care”. The present project presents proposals for a new categorisation, terminology and a future research agenda that may serve as a framework to support future research and clinical initiatives to improve patient safety in orthodontic care.
All data generated or analysed during this study are included in this published article and its supplementary information files.
Kohn LT, Corrigan JM, Donaldson MS, editors. To Err Is Human: Building a Safer Health System. Washington, DC: National Academies Press; 2000.
World Health Organization (WHO). Conceptual Framework for the International Classification for Patient Safety: Final Technical Report. 2009. Available from: http://www.who.int/patientsafety/taxonomy/ICPS_Statement_of_Purpose.pdf
Scott A, Egner W, Gawkrodger DJ, Hatton PV, Sherriff M, Van Noort R, et al. The national survey of adverse reactions to dental materials in the UK: a preliminary study by the UK Adverse Reactions Reporting Project. Br Dent J. 2004;196:471–7.
Peleg O, Givot N, Halamish-Shani T, Taicher S. Wrong tooth extraction: root cause analysis. Br Dent J. 2011;210(4):163.
Yamalik N, Perea PB. Patient safety and dentistry: What do we need to know? Fundamentals of patient safety, the safety culture and implementation of patient safety measures in dental practice. Int Dent J. 2012;62(4):189–96.
Dehghanian D, Heydarpoor P, Attaran N, Khoshnevisan M. Clinical governance in general dental practice. Journal of International Oral Health. 2019;11(3):107–11.
Thusu S, Panesar S, Bedi R. Patient safety in dentistry - State of play as revealed by a national database of errors. Br Dent J. 2012;213(3):E3.
Hiivala N, Mussalo-Rauhamaa H, Tefke HL, Murtomaa H. An analysis of dental patient safety incidents in a patient complaint and healthcare supervisory database in Finland. Acta Odontol Scand. 2016;74(2):81–9. https://doi.org/10.3109/00016357.2015.1042040
Hebballi NB, Ramoni R, Kalenderian E, Delattre VF, Stewart DCL, Kent K, et al. The dangers of dental devices as reported in the food and drug administration manufacturer and user facility device experience database. J Am Dent Assoc. 2015;146(2):102–10.
Samuels RHA, Brezniak N. Orthodontic facebows: Safety issues and current management. J Orthod. 2002;29(2):101–7.
Bailey E, Tickle M, Campbell S, O’Malley L. Systematic review of patient safety interventions in dentistry. BMC Oral Health. 2015;15(1):152.
Kravitz ND, Kusnoto B. Risks and complications of orthodontic miniscrews. Am J Orthod Dentofac Orthop. 2007;131(4):S43-51.
Kravitz ND, Kusnoto B. Soft-tissue lasers in orthodontics: An overview. Am J Orthod Dentofacial Orthop. 2008;133(4 SUPPL):S110-4.
Johal A, Abed Al Jawad F, Marcenes W, Croft N. Does orthodontic treatment harm children’s diets? J Dent. 2013;41(11):949–54.
World Health Organization. WHO patient safety research: better knowledge for safer care. 2009. 12 p.
Tokede O, Walji M, Ramoni R, Rindal D, Worley D, Hebballi N, et al. Quantifying Dental Office-Originating Adverse Events: The Dental Practice Study Methods. J Patient Saf. 2017; published ahead of print: 1–8.
Vaid N. Scoping studies: Should there be more in orthodontic literature? APOS Trends in Orthodontics. 2019;9(3):124–5.
Rak D. X-ray examinations in orthodontic diagnostics as a source of ionizing radiation. Bilten Udruzenja Ortodonata Jugoslavije (Bulletin of the Orthodontic Society of Yugoslavia). 1989;22:37–48.
Sims AP, Roberts-Harry TJ, Roberts-Harry DP. The incidence and prevention of ocular injuries in orthodontic practice. Br J Orthod. 1993;20(4):339–43.
Samuels R, O’Neill J, Bhavra G, Hills D, Thomas P, Hug H, et al. A clinical evaluation of a locking orthodontic facebow. Am J Orthod Dentofacial Orthop. 2000;117(3):344–50.
Raggio DP, Camargo LB, Naspitz GMCC, Bonifacio CC, Politano GT, Mendes FM, et al. Latex allergy in dentistry: Clinical cases report. J Clin Exp Dent. 2010;2(1):e55–9.
Bationo R, Jordana F, Boileau MJ, Colat-Parros J. Release of monomers from orthodontic adhesives. Am J Orthod Dentofac Orthop. 2016;150(3):491–8.
Abbott P. Are dental radiographs safe? Aust Dent J. 2000;45(3):208–13.
Chaushu S, Zeltser R, Becker A. Safe orthodontic bonding for children with disabilities during general anaesthesia. Eur J Orthod. 2000;22(3):225–8.
Suzuki M, Deguchi T, Watanabe H, Seiryu M, Iikubo M, Sasano T, et al. Evaluation of optimal length and insertion torque for miniscrews. Am J Orthod Dentofac Orthop. 2013;144(2):251–9.
Kuroda S, Tanaka E. Risks and complications of miniscrew anchorage in clinical orthodontics. Japan Dent Sci Rev. 2014;50:79–85.
McCusker N, Lee SM, Robinson S, Patel N, Sandy JR, Ireland AJ. Light curing in orthodontics; Should we be concerned? Dent Mater. 2013;29(6):e85-90.
Görgülü S, Ayyildiz S, Kamburoǧlu K, Gökçe S, Ozen T. Effect of orthodontic brackets and different wires on radiofrequency heating and magnetic field interactions during 3-T MRI. Dentomaxillofacial Radiol. 2014;43(2):20130356.
Mouhat M, Mercer J, Stangvaltaite L, Örtengren U. Light-curing units used in dentistry: factors associated with heat development—potential risk for patients. Clin Oral Investig. 2017;21(5):1687–96.
Anwar H, Waring D. Improving patient safety through a clinical audit spiral: prevention of wrong tooth extraction in orthodontics. Br Dent J. 2017;223(1):48–52.
Cullingham P, Saksena A, Pemberton MN. Patient safety: Reducing the risk of wrong tooth extraction. Br Dent J. 2017;222(10):759–63.
Jacob O, Gough E, Thomas H. Preventing wrong tooth extraction. Acta Stomatol Croat. 2021;55(3):316–24.
Jerrold L, Danoff-Rudick J. Never events in clinical orthodontic practice. Am J Orthod Dentofac Orthop. 2022;161(4):480–9.
Knoedler S, Baecher H, Hoch CC, Obed D, Matar DY, Rendenbach C, et al. Early Outcomes and Risk Factors in Orthognathic Surgery for Mandibular and Maxillary Hypo- and Hyperplasia: A 13-Year Analysis of a Multi-Institutional Database. J Clin Med. 2023;12(4):1444.
Bagley CHM, Panesar SS, Patel B, Cleary K, Pickles J. Safer cut: revelations of surgical harm through a national database. Br J Hosp Med. 2010;71(9):484–5. Available from: https://www.magonlinelibrary.com/doi/10.12968/hmed.2010.71.9.78155
Yamalik N. Quality systems in dentistry. Part 2: Quality assurance and improvement (QA/I) tools that have implications for dentistry. Int Dent J. 2007;57(6):459–67. Available from: https://pubmed.ncbi.nlm.nih.gov/18265780/
Hua F. Dental patient-reported outcomes update 2022. J Evid Based Dent Pract. 2023;23:1–6.
Tao Z, Zhao T, Ngan P, Qin D, Hua F, He H. The use of dental patient-reported outcomes among randomized controlled trials in orthodontics: a methodological study. J Evid Based Dent Pract. 2023;23(1): 101795.
Díaz-Flores-García V, Perea-Pérez B, Labajo-González E, Santiago-Sáez A, Cisneros-Cabello R. Proposal of a “Checklist” for endodontic treatment. J Clin Exp Dent. 2014;6(2):104–9.
Wright S, Ucer TC, Crofts G. The adaption and implementation of the WHO surgical safety checklist for dental procedures. Br Dent J. 2018;225(8):727–9.
Nenad MW, Halupa C, Spolarich AE, Gurenlian JAR. A Dental Radiography Checklist as a Tool for Quality Improvement. J Dent Hyg. 2016;90(6):386–93.
Beddis HP, Davies SJ, Budenberg A, Horner K, Pemberton MN. Temporomandibular disorders, trismus and malignancy: Development of a checklist to improve patient safety. Br Dent J. 2014;217(7):351–5.
Schmitt CM, Buchbender M, Musazada S, Bergauer B, Neukam FW. Evaluation of Staff Satisfaction After Implementation of a Surgical Safety Checklist in the Ambulatory of an Oral and Maxillofacial Surgery Department and its Impact on Patient Safety. J Oral Maxillofac Surg. 2018;76(8):1616–39.
Wilson L, Walker L. The WHO surgical safety checklist: The evidence. J Perioper Pract [Internet]. 2009;19(10):362–4. Available from: https://journals.sagepub.com/doi/epdf/10.1177/175045890901901002
Weiser TG, Haynes AB, Dziekan G, Berry WR, Lipsitz SR, Gawande AA. Effect of A 19-item surgical safety checklist during urgent operations in a global patient population. Ann Surg. 2010;251(5):976–80.
Vats A, Vincent CA, Nagpal K, Davies RW, Darzi A, Moorthy K. Practical challenges of introducing WHO surgical checklist: UK pilot experience. BMJ (Online). 2010;340(7738):133–5.
World Health Organization. Tool and Resources [Internet]. WHO Surgical Safety Checklist. 2009. Available from: https://www.who.int/teams/integrated-health-services/patient-safety/research/safe-surgery/tool-and-resources
Clark S, Hamilton L. WHO surgical checklist: needs to be customised by specialty. BMJ. 2010;340:280.
World Health Organization (WHO). Global Patient Safety Action Plan 2021–2030. 2020. Available from: https://www.who.int/teams/integrated-health-services/patient-safety/policy/global-patient-safety-action-plan
World Health Organization (WHO). Patient Safety Research course. 2022. Available from: https://www.who.int/teams/integrated-health-services/patient-safety/guidance/patient-safety-research-course
Vincent C, Amalberti R. Safer Healthcare: Strategies for the Real World. Cham: Springer; 2016. p. 1–157.
Stoustrup P, Ferlias N. Patientskader i forbindelse med ortodonti [Patient injuries in connection with orthodontics]. Tandlaegebladet. 2022;126:812–22.
Yassir YA, McIntyre GT, Bearn DR. Orthodontic treatment and root resorption: An overview of systematic reviews. Eur J Orthod. 2021;43(4):442–56.
Alansari R, Vaiid N. Why do patients transition between orthodontic appliances? A qualitative analysis of patient decision-making. Orthod Craniofac Res. 2023;00:1–8.
Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions. Cochrane Training; 2011.
OCEBM Levels of Evidence Working Group. OCEBM Levels of Evidence. Centre for Evidence-Based Medicine (CEBM), University of Oxford; 2011. Available from: https://www.cebm.ox.ac.uk/resources/levels-of-evidence/ocebm-levels-of-evidence
Authors and affiliations.
Section of Orthodontics, Department of Dentistry and Oral Health, Aarhus University, Aarhus, Denmark
Nikolaos Ferlias & Peter Stoustrup
Department of Neurosciences, Reproductive Sciences and Oral Sciences, Section of Orthodontics and Temporomandibular Disorders, University of Naples Federico II, Naples, Italy
Ambrosina Michelotti
Private Practice, Brighton, UK
Nikolaos Ferlias
NF: Conceptualization, Search strategy, Data synthesis, interpretation and analysis, Investigation, Methodology, Validation, Writing original draft, Writing review & editing. AM: Investigation, Methodology, Data interpretation and analysis, Supervision, Validation, Writing review & editing. PS: Conceptualization, Search strategy, Data synthesis, interpretation and analysis, Investigation, Methodology, Project administration, Supervision, Validation, Writing original draft, Writing review & editing.
Correspondence to Nikolaos Ferlias .
Ethics approval and consent to participate.
Not applicable.
Competing interests.
The authors declare no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material 1. Supplementary material 2.
Rights and permissions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article.
Ferlias, N., Michelotti, A. & Stoustrup, P. Patient safety in orthodontic care: a scoping literature review with proposal for terminology and future research agenda. BMC Oral Health 24 , 702 (2024). https://doi.org/10.1186/s12903-024-04375-7
Received : 11 March 2024
Accepted : 14 May 2024
Published : 18 June 2024
Systematic Reviews, volume 4, Article number: 78 (2015)
Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper reports a systematic review of published and unpublished methods to automate data extraction for systematic reviews.
We systematically searched PubMed, IEEE Xplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted with evaluation results presented for that entity. We also reviewed the citations from included reports.
Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, there were attempts from various researchers to extract information automatically from the publication text. Out of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70 %.
We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited (1–7) number of data elements. Biomedical natural language processing techniques have not been fully utilized to fully or even partially automate the data extraction step of systematic reviews.
Systematic reviews identify, assess, synthesize, and interpret published and unpublished evidence, which improves decision-making for clinicians, patients, policymakers, and other stakeholders [ 1 ]. Systematic reviews also identify research gaps to develop new research ideas. The steps to conduct a systematic review [ 1 – 3 ] are:
Define the review question and develop criteria for including studies
Search for studies addressing the review question
Select studies that meet criteria for inclusion in the review
Extract data from included studies
Assess the risk of bias in the included studies, by appraising them critically
Where appropriate, analyze the included data by undertaking meta-analyses
Address reporting biases
Despite their widely acknowledged usefulness [ 4 ], the process of systematic review, specifically the data extraction step (step 4), can be time-consuming. In fact, it typically takes 2.5–6.5 years for a primary study publication to be included and published in a new systematic review [ 5 ]. Further, within 2 years of the publication of systematic reviews, 23 % are out of date because they have not incorporated new evidence that might change the systematic review’s primary results [ 6 ].
Natural language processing (NLP), including text mining, involves information extraction: the discovery by computer of new, previously unfound information through automatic extraction from different written resources [ 7 ]. Information extraction primarily constitutes concept extraction, also known as named entity recognition, and relation extraction, also known as association extraction. NLP handles written text at the level of documents, words, grammar, meaning, and context. NLP techniques have been used to automate extraction of genomic and clinical information from biomedical literature. Similarly, automation of the data extraction step of the systematic review process through NLP may be one strategy to reduce the time necessary to complete and update a systematic review. The data extraction step is one of the most time-consuming steps of a systematic review, and automating or even semi-automating it could substantially decrease the time taken to complete systematic reviews and thus decrease the time lag for research evidence to be translated into clinical practice. Despite these potential gains from NLP, the state of the science of automating data extraction has not been well described.
To date, knowledge of and methods for automating the data extraction phase of systematic reviews remain limited, despite it being one of the most time-consuming steps. To address this gap in knowledge, we sought to perform a systematic review of methods to automate the data extraction component of the systematic review process.
Our methodology was based on the Standards for Systematic Reviews set by the Institute of Medicine [ 8 ]. We conducted our study procedures as detailed below with input from the Cochrane Heart Group US Satellite.
We included a report that met the following criteria: 1) the methods or results section described what entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity.
We excluded a report that met any of the following criteria: 1) the methods were not applied to the data extraction step of a systematic review; 2) the report was an editorial, commentary, or other non-original research report; or 3) there was no evaluation component.
For collecting the initial set of articles for our review, we developed search strategies with the help of the Cochrane Heart Group US Satellite, which includes systematic reviewers and a medical librarian. We refined these strategies using relevant citations from related papers. We searched three databases: PubMed, IEEE Xplore, and the ACM Digital Library, and our searches were limited to January 1, 2000 through January 6, 2015 (see Appendix 1 ). We restricted our search to these dates because biomedical information extraction algorithms developed prior to 2000 are unlikely to be accurate enough to be used for systematic reviews.
From included study reports, we retrieved articles that dealt with the extraction of various data elements, defined as categories of data that pertained to any information about or deriving from a study, including details of methods, participants, setting, context, interventions, outcomes, results, publications, and investigators [ 1 ]. After we retrieved the initial set of reports from the search results, we then evaluated reports included in the references of these reports. We also sought expert opinion for additional relevant citations.
We first de-duplicated the retrieved citations. For calibration and refinement of the inclusion and exclusion criteria, 100 citations were randomly selected and independently reviewed by two authors (SRJ and PG). Disagreements were resolved by consensus with a third author (MDH). In a second round, another set of 100 randomly selected abstracts was independently reviewed by two study authors (SRJ and PG), whereby we achieved a strong level of agreement (kappa = 0.97). Given the high level of agreement, the remaining studies were reviewed only by one author (PG). In this phase, we identified reports as “not relevant” or “potentially relevant”.
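The calibration statistic reported here, Cohen's kappa, corrects raw agreement for chance. As a minimal illustrative sketch (the decision vectors below are hypothetical, not the study's data), it can be computed with scikit-learn:

```python
# Minimal sketch: chance-corrected inter-reviewer agreement (Cohen's kappa)
# on include/exclude screening decisions. Vectors are hypothetical.
from sklearn.metrics import cohen_kappa_score

reviewer_1 = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]  # 1 = potentially relevant
reviewer_2 = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

print(f"Cohen's kappa = {cohen_kappa_score(reviewer_1, reviewer_2):.2f}")
```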
Two authors (PG and SRJ) independently reviewed the full text of all citations ( N = 74) that were identified as “potentially relevant”. We classified included reports into various categories based on the particular data element that they attempted to extract from the original scientific articles. Examples of these data elements include overall evidence and specific interventions, among others (Table 1 ). We resolved disagreements between the two reviewers through consensus with a third author (MDH).
Two authors (PG and SRJ) independently reviewed the included articles to extract data, such as the particular entity automatically extracted by the study, algorithm or technique used, and evaluation results into a data abstraction spreadsheet. We resolved disagreements through consensus with a third author (MDH).
We reviewed the Cochrane Handbook for Systematic Reviews [ 1 ], the CONsolidated Standards Of Reporting Trials (CONSORT) [ 9 ] statement, the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks to obtain the data elements to be considered. PICO stands for Population, Intervention, Comparison, Outcomes; PECODR stands for Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results; and PIBOSO stands for Population, Intervention, Background, Outcome, Study Design, Other.
Because of the large variation in study methods and measurements, a meta-analysis of methodological features and contextual factors associated with the frequency of data extraction methods was not possible. We therefore present a narrative synthesis of our findings. We did not thoroughly assess risk of bias, including reporting bias, for these reports because the study designs did not match domains evaluated in commonly used instruments such as the Cochrane Risk of Bias tool [ 1 ] or QUADAS-2 instrument used for systematic reviews of randomized trials and diagnostic test accuracy studies, respectively [ 14 ].
Of 1190 unique citations retrieved, we selected 75 reports for full-text screening, and we included 26 articles that met our inclusion criteria (Fig. 1 ). Agreement on abstract and full-text screening was 0.97 and 1.00, respectively.
Process of screening the articles to be included for this systematic review
Table 1 provides a list of items to be considered in the data extraction process based on the Cochrane Handbook (Appendix 2 ) [ 1 ], CONSORT statement [ 9 ], STARD initiative [ 10 ], and PICO [ 11 ], PECODR [ 12 ], and PIBOSO [ 13 ] frameworks. We provide the major group for each field and report which standard focused on that field. Finally, we report whether there was a published method to extract that field. Table 1 thus identifies the data elements relevant to the systematic review process, categorized by their domain and the standard from which each element was adopted, together with existing automation methods, where present.
Table 2 summarizes the existing information extraction studies. For each study, the table provides the citation to the study (study: column 1), data elements that the study focused on (extracted elements: column 2), dataset used by the study (dataset: column 3), algorithm and methods used for extraction (method: column 4), whether the study extracted only the sentence containing the data elements, full concept or neither of these (sentence/concept/neither: column 5), whether the extraction was done from full-text or abstracts (full text/abstract: column 6) and the main accuracy results reported by the system (results: column 7). The studies are arranged by increasing complexity by ordering studies that classified sentences before those that extracted the concepts and ordering studies that extracted data from abstracts before those that extracted data from full-text reports.
The accuracy of most ( N = 18, 69 %) studies was measured using a standard text mining metric known as the F-score, which is the harmonic mean of precision (positive predictive value) and recall (sensitivity). Some studies ( N = 5, 19 %) reported only the precision of their method, while others ( N = 2, 8 %) reported accuracy values. One study (4 %) reported P5 precision, which indicates the fraction of positive predictions among the top 5 results returned by the system.
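As a worked toy example of these metrics (the counts below are illustrative only, not drawn from any of the included studies):

```python
# Precision (positive predictive value), recall (sensitivity) and their
# harmonic mean, the F-score, from illustrative extraction counts.
tp, fp, fn = 70, 20, 10  # true positives, false positives, false negatives

precision = tp / (tp + fp)                                # 0.78
recall = tp / (tp + fn)                                   # 0.88
f_score = 2 * precision * recall / (precision + recall)   # 0.82
print(f"precision={precision:.2f} recall={recall:.2f} F-score={f_score:.2f}")
```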
Dawes et al. [ 12 ] identified 20 evidence-based medicine journal synopses with 759 extracts in the corresponding PubMed abstracts. Annotators agreed with the identification of an element 85 and 87 % for the evidence-based medicine synopses and PubMed abstracts, respectively. After consensus among the annotators, agreement rose to 97 and 98 %, respectively. The authors proposed various lexical patterns and developed rules to discover each PECODR element from the PubMed abstracts and the corresponding evidence-based medicine journal synopses that might make it possible to partially or fully automate the data extraction process.
Kim et al. [ 13 ] used conditional random fields (CRF) [ 15 ] for the task of classifying sentences in one of the PICO categories. The features were based on lexical, syntactic, structural, and sequential information in the data. The authors found that unigrams, section headings, and sequential information from preceding sentences were useful features for the classification task. They used 1000 medical abstracts from PIBOSO corpus and achieved micro-averaged F-scores of 91 and 67 % over datasets of structured and unstructured abstracts, respectively.
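A minimal sketch of this style of sentence-level sequence labelling follows, using the third-party sklearn-crfsuite package with toy data and deliberately simplified features (Kim et al.'s actual feature set was far richer):

```python
# Sketch: labelling each sentence of an abstract with a PICO-style tag using
# a linear-chain CRF over the sentence sequence. Toy data; simplified features.
import sklearn_crfsuite

def features(sentences, i):
    s = sentences[i]
    return {
        "first_tokens": " ".join(s.lower().split()[:3]),  # crude lexical cue
        "rel_position": i / len(sentences),               # position in abstract
        "has_digit": any(c.isdigit() for c in s),
        "prev_first_token": sentences[i - 1].lower().split()[0] if i else "BOS",
    }

abstracts = [[
    "120 adults with asthma were enrolled.",
    "Patients received inhaled corticosteroids.",
    "Symptom scores improved significantly.",
]]
labels = [["P", "I", "O"]]  # Population, Intervention, Outcome

X = [[features(a, i) for i in range(len(a))] for a in abstracts]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```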
Boudin et al. [ 16 ] utilized a combination of multiple supervised classification techniques for detecting PICO elements in medical abstracts. They utilized features such as MeSH semantic types, word overlap with the title, and number of punctuation marks, fed to random forests (RF), naive Bayes (NB), support vector machines (SVM), and multi-layer perceptron (MLP) classifiers. Using 26,000 abstracts from PubMed, the authors took the first sentence in the structured abstracts and assigned a label automatically to build a large training dataset. They obtained an F-score of 86 % for identifying participants (P), 67 % for interventions (I) and controls (C), and 56 % for outcomes (O).
Huang et al. [ 17 ] used a naive Bayes classifier for the PICO classification task. The training data were generated automatically from the structured abstracts. For instance, all sentences in the section of the structured abstract that started with the term “PATIENT” were used to identify participants (P). In this way, the authors could generate a dataset of 23,472 sentences. Using 23,472 sentences from the structured abstracts, they obtained an F-score of 91 % for identifying participants (P), 75 % for interventions (I), and 88 % for outcomes (O).
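The weak-labelling trick used here, letting structured-abstract section headings stand in for manual annotation, is easy to sketch. Assuming scikit-learn, with a handful of toy sentences in place of the tens of thousands the authors harvested:

```python
# Sketch: train a naive Bayes PICO sentence classifier on sentences that are
# "labelled" by the structured-abstract section they came from. Toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "Sixty elderly patients with hypertension were recruited.",
    "Participants were randomised to amlodipine or placebo.",
    "Systolic blood pressure fell by 12 mmHg at 6 months.",
    "Forty children with early caries were included.",
]
labels = ["P", "I", "O", "P"]  # derived from section headings, not annotators

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(sentences, labels)
print(model.predict(["Pain scores decreased after treatment."]))
```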
Verbeke et al. [ 18 ] used a statistical relational learning-based approach (kLog) that utilized relational features for classifying sentences. The authors also used the PIBOSO corpus for evaluation and achieved micro-averaged F-score of 84 % on structured abstracts and 67 % on unstructured abstracts, which was a better performance than Kim et al. [ 13 ].
Huang et al. [ 19 ] used 19,854 structured extracts and trained two classifiers: one by taking the first sentences of each section (termed CF by the authors) and the other by taking all the sentences in each section (termed CA by the authors). The authors used the naive Bayes classifier and achieved F-scores of 74, 66, and 73 % for identifying participants (P), interventions (I), and outcomes (O), respectively, by the CF classifier. The CA classifier gave F-scores of 73, 73, and 74 % for identifying participants (P), interventions (I), and outcomes (O), respectively.
Hassanzadeh et al. [ 20 ] used the PIBOSO corpus for the identification of sentences with PIBOSO elements. Using conditional random fields (CRF) with discriminative set of features, they achieved micro-averaged F-score of 91 %.
Robinson [ 21 ] used four machine learning models, 1) support vector machines, 2) naive Bayes, 3) naive Bayes multinomial, and 4) logistic regression to identify medical abstracts that contained patient-oriented evidence or not. These data included morbidity, mortality, symptom severity, and health-related quality of life. On a dataset of 1356 PubMed abstracts, the authors achieved the highest accuracy using a support vector machines learning model and achieved an F-measure of 86 %.
Chung [ 22 ] utilized a full sentence parser to identify the descriptions of the assignment of treatment arms in clinical trials. The authors used predicate-argument structure along with other linguistic features with a maximum entropy classifier. They utilized 203 abstracts from randomized trials for training and 124 abstracts for testing and achieved an F-score of 76 %.
Hara and Matsumoto [ 23 ] dealt with the problem of extracting “patient population” and “compared treatments” from medical abstracts. Given a sentence from the abstract, the authors first performed base noun-phrase chunking and then categorized the base noun-phrase into one of the five classes: “disease”, “treatment”, “patient”, “study”, and “others” using support vector machine and conditional random field models. After categorization, the authors used regular expression to extract the target words for patient population and comparison. The authors used 200 abstracts including terms such as “neoplasms” and “clinical trial, phase III” and obtained 91 % accuracy for the task of noun phrase classification. For sentence classification, the authors obtained a precision of 80 % for patient population and 82 % for comparisons.
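The base noun-phrase chunking that forms the first step of such a pipeline can be sketched with NLTK's regexp chunker (the subsequent SVM/CRF categorisation and regex extraction stages are not reproduced here):

```python
# Sketch: base noun-phrase chunking as a first step towards extracting
# "patient population" and "treatment" phrases. Requires one-time downloads:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk

grammar = "NP: {<DT>?<JJ>*<NN.*>+}"  # optional determiner, adjectives, nouns
parser = nltk.RegexpParser(grammar)

sentence = "Eligible patients received a low dose of the study drug."
tree = parser.parse(nltk.pos_tag(nltk.word_tokenize(sentence)))

for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    print(" ".join(word for word, tag in subtree.leaves()))
```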
Zhao et al. [ 24 ] used two classification tasks to extract study data including patient details, including one at the sentence level and another at the keyword level. The authors first used a five-class scheme including 1) patient, 2) result, 3) intervention, 4) study design, and 5) research goal and tried to classify sentences into one of these five classes. They further used six classes for keywords such as sex (e.g., male, female), age (e.g., 54-year-old), race (e.g., Chinese), condition (e.g., asthma), intervention, and study design (e.g., randomized trial). They utilized conditional random fields for the classification task. Using 19,893 medical abstracts and full-text articles from 17 journal websites, they achieved F-scores of 75 % for identifying patients, 61 % for intervention, 91 % for results, 79 % for study design, and 76 % for research goal.
Hsu et al. [ 25 ] attempted to classify whether a sentence contains the “hypothesis”, “statistical method”, “outcomes”, or “generalizability” of the study and then extracted the values. Using 42 full-text papers, the authors obtained F-scores of 86 % for identifying hypothesis, 84 % for statistical method, 90 % for outcomes, and 59 % for generalizability.
Song et al. [ 26 ] used machine learning-based classifiers such as maximum entropy classifier (MaxEnt), support vector machines (SVM), multi-layer perceptron (MLP), naive Bayes (NB), and radial basis function network (RBFN) to classify the sentences into categories such as analysis (statistical facts found by clinical experiment), general (generally accepted scientific facts, process, and methodology), recommendation (recommendations about interventions), and rule (guidelines). They utilized the principle of information gain (IG) as well as genetic algorithm (GA) for feature selection. They used 346 sentences from the clinical guideline document and obtained an F-score of 98 % for classifying sentences.
Marshall et al. [ 27 ] used soft-margin support vector machines in a joint model for risk of bias assessment along with supporting sentences for random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, among others. They utilized presence of unigrams in the supporting sentences as features in their model. Working with full text of 2200 clinical trials, the joint model achieved F-scores of 56, 48, 35, and 38 % for identifying sentences corresponding to random sequence generation, allocation concealment, blinding of participants and personnel, and blinding of outcome assessment, respectively.
Demner-Fushman and Lin [ 28 ] used a rule-based approach to identify sentences containing PICO elements. Using 275 manually annotated abstracts, the authors achieved an accuracy of 80 % for population extraction and 86 % for problem extraction. They also utilized a supervised classifier for outcome extraction and achieved accuracies ranging from 64 to 95 % across various experiments.
Kelly and Yang [ 29 ] used regular expressions and a gazetteer to extract the number of participants, participant age, gender, ethnicity, and study characteristics. The authors utilized 386 abstracts from PubMed obtained with the query “soy and cancer” and achieved F-scores of 96 % for identifying the number of participants, 100 % for age of participants, 100 % for gender of participants, 95 % for ethnicity of participants, 91 % for duration of study, and 87 % for health status of participants.
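A minimal sketch of the regex-and-gazetteer idea is shown below; the patterns and gazetteer are illustrative assumptions, not the rules published by Kelly and Yang.

```python
# Regular-expression and gazetteer extraction of trial demographics
# (illustrative patterns only, not the published rules).
import re

GENDER_GAZETTEER = {"men", "women", "male", "female", "males", "females"}

N_PARTICIPANTS = re.compile(
    r"\b(\d[\d,]*)\s+(?:patients|participants|subjects|women|men)\b", re.I)
AGE_RANGE = re.compile(r"\baged\s+(\d+)(?:\s*(?:to|-|–)\s*(\d+))?\b", re.I)

abstract = "A total of 1,247 women aged 45 to 70 were enrolled."

n = N_PARTICIPANTS.search(abstract)
age = AGE_RANGE.search(abstract)
genders = sorted({w for w in re.findall(r"[a-z]+", abstract.lower())
                  if w in GENDER_GAZETTEER})

print(n.group(1), age.groups(), genders)  # -> 1,247 ('45', '70') ['women']
```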
Hansen et al. [ 30 ] used support vector machines [ 31 ] to extract the number of trial participants from abstracts of randomized controlled trials. The authors utilized features such as the part-of-speech tags of the previous and next words and whether the sentence was grammatically complete (contained a verb). Using 233 abstracts from PubMed, they achieved an F-score of 86 % for identifying participants.
Xu et al. [ 32 ] utilized text classification augmented with hidden Markov models [ 33 ] to identify sentences about subject demographics. These sentences were then parsed to extract information regarding participant descriptors (e.g., men, healthy, elderly), number of trial participants, disease/symptom names, and disease/symptom descriptors. Testing on 250 RCT abstracts, the authors obtained accuracies of 83 % for participant descriptors, 93 % for number of trial participants, 51 % for diseases/symptoms, and 92 % for descriptors of diseases/symptoms.
Summerscales et al. [ 34 ] used a conditional random field-based approach to identify various named entities such as treatments (drug names or complex phrases) and outcomes. The authors extracted 100 abstracts of randomized trials from the BMJ and achieved F-scores of 49 % for identifying treatment, 82 % for groups, and 54 % for outcomes.
Summerscales et al. [ 35 ] also proposed a method for automatic summarization of results from the clinical trials. The authors first identified the sentences that contained at least one integer (group size, outcome numbers, etc.). They then used the conditional random field classifier to find the entity mentions corresponding to treatment groups or outcomes. The treatment groups, outcomes, etc. were then treated as various “events.” To identify all the relevant information for these events, the authors utilized templates with slots. The slots were then filled using a maximum entropy classifier. They utilized 263 abstracts from the BMJ and achieved F-scores of 76 % for identifying groups, 42 % for outcomes, 80 % for group sizes, and 71 % for outcome numbers.
Kiritchenko et al. [ 36 ] developed ExaCT, a tool that assists users with locating and extracting key trial characteristics such as eligibility criteria, sample size, drug dosage, and primary outcomes from full-text journal articles. A text classifier first recovers the sentences most likely to describe each characteristic; extraction rules are then applied to those sentences to find the correct values. The authors evaluated their system on 50 full-text articles describing randomized trials with 1050 test instances; the sentence classifier achieved a precision of 88 % over its top five candidates (P5), and the precision and recall of their extraction rules were 93 and 91 %, respectively.
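The locate-then-extract design can be sketched in a few lines; the classifier, the single extraction rule, and the toy sentences below are stand-ins for ExaCT's actual components.

```python
# Two-stage locate-then-extract pipeline (a sketch of the general design,
# not Kiritchenko et al.'s ExaCT implementation).
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stage 1: classify sentences as likely to state the sample size (toy data).
train = ["A total of 120 patients were enrolled.",
         "Blood pressure was measured at baseline."]
stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train, [1, 0])

# Stage 2: an extraction rule applied only to flagged sentences.
SAMPLE_SIZE = re.compile(r"\b(\d+)\s+(?:patients|participants|subjects)\b", re.I)

article = ["The trial was double blind.",
           "A total of 312 participants were randomized."]
for sentence in article:
    if stage1.predict([sentence])[0] == 1:
        match = SAMPLE_SIZE.search(sentence)
        if match:
            print("sample size:", match.group(1))
```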
Restificar et al. [ 37 ] utilized latent Dirichlet allocation [ 38 ] to infer the latent topics in the sample documents and then used logistic regression to compute the probability that a given candidate criterion belongs to a particular topic. Using 44,203 full-text reports of randomized trials, the authors achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
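A minimal sketch of topic inference followed by a downstream classifier is shown below; the toy criteria, the inclusion/exclusion labels, and the two-topic model are assumptions made for the example.

```python
# Latent Dirichlet allocation followed by logistic regression (a sketch of
# the general approach, not Restificar and Ananiadou's system).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

criteria = [
    "age greater than 18 years",
    "pregnant or breastfeeding women",
    "history of myocardial infarction",
    "unable to give informed consent",
]
labels = [1, 0, 1, 0]  # hypothetical: 1 = inclusion criterion, 0 = exclusion

counts = CountVectorizer().fit_transform(criteria)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # per-document topic proportions

clf = LogisticRegression().fit(doc_topic, labels)
print(clf.predict(doc_topic))
```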
Lin et al. [ 39 ] used a linear-chain conditional random field to extract various metadata elements such as the number of patients, age group of the patients, geographical area, intervention, and time duration of the study. Using 93 full-text articles, the authors achieved a three-fold cross-validation precision of 43 % for identifying number of patients, 63 % for age group, 44 % for geographical area, 40 % for intervention, and 83 % for time period.
De Bruijn et al. [ 40 ] used a support vector machine classifier to first identify sentences describing information elements such as eligibility criteria and sample size. The authors then used manually crafted weak extraction rules to extract the various information elements. Testing this two-stage architecture on 88 randomized trial reports, they obtained a precision of 69 % for identifying eligibility criteria, 62 % for sample size, 94 % for treatment duration, 67 % for intervention, 100 % for primary outcome estimates, and 67 % for secondary outcomes.
Zhu et al. [ 41 ] also used manually crafted rules to extract various subject demographics such as disease, age, gender, and ethnicity. The authors tested their method on 50 articles and, for disease extraction, obtained F-scores of 64 and 85 % for exactly matched and partially matched cases, respectively.
In general, many studies have a high risk of selection bias because the gold standards used in the respective studies were not randomly selected. The risk of performance bias is also likely to be high because the investigators were not blinded. For the systems that used rule-based approaches, it was unclear whether the gold standard was used to train the rules or whether a separate training set was used. The risk of attrition bias is unclear given the design of these non-randomized studies evaluating the performance of NLP methods. Lastly, the risk of reporting bias is unclear because of the lack of protocols in the development, implementation, and evaluation of NLP methods.
Extracting the data elements.
Participants — Sixteen studies explored the extraction of the number of participants [ 12 , 13 , 16 – 20 , 23 , 24 , 28 – 30 , 32 , 39 ], their age [ 24 , 29 , 39 , 41 ], sex [ 24 , 39 ], ethnicity [ 41 ], country [ 24 , 39 ], comorbidities [ 21 ], spectrum of presenting symptoms, current treatments, and recruiting centers [ 21 , 24 , 28 , 29 , 32 , 41 ], and date of study [ 39 ]. Among them, only six studies [ 28 – 30 , 32 , 39 , 41 ] extracted the actual data elements, as opposed to highlighting the sentence containing the data element. Unfortunately, each of these studies used a different corpus of reports, which makes direct comparisons impossible. For example, Kelly and Yang [ 29 ] achieved high F-scores of 100 % for age of participants, 91 % for duration of study, 95 % for ethnicity of participants, 100 % for gender of participants, 87 % for health status of participants, and 96 % for number of participants on a dataset of 386 abstracts.
Intervention — Thirteen studies explored the extraction of interventions [ 12 , 13 , 16 – 20 , 22 , 24 , 28 , 34 , 39 , 40 ], intervention groups [ 34 , 35 ], and intervention details (for replication if feasible) [ 36 ]. Of these, only six studies [ 28 , 34 – 36 , 39 , 40 ] extracted intervention elements. Unfortunately again, each of these studies used a different corpus. For example, Kiritchenko et al. [ 36 ] achieved an F-score of 75–86 % for intervention data elements on a dataset of 50 full-text journal articles.
Outcomes and comparisons — Fourteen studies also explored the extraction of outcomes and time points of collection and reporting [ 12 , 13 , 16 – 20 , 24 , 25 , 28 , 34 – 36 , 40 ] and the extraction of comparisons [ 12 , 16 , 22 , 23 ]. Of these, only six studies [ 28 , 34 – 36 , 40 ] extracted the actual data elements. For example, De Bruijn et al. [ 40 ] obtained an F-score of 100 % for extracting primary outcomes and 67 % for secondary outcomes from 88 full-text articles. Summerscales et al. [ 35 ] utilized 263 abstracts from the BMJ and achieved an F-score of 42 % for extracting outcomes.
Results — Two studies [ 36 , 40 ] extracted the sample size data element from full text on two different datasets. De Bruijn et al. [ 40 ] obtained an accuracy of 67 %, and Kiritchenko et al. [ 36 ] achieved an F-score of 88 %.
Interpretation — Three studies explored extraction of overall evidence [ 26 , 42 ] and external validity of trial findings [ 25 ]. However, all these studies only highlighted sentences containing the data elements relevant to interpretation.
Objectives — Two studies [ 24 , 25 ] explored the extraction of research questions and hypotheses. However, both of these studies only highlighted sentences containing the data elements relevant to the objectives.
Methods — Twelve studies explored the extraction of the study design [ 13 , 18 , 20 , 24 ], study duration [ 12 , 29 , 40 ], randomization method [ 25 ], participant flow [ 36 , 37 , 40 ], and risk of bias assessment [ 27 ]. Of these, only four studies [ 29 , 36 , 37 , 40 ] extracted the corresponding data elements from text using different sets of corpora. For example, Restificar et al. [ 37 ] utilized 44,203 full-text clinical trial articles and achieved accuracies of 75 and 70 % for inclusion and exclusion criteria, respectively.
Miscellaneous — One study [ 26 ] explored extraction of the key conclusion sentence and achieved a high F-score of 98 %.
Previous reviews on the automation of systematic review processes describe technologies for automating the overall process or other steps; here, we focus on data extraction. None of the existing reviews [ 43 – 47 ] focuses on the data extraction step. For example, Tsafnat et al. [ 43 ] surveyed informatics systems that automate some of the tasks of systematic review and report systems for each stage of the review; although data extraction is described as a task in their review, they highlighted only three studies as an acknowledgement of ongoing work. In comparison, we identified 26 studies and critically examined their contribution in relation to all the data elements that need to be extracted to fully support the data extraction step.
Thomas et al. [ 44 ] described the application of text mining technologies such as automatic term recognition, document clustering, classification, and summarization to support the identification of relevant studies in systematic reviews. The authors also pointed out the potential of these technologies to assist at various stages of the systematic review. Slaughter et al. [ 45 ] discussed necessary next steps towards developing “living systematic reviews” rather than a static publication, where the systematic reviews can be continuously updated with the latest knowledge available. The authors mentioned the need for development of new tools for reporting on and searching for structured data from clinical trials.
Tsafnat et al. [ 46 ] described four main tasks in systematic review: identifying the relevant studies, evaluating the risk of bias in selected trials, synthesizing the evidence, and publishing the systematic reviews by generating human-readable text from trial reports. They mentioned text extraction algorithms for risk of bias assessment and evidence synthesis but remained limited to one particular method for extraction of PICO elements.
Most natural language processing research has focused on reducing the workload for the screening step of systematic reviews (Step 3). Wallace et al. [ 48 , 49 ] and Miwa et al. [ 50 ] proposed an active learning framework to reduce the workload in citation screening for inclusion in systematic reviews. Jonnalagadda et al. [ 51 ] designed a distributional semantics-based relevance feedback model to semi-automatically screen citations. Cohen et al. [ 52 ] proposed a module for grouping closely related studies and an automated system to rank publications according to the likelihood of meeting the inclusion criteria of a systematic review. Choong et al. [ 53 ] proposed an automated citation snowballing method that recursively pursues relevant literature to help in evidence retrieval for systematic reviews. Cohen et al. [ 54 ] constructed a voting perceptron-based automated citation classification system to classify each article as to whether it contains high-quality, drug-specific evidence. Adeva et al. [ 55 ] proposed a classification system for screening articles for systematic review, and Shemilt et al. [ 56 ] discussed the use of text mining to reduce screening workload in systematic reviews.
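To illustrate the active learning idea behind several of these screening systems, the sketch below runs pool-based uncertainty sampling with scikit-learn; the citations, the labels, and the "oracle" (a stand-in for the human reviewer) are all invented for the example.

```python
# Pool-based active learning with uncertainty sampling for citation
# screening (a generic sketch, not any of the cited systems).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

citations = [
    "randomized trial of statins for cholesterol",
    "qualitative study of nurse experiences",
    "placebo controlled trial of aspirin",
    "editorial on hospital funding",
    "double blind randomized trial of beta blockers",
    "case report of a rare rash",
]
oracle = [1, 0, 1, 0, 1, 0]  # hidden include/exclude labels (the reviewer)

X = TfidfVectorizer().fit_transform(citations)
labeled = [0, 1]                                   # seed set screened by hand
pool = [i for i in range(len(citations)) if i not in labeled]

for _ in range(3):  # each round asks the reviewer to label one citation
    model = LogisticRegression().fit(X[labeled], [oracle[i] for i in labeled])
    probs = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(probs - 0.5)))]  # most uncertain item
    labeled.append(query)                              # reviewer labels it
    pool.remove(query)

print("screening order:", labeled)
```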
No standard gold standard or dataset.
Among the 26 studies included in this systematic review, only three used a common corpus, namely 1000 medical abstracts from the PIBOSO corpus. Unfortunately, even that corpus supports only the classification of sentences according to whether they contain one of the data elements corresponding to the PIBOSO categories. No two other studies shared the same gold standard or dataset for evaluation. This limitation made it impossible for us to compare and assess the relative significance of the reported accuracy measures.
A few data elements that are relatively straightforward to extract automatically, such as the total number of participants (explored by 14 studies overall, 5 of which extracted the actual data element), have attracted a comparatively large number of studies; this is not the case for the other data elements. Twenty-seven of the 52 potential data elements have not been explored for automated extraction at all, even for highlighting the sentences containing them, and seven more were explored by just one study. Thirty-eight of the 52 potential data elements (>70 %) have not been explored for automated extraction of the actual data elements, and three more were explored by just one study. The highest number of data elements extracted by a single study is only seven (14 %). These findings mean not only that more studies are needed to explore the remaining 70 % of data elements, but also that there is an urgent need for a unified framework or system to extract all necessary data elements. The current state of informatics research for data extraction is exploratory, and multiple studies need to be conducted using the same gold standard and on the extraction of the same data elements for effective comparison.
Our study has limitations. First, there is a possibility that data extraction algorithms were not published in journals or that our search might have missed them. We sought to minimize this limitation by searching in multiple bibliographic databases, including PubMed, IEEExplore, and ACM Digital Library. However, investigators may have also failed to publish algorithms that had lower F-scores than were previously reported, which we would not have captured. Second, we did not publish a protocol a priori, and our initial findings may have influenced our methods. However, we performed key steps, including screening, full-text review, and data extraction in duplicate to minimize potential bias in our systematic review.
“On demand” access to summarized evidence and best practices has been considered a sound strategy to satisfy clinicians’ information needs and enhance decision-making [ 57 – 65 ]. A systematic review of 26 studies concluded that information-retrieval technology produces a positive impact on physicians in terms of decision enhancement, learning, recall, reassurance, and confirmation [ 62 ]. As noted above, Slaughter et al. [ 45 ] called for new tools for reporting on and searching for structured data from the published literature so that systematic reviews can be continuously updated rather than remain static publications. Automated information extraction frameworks that extract data elements have the potential to assist systematic reviewers and, eventually, to automate the screening and data extraction steps.
Medical knowledge is currently being created at a rapid pace, with 75 clinical trials published a day [ 66 ]. Evidence-based medicine [ 67 ] requires clinicians to keep up with published scientific studies and use them at the point of care. However, it has been shown that this is practically impossible even within a narrow specialty [ 68 ]. A critical barrier is that finding relevant information, which may be located in several documents, takes an amount of time and cognitive effort that is incompatible with the busy clinical workflow [ 69 , 70 ]. Rapid systematic reviews using automation technologies would provide clinicians with up-to-date, systematic summaries of the latest evidence.
Our systematic review describes previously reported methods for identifying sentences containing some of the data elements for systematic reviews; only a few studies have reported methods for extracting the actual data elements. Most of the data elements that would need to be considered for systematic reviews have been insufficiently explored to date, which identifies a major scope for future work. We hope that these automated extraction approaches might first act as checks for the manual data extraction currently performed in duplicate; then serve to validate manual data extraction done by a single reviewer; then become the primary source for data element extraction, validated by a human; and eventually completely automate data extraction to enable living systematic reviews.
Abbreviations
NLP: natural language processing
CONSORT: CONsolidated Standards Of Reporting Trials
STARD: Standards for Reporting of Diagnostic Accuracy
PICO: Population, Intervention, Comparison, Outcomes
PECODR: Patient-Population-Problem, Exposure-Intervention, Comparison, Outcome, Duration and Results
PIBOSO: Population, Intervention, Background, Outcome, Study Design, Other
CRF: conditional random fields
NB: naive Bayes
RCT: randomized controlled trial
BMJ: British Medical Journal
Higgins J, Green S. Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]. The Cochrane Collaboration. 2011. Available at [ http://community.cochrane.org/handbook ]
Khan KS, Ter Riet G, Glanville J, Sowden AJ, Kleijnen J. Undertaking systematic reviews of research on effectiveness: CRD’s guidance for carrying out or commissioning reviews, NHS Centre for Reviews and Dissemination. 2001.
Woolf SH. Manual for conducting systematic reviews, Agency for Health Care Policy and Research. 1996.
Field MJ, Lohr KN. Clinical practice guidelines: directions for a new program, Clinical Practice Guidelines. 1990.
Elliott J, Turner T, Clavisi O, Thomas J, Higgins J, Mavergames C, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 2014;11:e1001603.
Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.
Hearst MA. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics; 1999. p. 3–10.
Morton S, Levit L, Berg A, Eden J. Finding what works in health care: standards for systematic reviews. Washington D.C.: National Academies Press; 2011. Available at [ http://www.nap.edu/catalog/13059/finding-what-works-in-health-care-standards-for-systematic-reviews ]
Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA. 1996;276(8):637–9.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem Lab Med. 2003;41(1):68–73. doi: 10.1515/CCLM.2003.012 .
Richardson WS, Wilson MC, Nishikawa J, Hayward RS. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12–3.
Dawes M, Pluye P, Shea L, Grad R, Greenberg A, Nie J-Y. The identification of clinically important elements within medical journal abstracts: Patient–Population–Problem, Exposure–Intervention, Comparison, Outcome, Duration and Results (PECODR). Inform Prim Care. 2007;15(1):9–16.
Kim S, Martinez D, Cavedon L, Yencken L. Automatic classification of sentences to support evidence based medicine. BMC Bioinform. 2011;12 Suppl 2:S5.
Whiting P, Rutjes AWS, Reitsma JB, Bossuyt PMM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3(1):25.
Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proceedings of the Eighteenth International Conference on Machine Learning. 2001. p. 282–9.
Boudin F, Nie JY, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inform Decis Mak. 2010;10:29. doi: 10.1186/1472-6947-10-29 .
Huang K-C, Liu C-H, Yang S-S, Liao C-C, Xiao F, Wong J-M, et al, editors. Classification of PICO elements by text features systematically extracted from PubMed abstracts. Granular Computing (GrC), 2011 IEEE International Conference on; 2011: IEEE.
Verbeke M, Van Asch V, Morante R, Frasconi P, Daelemans W, De Raedt L, editors. A statistical relational learning approach to identifying evidence based medicine categories. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012: Association for Computational Linguistics.
Huang K-C, Chiang IJ, Xiao F, Liao C-C, Liu CC-H, Wong J-M. PICO element detection in medical text without metadata: are first sentences enough? J Biomed Inform. 2013;46(5):940–6.
Hassanzadeh H, Groza T, Hunter J. Identifying scientific artefacts in biomedical literature: the evidence based medicine use case. J Biomed Inform. 2014;49:159–70.
Robinson DA. Finding patient-oriented evidence in PubMed abstracts. Athens: University of Georgia; 2012.
Chung GY-C. Towards identifying intervention arms in randomized controlled trials: extracting coordinating constructions. J Biomed Inform. 2009;42(5):790–800.
Hara K, Matsumoto Y. Extracting clinical trial design information from MEDLINE abstracts. N Gener Comput. 2007;25(3):263–75.
Zhao J, Bysani P, Kan MY. Exploiting classification correlations for the extraction of evidence-based practice information. AMIA Annu Symp Proc. 2012;2012:1070–8.
Hsu W, Speier W, Taira R. Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature. AMIA Annu Symp Proc. 2012;2012:350–9.
Song MH, Lee YH, Kang UG. Comparison of machine learning algorithms for classification of the sentences in three clinical practice guidelines. Healthcare Informatics Res. 2013;19(1):16–24.
Marshall IJ, Kuiper J, Wallace BC, editors. Automating risk of bias assessment for clinical trials. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics; 2014: ACM.
Demner-Fushman D, Lin J. Answering clinical questions with knowledge-based and statistical techniques. Comput Linguist. 2007;33(1):63–103.
Kelly C, Yang H. A system for extracting study design parameters from nutritional genomics abstracts. J Integr Bioinform. 2013;10(2):222. doi: 10.2390/biecoll-jib-2013-222 .
Hansen MJ, Rasmussen NO, Chung G. A method of extracting the number of trial participants from abstracts describing randomized controlled trials. J Telemed Telecare. 2008;14(7):354–8. doi: 10.1258/jtt.2008.007007 .
Joachims T. Text categorization with support vector machines: learning with many relevant features, Machine Learning: ECML-98, Tenth European Conference on Machine Learning. 1998. p. 137–42.
Xu R, Garten Y, Supekar KS, Das AK, Altman RB, Garber AM. Extracting subject demographic information from abstracts of randomized clinical trial reports. 2007.
Eddy SR. Hidden Markov models. Curr Opin Struct Biol. 1996;6(3):361–5.
Summerscales RL, Argamon S, Hupert J, Schwartz A. Identifying treatments, groups, and outcomes in medical abstracts. The Sixth Midwest Computational Linguistics Colloquium (MCLC 2009). 2009.
Summerscales R, Argamon S, Bai S, Hupert J, Schwartz A. Automatic summarization of results from clinical trials. The 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2011. p. 372–7.
Kiritchenko S, de Bruijn B, Carini S, Martin J, Sim I. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Med Inform Decis Mak. 2010;10:56.
Restificar A, Ananiadou S. Inferring appropriate eligibility criteria in clinical trial protocols without labeled data, Proceedings of the ACM sixth international workshop on Data and text mining in biomedical informatics. 2012. ACM.
Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3(4–5):993–1022.
Lin S, Ng J-P, Pradhan S, Shah J, Pietrobon R, Kan M-Y, editors. Extracting formulaic and free text clinical research articles metadata using conditional random fields. Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents; 2010: Association for Computational Linguistics.
De Bruijn B, Carini S, Kiritchenko S, Martin J, Sim I, editors. Automated information extraction of key trial design elements from clinical trial publications. AMIA Annual Symposium Proceedings; 2008: American Medical Informatics Association.
Zhu H, Ni Y, Cai P, Qiu Z, Cao F. Automatic extracting of patient-related attributes: disease, age, gender and race. Stud Health Technol Inform. 2011;180:589–93.
Davis-Desmond P, Mollá D, editors. Detection of evidence in clinical research papers. Proceedings of the Fifth Australasian Workshop on Health Informatics and Knowledge Management-Volume 129; 2012: Australian Computer Society, Inc.
Tsafnat G, Glasziou P, Choong M, Dunn A, Galgani F, Coiera E. Systematic review automation technologies. Syst Rev. 2014;3(1):74.
Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Res Synthesis Methods. 2011;2(1):1–14.
Slaughter L, Berntsen CF, Brandt L, Mavergames C. Enabling living systematic reviews and clinical guidelines through semantic technologies. D-Lib Magazine. 2015;21(1/2). Available at [ http://www.dlib.org/dlib/january15/slaughter/01slaughter.html ]
Tsafnat G, Dunn A, Glasziou P, Coiera E. The automation of systematic reviews. BMJ. 2013;346:f139.
O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1):5.
Wallace BC, Trikalinos TA, Lau J, Brodley C, Schmid CH. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinformatics. 2010;11(1):55.
Wallace BC, Small K, Brodley CE, Trikalinos TA, editors. Active learning for biomedical citation screening. Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining; 2010: ACM.
Miwa M, Thomas J, O’Mara-Eves A, Ananiadou S. Reducing systematic review workload through certainty-based screening. J Biomed Inform. 2014;51:242–53.
Jonnalagadda S, Petitti D. A new iterative method to reduce workload in systematic review process. Int J Comput Biol Drug Des. 2013;6(1–2):5–17. doi: 10.1504/IJCBDD.2013.052198 .
Cohen A, Adams C, Davis J, Yu C, Yu P, Meng W, et al. Evidence-based medicine, the essential role of systematic reviews, and the need for automated text mining tools. Proceedings of the 1st ACM International Health Informatics Symposium. 2010:376–80.
Choong MK, Galgani F, Dunn AG, Tsafnat G. Automatic evidence retrieval for systematic reviews. J Med Internet Res. 2014;16(10):e223.
Cohen AM, Hersh WR, Peterson K, Yen P-Y. Reducing workload in systematic review preparation using automated citation classification. J Am Med Inform Assoc. 2006;13(2):206–19.
García Adeva JJ, Pikatza Atxa JM, Ubeda Carrillo M, Ansuategi ZE. Automatic text classification to support systematic reviews in medicine. Expert Syst Appl. 2014;41(4):1498–508.
Shemilt I, Simon A, Hollands GJ, Marteau TM, Ogilvie D, O’Mara‐Eves A, et al. Pinpointing needles in giant haystacks: use of text mining to reduce impractical screening workload in extremely large scoping reviews. Res Synthesis Methods. 2014;5(1):31–49.
Cullen RJ. In search of evidence: family practitioners’ use of the Internet for clinical information. J Med Libr Assoc. 2002;90(4):370–9.
Hersh WR, Hickam DH. How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review. JAMA. 1998;280(15):1347–52.
Lucas BP, Evans AT, Reilly BM, Khodakov YV, Perumal K, Rohr LG, et al. The impact of evidence on physicians’ inpatient treatment decisions. J Gen Intern Med. 2004;19(5 Pt 1):402–9. doi: 10.1111/j.1525-1497.2004.30306.x .
Magrabi F, Coiera EW, Westbrook JI, Gosling AS, Vickland V. General practitioners’ use of online evidence during consultations. Int J Med Inform. 2005;74(1):1–12. doi: 10.1016/j.ijmedinf.2004.10.003 .
McColl A, Smith H, White P, Field J. General practitioner’s perceptions of the route to evidence based medicine: a questionnaire survey. BMJ. 1998;316(7128):361–5.
Pluye P, Grad RM, Dunikowski LG, Stephenson R. Impact of clinical information-retrieval technology on physicians: a literature review of quantitative, qualitative and mixed methods studies. Int J Med Inform. 2005;74(9):745–68. doi: 10.1016/j.ijmedinf.2005.05.004 .
Rothschild JM, Lee TH, Bae T, Bates DW. Clinician use of a palmtop drug reference guide. J Am Med Inform Assoc. 2002;9(3):223–9.
Rousseau N, McColl E, Newton J, Grimshaw J, Eccles M. Practice based, longitudinal, qualitative interview study of computerised evidence based guidelines in primary care. BMJ. 2003;326(7384):314.
Westbrook JI, Coiera EW, Gosling AS. Do online information retrieval systems help experienced clinicians answer clinical questions? J Am Med Inform Assoc. 2005;12(3):315–21. doi: 10.1197/jamia.M1717 .
Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. doi: 10.1371/journal.pmed.1000326 .
Lau J. Evidence-based medicine and meta-analysis: getting more out of the literature. In: Greenes RA, editor. Clinical decision support: the road ahead. 2007. p. 249.
Fraser AG, Dunstan FD. On the impossibility of being expert. BMJ (Clinical Res). 2010;341:c6815.
Ely JW, Osheroff JA, Chambliss ML, Ebell MH, Rosenbaum ME. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc. 2005;12(2):217–24. doi: 10.1197/jamia.M1608 .
Ely JW, Osheroff JA, Maviglia SM, Rosenbaum ME. Patient-care questions that physicians are unable to answer. J Am Med Inform Assoc. 2007;14(4):407–14. doi: 10.1197/jamia.M2398 .
Authors and affiliations.
Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL, 60611, USA
Siddhartha R. Jonnalagadda
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India
Pawan Goyal
Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA
Mark D. Huffman
Correspondence to Siddhartha R. Jonnalagadda.
Competing interests.
The authors declare that they have no competing interests.
SRJ and PG had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design were done by SRJ. SRJ, PG, and MDH did the acquisition, analysis, or interpretation of data. SRJ and PG drafted the manuscript. SRJ, PG, and MDH did the critical revision of the manuscript for important intellectual content. SRJ obtained funding. PG and SRJ provided administrative, technical, or material support. SRJ did the study supervision. All authors read and approved the final manuscript.
This project was partly supported by the National Library of Medicine (grant 5R00LM011389). The Cochrane Heart Group US Satellite at Northwestern University is supported by an intramural grant from the Northwestern University Feinberg School of Medicine.
The funding source had no role in the design and conduct of the study; collection, management, analysis, or interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine.
Mark Berendsen (Research Librarian, Galter Health Sciences Library, Northwestern University Feinberg School of Medicine) provided insights on the design of this study, including the search strategies, and Dr. Kalpana Raja (Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine) reviewed the manuscript. None of them received compensation for their contributions.
Below, we provide the search strategies used in PubMed, ACM Digital Library, and IEEExplore. The search was conducted on January 6, 2015.
PubMed: (“identification” [Title] OR “extraction” [Title] OR “extracting” [Title] OR “detection” [Title] OR “identifying” [Title] OR “summarization” [Title] OR “learning approach” [Title] OR “automatically” [Title] OR “identify sections” [Title] OR “learning algorithms” [Title] OR “Interpreting” [Title] OR “Inferring” [Title] OR “Finding” [Title] OR “classification” [Title]) AND (“medical evidence” [Title] OR “PICO” [Title] OR “PECODR” [Title] OR “intervention arms” [Title] OR “experimental methods” [Title] OR “study design parameters” [Title] OR “Patient oriented Evidence” [Title] OR “eligibility criteria” [Title] OR “clinical trial characteristics” [Title] OR “evidence based medicine” [Title] OR “clinically important elements” [Title] OR “evidence based practice” [Title] OR “results from clinical trials” [Title] OR “statistical analyses” [Title] OR “research results” [Title] OR “clinical evidence” [Title] OR “Meta Analysis” [Title] OR “Clinical Research” [Title] OR “medical abstracts” [Title] OR “clinical trial literature” [Title] OR “clinical trial protocols” [Title] OR “clinical practice guidelines” [Title]).
ACM Digital Library: We performed this search only in the metadata.
(“identification” OR “extraction” OR “extracting” OR “detection” OR “Identifying” OR “summarization” OR “learning approach” OR “automatically” OR “identify sections” OR “learning algorithms” OR “Interpreting” OR “Inferring” OR “Finding” OR “classification”) AND (“medical evidence” OR “PICO” OR “intervention arms” OR “experimental methods” OR “eligibility criteria” OR “clinical trial characteristics” OR “evidence based medicine” OR “clinically important elements” OR “results from clinical trials” OR “statistical analyses” OR “clinical evidence” OR “Meta Analysis” OR “clinical research” OR “medical abstracts” OR “clinical trial literature” OR “clinical trial protocols”).
IEEExplore: ((Title: “identification” or Title: “extraction” or Title: “extracting” or Title: “detection” or Title: “Identifying” or Title: “summarization” or Title: “learning approach” or Title: “automatically” or Title: “identify sections” or Title: “learning algorithms” or Title: “scientific artefacts” or Title: “Interpreting” or Title: “Inferring” or Title: “Finding” or Title: “classification” or “statistical techniques”) and (Title: “medical evidence” or Abstract: “medical evidence” or Title: “PICO” or Abstract: “PICO” or Title: “intervention arms” or Title: “experimental methods” or Title: “study design parameters” or Title: “Patient oriented Evidence” or Abstract: “Patient oriented Evidence” or Title: “eligibility criteria” or Abstract: “eligibility criteria” or Title: “clinical trial characteristics” or Abstract: “clinical trial characteristics” or Title: “evidence based medicine” or Abstract: “evidence based medicine” or Title: “clinically important elements” or Title: “evidence based practice” or Title: “treatments” or Title: “groups” or Title: “outcomes” or Title: “results from clinical trials” or Title: “statistical analyses” or Abstract: “statistical analyses” or Title: “research results” or Title: “clinical evidence” or Abstract: “clinical evidence” or Title: “Meta Analysis” or Abstract: “Meta Analysis” or Title: “Clinical Research” or Title: “medical abstracts” or Title: “clinical trial literature” or Title: “Clinical Practice” or Title: “clinical trial protocols” or Abstract: “clinical trial protocols” or Title: “clinical questions” or Title: “clinical trial design”)).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.
Cite this article.
Jonnalagadda, S.R., Goyal, P. & Huffman, M.D. Automating data extraction in systematic reviews: a systematic review. Syst Rev 4 , 78 (2015). https://doi.org/10.1186/s13643-015-0066-7
Received : 20 March 2015
Accepted : 21 May 2015
Published : 15 June 2015
DOI : https://doi.org/10.1186/s13643-015-0066-7
The impact of nonpharmacological interventions on opioid use for chronic noncancer pain: a scoping review.
3.1. Characteristics of included studies
First Author, Year | Design | N | NPI Type | Duration | Pain and Opioid Use Measures | Pain Intensity and Opioid Use Results |
---|---|---|---|---|---|---|
Garcia, 2021 | RCT | 179 | Device | 56 days | Pain: Defense and Veterans Pain Rating Scale Opioid use: Self-reported and converted to morphine milligram equivalent (MME) | Pain: Pain intensity reduced by an average of 42.8% for the virtual reality (EaseVRx) group and 25% for the sham virtual reality group. Opioid use: Did not reach statistical significance for either group. |
Jensen, 2020 | RCT | 173 | Hypnosis | 4 sessions | Pain: Numeric Rating Scale Opioid use: Self-reported and converted to MME | Pain: No statistically significant between-group differences on omnibus test for pain intensity. On average, pain intensity reduced between pre- vs. post-treatment for all groups. Opioid use: No changes in opioid use were found. |
Zheng, 2019 | RCT | 108 | Acupuncture | 12 wks | Pain: Visual Analogue Scale Opioid use: Self-reported and converted to MME | Pain: No group differences were found in pain intensity. No changes in pain intensity were found over time. Opioid use: Opioid use reduced by 20.5% (p < 0.05) and 13.7% (p < 0.01) in the two acupuncture groups and by 4.5% in the education group post-treatment, but without any group differences. At follow-up, the education group had a 47% decrease in opioid use after receiving a course of electroacupuncture. |
Garland, 2022 | RCT | 250 | Mindfulness | 8 wks | Pain: Brief Pain Inventory Opioid use: Urine toxicologic screening, self-reported and converted to MME | Pain: MORE showed greater reductions in pain severity (between-group effect: 0.49; 95% CI, 0.17–0.81; p = 0.003) than the control group. Opioid use: MORE reduced opioid use more than the control group (between-group effect: 0.15 log mg; 95% CI, 0.03–0.27 log mg; p = 0.009). At 9-month follow-up, 22 of 62 participants (35.5%) in the MORE group had reduced opioid use by at least 50%, compared to 11 of 69 participants (15.9%) in the control group (p = 0.009). At 9 months, 36 of 80 participants (45.0%) in the MORE group were no longer misusing opioids, compared with 19 of 78 participants (24.4%) in the control group. |
Hudak, 2021 | RCT | 62 * | Mindfulness | 8 wks | Pain: NA Opioid use: Self-reported and converted to MME | Pain: NA Opioid use: Participants in MORE showed greater reduction in opioid use over time than the control group. |
Wilson, 2023 | RCT | 402 | Educational Program | 8 wks | Pain: Brief Pain Inventory Opioid use: Opioid prescription information was collected from the participants' medical records and converted to MME | Pain: 24 (14.5%) of 166 E-Health participants achieved a >2 point decrease in pain intensity compared to 13 (6.8%) of 192 TAU participants (odds ratio, 2.4 [95% CI, 1.2–4.9]; p = 0.02). Opioid use: 105 (53.6%) of 196 E-Health participants achieved a >15% reduction in opioid use compared with 85 (42.3%) of 201 TAU participants (odds ratio, 1.6 [95% CI, 1.1–2.3]; p = 0.02). |
Garland, 2024 | RCT | 230 * | Mindfulness | 8 wks | Pain: Brief Pain Inventory Opioid use: Urine drug screens, opioid prescription information collected from the participants' medical records and converted to MME | Pain: MORE showed a significantly greater reduction in pain outcomes than the control group (p = 0.025). Opioid use: MORE reduced opioid dose significantly compared to the control group (B = 0.65, 95% CI = 0.07–1.23, p = 0.029), with a 20.7% reduction in mean opioid use (18.88 mg, SD = 8.40 mg) for MORE compared to a 3.9% reduction (3.19 mg, SD = 4.38 mg) for the control group. |
DeBar, 2022 | RCT | 850 | CBT | 12 wks | Pain: Pain Intensity and Interference with Enjoyment of Life, General Activity, and Sleep Opioid use: Self-reported and converted to MME per 90-day period | Pain: CBT had larger reductions in pain outcomes at 12-month follow-up compared to usual care (difference, −0.434 point [95% CI, −0.690 to −0.178 point]) and post-treatment (difference, −0.565 point [CI, −0.796 to −0.333 point]). Opioid use: No differences were seen in opioid use at post-treatment (difference, −2.260 points [CI, −5.509 to 0.989 points]) or at 12-month follow-up (difference, −1.969 points [CI, −6.765 to 2.827 points]). |
Gardiner, 2019 | RCT | 159 | Combined | 21 wks | Pain: Brief Pain Inventory Opioid use: Self-reported | Pain: No differences in pain outcomes at any time point. Opioid use: At 21 weeks, the IMGV group reported a greater reduction in pain medication use (odds ratio: 0.42, CI: 0.18–0.98) compared to controls. |
Wartko, 2023 | RCT | 153 | CBT | 18 sessions/ 1 year | Pain: Pain, Enjoyment of life, and General activity Opioid use: Self-reported and converted to MME | Pain: No significant differences between intervention and usual care for pain outcomes were found (0.0 [95% CI: −0.5, 0.5], p = 0.985). Opioid use: No significant differences between intervention and usual care for opioid use were found (adjusted mean difference: −2.3 MME; 95% CI: −10.6, 5.9; p = 0.578). |
Groessl, 2017 | RCT | 150 * | Yoga | 12 wks | Pain: Brief Pain Inventory Opioid use: Self-reported and verified using medical records | Pain: Differences observed at all three time points (p = 0.001 at 6 weeks, 0.005 at 12 weeks, 0.013 at 6 months), with larger reductions in pain intensity for yoga participants. Opioid use: Significant reduction from 20% to 11% at 12 weeks (p = 0.007) and to 8% after 6 months (p < 0.001). |
Roseen, 2022 | RCT | 120 * | Yoga | 12 wks | Pain: Defense and Veterans Pain Rating Scale Opioid use: Self-reported | Pain: No significant between-group differences were observed for pain. Opioid use: No significant between-group differences were observed for opioid use. Post-treatment, fewer yoga than education participants reported pain medication use (55% vs. 67%, OR = 0.56, 95% CI: 0.26–1.24, p = 0.15). |
Sandhu, 2023 | RCT | 608 | Educational Program | 3 days and 12 months maintenance | Pain: Patient-Reported Outcomes Measurement Information System Opioid use: Self-reported, with the participant report verified in a telephone call from a member of the study team, and converted to MME | Pain: No significant between-group differences in pain intensity. Opioid use: At 12 months, 65 of 225 participants (29%) achieved opioid cessation in the intervention group and 15 of 208 participants (7%) achieved opioid cessation in the usual care group (odds ratio, 5.55 [95% CI, 2.80 to 10.99]). |
Does, 2024 | RCT | 376 | Educational Program | 4 sessions | Pain: Patient-Reported Outcomes Measurement Information System Opioid use: Pharmacy dispensation data from the medical record, converted to MME for the 6-month period | Pain: No significant between-group differences in pain intensity. Opioid use: A small but not significant decrease in opioid use was found in both groups over the study period. At 12 months, the intervention group demonstrated greater medication use (OR = 2.72; 95% CI 1.61–4.58). |
Naylor, 2010 | RCT | 51 | Digital Technology | 4 months | Pain: Short form of the McGill Pain Questionnaire, the Pain Symptoms Subscale from the Treatment Outcomes in Pain Survey Opioid use: Self-reported | Pain: TIVR showed significant improvement in pain scores at 8-month follow-up (p < 0.0001) compared to the control group. Opioid use: Opioid use reduced in the TIVR group at both follow-ups, 4 and 8 months post-CBT. At 8-month follow-up, 21% of the TIVR participants had stopped using opioids, and there were significant between-group differences in opioid use (p = 0.004). |
Nielssen, 2019 | RCT | 50 | Educational Program | 8 wks | Pain: Roland–Morris Disability Questionnaire, Wisconsin Brief Pain Questionnaire Opioid use: Self-reported and converted to MME | Pain: Significantly larger reduction in pain outcomes with the intervention compared to the control group. Opioid use: Significant reduction in opioid use compared to control group. |
Day, 2019 | RCT | 69 | Combined | 8 wks | Pain: Numeric Rating Scale Opioid use: Self-reported opioid use in the past week | Pain: Post-treatment, the intent-to-treat group showed significant improvements in pain intensity (p < 0.001), with no significant between-group differences. Opioid use: For the intent-to-treat group, there was no significant difference (p = 0.549) in opioid use between pre-treatment (48%) and post-treatment (43%). Opioid use decreased significantly (p = 0.012) from pre-treatment (49%) to 3-month follow-up (28%), but opioid use at post-treatment (40%) and 6-month follow-up (33%) was not significantly reduced (p = 0.289) compared with pre-treatment. |
Spangeus, 2023 | RCT | 21 | Educational Program | 10 wks | Pain: Numeric Pain Scale Opioid use: Self-reported opioid use | Pain: Significant improvements in pain outcomes were found post-treatment. Opioid use: A significant reduction in opioid use, from 25% at baseline to 14% post-treatment, was found. |
Nelli, 2023 | OB | 45 | Device | 2 wks | Pain: Numeric Scale Opioid use: Self-reported and converted to MME | Pain: The reduction in pain scores was 67%, 50%, and 45% for the green, blue, and clear glasses groups (p = 0.56); no significant differences in pain score reduction between groups were found. Opioid use: A greater than 10% reduction in opioid use was achieved in 33%, 11%, and 8% of the green, blue, and clear eyeglasses groups, respectively (p = 0.23). |
Moffat, 2023 | OB | 13,968 * | Combined | 22 months | Pain: NA Opioid use: Identified using the Australian Pharmaceutical Benefits Scheme item number and converted to MME | Pain: NA. Opioid use: Calculated change in predicted trends with and without the intervention: 25,387 (95% CI 24,676–26,131). |
Zeliadt, 2022 | OB | 4869 * | Combined | 18 months | Pain: NA Opioid use: Extracted from the VA's pharmacy managerial cost accounting national data extract and converted to MME | Pain: NA. Opioid use: Opioid use decreased by 12% in one year among veterans who began CIH compared to similar veterans who used conventional care; by 4.4% among veterans who used only Whole Health services compared to conventional care; and by 8.5% among veterans who used CIH combined with Whole Health services compared to conventional care. |
Huffman, 2019 | OB | 1681 | Combined | 4 wks | Pain: Numeric Rating Scale Opioid use: Self-reported | Pain: Pain on discharge, and at 6 and 12 months, was significantly lower compared to on admission (p < 0.05). Opioid use: There were significantly fewer patients using opioids post-treatment (p < 0.05). At 6-month follow-up, 76.3% maintained opioid cessation, 14.6% resumed opioid use, 5.8% continued to use opioids, and 3.4% discontinued opioid use. At 12-month follow-up, 14.6% maintained opioid cessation, 5.8% resumed opioids, 3.4% continued to use opioids, and 76.3% discontinued opioid use. |
Townsend, 2008 | OB | 373 | Combined | 3 wks | Pain: Multidimensional Pain Inventory Opioid use: Verified using medical records and converted to MME | Pain: Significant improvement was found in pain outcomes post-treatment (p < 0.001) and six months post-treatment (p < 0.001). Opioid use: At discharge, 176 (92.6%) of the opioid group had completed the opioid taper (χ2 = 20.57; df = 1; p < 0.001). |
Ward, 2022 | OB | 237 * | Combined | 10 wks | Pain: Pain Numeric Scale Opioid use: Number of days with prescription opioids determined from VA pharmacy data | Pain: No significant improvement in pain scores noted. Opioid use: No significant differences in percentage of opioid use found from one year pre- to post-treatment for both EVP-engaged and non-engaged participants. |
Van Der Merwe, 2021 | OB | 164 * | Combined | 10 days | Pain: Brief Pain Inventory Opioid use: Self-reported | Pain: Significant improvement with treatment (p < 0.001). Opioid use: Approximately 25% ceased opioid use and 17% had reduced opioid use post-treatment. |
Hooten, 2007 | OB | 159 | Combined | 3 wks | Pain: Multidimensional Pain Inventory Opioid use: Medical chart review | Pain: Significant improvement with program treatment (p < 0.001). Opioid use: Compared with admission, opioid use at post-treatment was significantly reduced (p < 0.001). |
Davis, 2018 | OB | 156 | Acupuncture | 12 sessions/60 days | Pain: Patient-Reported Outcomes Measurement Information System Opioid use: Self-reported | Pain: Significant improvements in pain intensity (p < 0.01). Opioid use: Approximately 32% of patients using opioids reported reductions in use post-intervention. |
Schumann, 2020 | OB | 134 | Combined | 3 wks | Pain: West Haven-Yale Multidimensional Pain Inventory Opioid use: Self-reported and converted to MME | Pain: Significant treatment effects (p < 0.001) with large effect sizes were observed. Opioid use: Significant reductions (p < 0.01) in opioid use were found post-treatment. All participants in the opioid group completed the opioid taper and discontinued use. |
Gibson, 2020 | OB | 99 * | Combined | 3 months | Pain: Brief Pain Inventory Opioid use: Self-reported | Pain: No significant change in pain severity (p = 0.11, ES = 0.16). Opioid use: At baseline, 77 participants were prescribed opioids; 6 (7%) discontinued use between baseline and follow-up. |
Van Hooff, 2012 | OB | 85 | Combined | 10 days | Pain: Visual Analogue Scale Opioid use: Self-reported | Pain: No significant improvement at 1-year follow-up (p = 0.34). Opioid use: A minimal reduction was found: 25% of patients used opioids (15% weak opioids, 10% strong opioids) at pre-treatment, and 14% of patients used opioids (11% weak opioids, 3% strong opioids) at 2-year follow-up. |
Gilliam, 2020 | OB | 762 | Combined | 15 days | Pain: West Haven Yale Multidimensional Pain Inventory Opioid use: Medical records, medicine bottles, patient report, and state prescription monitoring programs and converted to MME | Pain: Significant improvements were found for pain outcomes. Opioid use: Significant improvements were found for opioid use. At discharge, all patients (31.8%, n = 242) taking opioids at pre-treatment had completed the taper and discontinued opioid use. |
Trinh, 2023 | OB | 74 | Device | 30 days | Pain: Brief Pain Inventory, Visual Analogue Scale Opioid use: Self-reported, compensation claimants | Pain: Significant reduction in pain post H-Wave treatment (p < 0.0001). Opioid use: Approximately 49% of the patients taking opioids prior to the H-Wave device intervention subsequently reduced or stopped their usage. |
Passmore, 2022 | OB | 62 | Chiropractic | NA | Pain: Numeric Rating Scale Opioid use: Self-reported | Pain: Significant decrease in pain intensity was found. Opioid use: Significant reduction of opioid use was found (p = 0.012), approximately 59.0% reduction post-treatment. |
Buchfuhrer, 2023 | OB | 20 | Device | 21 days | Pain: Clinician Global Impression of Improvement Opioid use: Self-reported and converted to MME | Pain: No changes to restless legs syndrome severity found. Opioid use: Approximately 70% of participants (14/20) successfully reduced opioid use by >20%, with a 29.9% mean opioid reduction (SD = 23.7%, n = 20), from 39.0 to 26.8 MME per day, post-TOMAC treatment. |
Barrett, 2021 | OB | 17 | Combined | 8 wks | Pain: Brief Pain Inventory Opioid use: Self-reported and converted to MME | Pain: No significant changes in pain severity (5.9 vs. 5.93, p = 0.913). Opioid use: Five participants (38.5%) reported decreasing their opioid use since baseline. Of these five, opioid use reductions were 17%, 25%, 34%, 55%, and 74%. The mean opioid use decreased from 138.17 mg (SD = 83.99) to 101.21 mg (SD = 45.71). |
Matyac, 2022 | OB | 13 | Educational Program | 5 wks | Pain: Pain, Enjoyment, and General Activity Opioid use: Self-reported and converted to MME | Pain: The program was associated with decreased pain intensity. Opioid use: Although not significant, the program was associated with reduced opioid use. |
Nilsen, 2010 | OB | 11 | CBT | 8 wks | Pain: Brief Pain Inventory Opioid use: Codeine (milligram) use; a blood sample was taken at the first session to test for the genetic polymorphism CYP2D6 | Pain: No significant changes (p > 0.05) were found at mid-treatment (d = 0.3), post-treatment (d = 0.4), or follow-up (d = 0.4). Opioid use: A significant decrease in codeine use was found from pre- to mid-treatment (t = 11.4, p < 0.001; d = 2.2), pre- to post-treatment (t = 11.8, p < 0.001; d = 2.9), pre-treatment to follow-up (t = 11.7, p < 0.001; d = 2.9), and from mid- to post-treatment (t = 6.1, p < 0.001; d = 1.4). |
McCrae, 2020 | SA | 113 | CBT | 8 wks | Pain: NA Opioid use: Self-reported | Pain: NA. Opioid use: There were no significant effects for frequency of opioid use between groups (CBT-insomnia, CBT-pain, waitlist control). |
Miller-Matero, 2022 | SA | 60 | Combined | 5 sessions | Pain: Brief Pain Inventory Opioid use: EHRs verified and converted to MME | Pain: Intervention significantly reduced pain outcomes (p = 0.048). Opioid use: Though not significant, the intervention showed lower odds of having an opioid prescription 6 months post-intervention (p = 0.09, OR = 0.32). |
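Several of the studies above standardized self-reported or record-verified opioid doses by converting them to morphine milligram equivalents (MME). The sketch below illustrates the general arithmetic of such a conversion, assuming commonly published conversion factors (e.g., oxycodone ≈ 1.5 × morphine); the drug list, factor values, and function name are illustrative, not taken from any included study.

```python
# Illustrative sketch: converting daily opioid doses to morphine milligram
# equivalents (MME). The factors below follow commonly published values
# (morphine = 1.0, oxycodone = 1.5, etc.); the drug list and factors are
# assumptions for illustration, not data from the included studies.

MME_FACTORS = {
    "morphine": 1.0,
    "oxycodone": 1.5,
    "hydrocodone": 1.0,
    "codeine": 0.15,
}

def daily_mme(prescriptions):
    """Sum MME/day over a list of (drug, mg_per_day) pairs."""
    total = 0.0
    for drug, mg_per_day in prescriptions:
        if drug not in MME_FACTORS:
            raise ValueError(f"No conversion factor for {drug!r}")
        total += mg_per_day * MME_FACTORS[drug]
    return total

# Example: 20 mg/day oxycodone + 30 mg/day codeine
# -> 20 * 1.5 + 30 * 0.15 = 34.5 MME/day
print(daily_mme([("oxycodone", 20), ("codeine", 30)]))  # 34.5
```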
Section | Item | Prisma-ScR Checklist Item | Reported on Page # |
---|---|---|---|
Title | 1 | Identify the report as a scoping review. | 1 |
Structured summary | 2 | Provide a structured summary that includes (as applicable) background, objectives, eligibility criteria, sources of evidence, charting methods, results, and conclusions that relate to the review questions and objectives. | 1 |
Rationale | 3 | Describe the rationale for the review in the context of what is already known. Explain why the review questions/objectives lend themselves to a scoping review approach. | 1–3 |
Objectives | 4 | Provide an explicit statement of the questions and objectives being addressed with reference to their key elements (e.g., population or participants, concepts, and context) or other relevant key elements used to conceptualize the review questions and/or objectives. | 3 |
Protocol and registration | 5 | Indicate whether a review protocol exists; state if and where it can be accessed (e.g., a Web address); and if available, provide registration information, including the registration number. | 2–3 |
Eligibility criteria | 6 | Specify characteristics of the sources of evidence used as eligibility criteria (e.g., years considered, language, and publication status), and provide a rationale. | 3 |
Information sources | 7 | Describe all information sources in the search (e.g., databases with dates of coverage and contact with authors to identify additional sources), as well as the date the most recent search was executed. | 4 |
Search | 8 | Present the full electronic search strategy for at least 1 database, including any limits used, such that it could be repeated. | 28–29 |
Selection of sources of evidence† | 9 | State the process for selecting sources of evidence (i.e., screening and eligibility) included in the scoping review. | 3 |
Data charting process‡ | 10 | Describe the methods of charting data from the included sources of evidence (e.g., calibrated forms or forms that have been tested by the team before their use, and whether data charting was performed independently or in duplicate) and any processes for obtaining and confirming data from investigators. | 4 |
Data items | 11 | List and define all variables for which data were sought and any assumptions and simplifications made. | 3 |
Critical appraisal of individual sources of evidence§ | 12 | If performed, provide a rationale for conducting a critical appraisal of included sources of evidence; describe the methods used and how this information was used in any data synthesis (if appropriate). | 3 |
Synthesis of results | 13 | Describe the methods of handling and summarizing the data that were charted. | 4 |
Selection of sources of evidence | 14 | Give numbers of sources of evidence screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally using a flow diagram. | 5 |
Characteristics of sources of evidence | 15 | For each source of evidence, present characteristics for which data were charted and provide the citations. | 5 |
Critical appraisal within sources of evidence | 16 | If performed, present data on critical appraisal of included sources of evidence (see item 12). | 23–24 |
Results of individual sources of evidence | 17 | For each included source of evidence, present the relevant data that were charted that relate to the review questions and objectives. | 6–11 |
Synthesis of results | 18 | Summarize and/or present the charting results as they relate to the review questions and objectives. | 13–18 |
Summary of evidence | 19 | Summarize the main results (including an overview of concepts, themes, and types of evidence available), link to the review questions and objectives, and consider the relevance to key groups. | 24–26 |
Limitations | 20 | Discuss the limitations of the scoping review process. | 26–27 |
Conclusions | 21 | Provide a general interpretation of the results with respect to the review questions and objectives, as well as potential implications and/or next steps. | 27 |
Funding | 22 | Describe sources of funding for the included sources of evidence, as well as sources of funding for the scoping review. Describe the role of the funders of the scoping review. | 27 |
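Checklist item 10 above concerns the data charting process, which in practice is usually operationalized as a structured extraction form applied uniformly to every included study. Below is a minimal sketch of such a per-study record as a data structure; the field names are assumptions chosen to mirror the columns of the evidence tables in this review, not the review's actual extraction template.

```python
from dataclasses import dataclass, field
from typing import Optional

# Minimal sketch of a per-study charting record. Field names are
# illustrative assumptions, not the review's actual extraction form.
@dataclass
class ChartedStudy:
    first_author: str
    year: int
    design: str                       # e.g., "RCT", "OB" (observational), "SA" (secondary analysis)
    n: int
    intervention: str                 # e.g., "Combined", "CBT", "Device"
    duration: Optional[str] = None
    pain_measures: list = field(default_factory=list)
    opioid_measures: list = field(default_factory=list)
    key_findings: str = ""

# Example record, transcribed from the evidence table above:
record = ChartedStudy(
    "Gilliam", 2020, "OB", 762, "Combined", "15 days",
    ["West Haven Yale Multidimensional Pain Inventory"],
    ["MME from medical records and prescription monitoring"],
    "Significant improvements in pain and opioid use.",
)
```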
NPI | Gardiner, 2019 | Day, 2019 | Moffat, 2023 | Zeliadt, 2022 | Huffman, 2019 | Townsend, 2008 | Ward, 2022 | Van Der Merwe, 2021 | Hooten, 2007 | Schumann, 2020 | Gibson, 2020 | Van Hooff, 2012 | Gilliam, 2020 | Barrett, 2021 | Miller-Matero, 2022 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mindfulness | X | X | X | X | X | ||||||||||
Relaxation Techniques | X | X | X | ||||||||||||
CBT | X | X | X | X | X | X | X | X | X | X | |||||
Education | X | X | X | X | X | X | X | X | X | ||||||
Biofeedback | X | X | X | ||||||||||||
Yoga | X | ||||||||||||||
Audit and Feedback | X | ||||||||||||||
Taper Protocol | X | X | X | X | X | ||||||||||
Physical or Occupational Therapy or Movement | X | X | X | X | X | X | X | X | |||||||
Guided Imagery | X | ||||||||||||||
Group Visits | X | X | X | X | |||||||||||
Hypnosis | X | ||||||||||||||
Acupuncture | X | ||||||||||||||
ACT | X | X | X | ||||||||||||
Psychotherapy | X | ||||||||||||||
Stress Management | X | X | |||||||||||||
Chiropractic | X | X | |||||||||||||
Tai Chi/Qigong | X | ||||||||||||||
Meditation | X | X | X | ||||||||||||
Massage | X | X | X | ||||||||||||
Whole Health Coaching | X | ||||||||||||||
Hydrotherapy | X | ||||||||||||||
Breathing Practices | X | ||||||||||||||
Device or Digital Technology | X | ||||||||||||||
Reduced Pain and Opioid Use? | N | N | N | N | Y | Y | N | Y | Y | Y | N | N | Y | N | N |
Integrated Approach? | Y | Y | N | N | Y | Y | Y | Y | Y | Y | N | Y | Y | Y | N |
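Because the matrix above encodes each multimodal program as the set of NPI components it combined, component frequencies and co-occurrences can be tallied with simple set operations. A sketch follows, using a small hand-entered subset in which the component assignments are assumptions for illustration (the flattened matrix does not preserve exact column alignment).

```python
from collections import Counter
from itertools import combinations

# Hand-entered subset of the component matrix (study -> set of NPI
# components marked "X"). The assignments below are illustrative
# assumptions, not a verified transcription of the table.
components = {
    "Townsend, 2008": {"CBT", "Education", "Physical/Occupational Therapy"},
    "Hooten, 2007": {"CBT", "Education", "Taper Protocol"},
    "Gilliam, 2020": {"CBT", "Education", "Taper Protocol"},
}

# How often does each component appear across programs?
freq = Counter(c for parts in components.values() for c in parts)

# Which component pairs co-occur, and how often?
pairs = Counter(
    pair
    for parts in components.values()
    for pair in combinations(sorted(parts), 2)
)

print(freq.most_common())
print(pairs.most_common(3))
```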
First Author, Year | Additional Measures | Additional Results |
---|---|---|
Garcia, 2021 | Pain Interference with Activity, Sleep, Mood, and Stress (DVPRS-II, PROMIS), Pain Catastrophizing Scale (PCS), Pain Efficacy (PSEQ-2), Chronic Pain Acceptance (CPAQ-8), Patient’s Global Impression of Change, Satisfaction with VR Device Use, Cybersickness, Over-the-Counter Analgesic Medication Use | EaseVRx intervention decreased pain-related interference with activity, mood, and stress, and nonopioid medication use. Pain catastrophizing, pain self-efficacy, and pain acceptance did not reach statistical significance for either group. |
Jensen, 2020 | Pain Interference (BPI), Depressive (PHQ-8), Global Impression of Change (IMMPACT), Satisfaction (PGATS) | All 4 treatment groups showed improvements on pain-related interference and depressive symptoms, with some return to pre-treatment levels at 12-month follow-up. |
Zheng, 2019 | Medication Quantification Scale III was used to quantify nonopioid medications, Unpleasantness was measured with a 0–20 Numerical Rating Scale, Depression (BDI), Quality of Life (SF-36), Disability (RMDQ), Perception of Electroacupuncture Treatment Questionnaire | There were no significant differences found across the treatment groups on mental health, feelings of unpleasantness, nonopioid medication doses, disability, and opioid-related adverse events. |
Garland, 2022 | Pain Interference (BPI), Emotional distress (DASS), Opioid Misuse and Cravings (DMI, COMM) | MORE group experienced greater reductions in pain-related functional interference and lower emotional distress and opioid cravings than the supportive psychotherapy group. |
Hudak, 2021 * | Self-referential Processing (NADA-state, PBBS) | MORE group demonstrated significantly increased alpha and theta power and increased frontal midline theta coherence compared to the control group, neural changes consistent with altered self-referential processing. |
Wilson, 2023 | Opioid Misuse (COMM), Global Health (PROMIS), Pain Knowledge (The Pain Knowledge Questionnaire), Pain Self-Efficacy (PSEQ), Pain Coping (CSQ-R) | No significant effect found from baseline to 10-month posttest for COMM and Global Health. Improvements were found in pain knowledge, pain self-efficacy, and pain coping. |
Garland, 2024 * | Emotional Stress (DASS), Post-Traumatic Stress Disorder Checklist—Military Version, Pain Catastrophizing subscale of the Coping Strategies Questionnaire, the Snaith–Hamilton Anhedonia and Pleasure Scale, the positive affect subscale of the Positive and Negative Affect Schedule, the Cognitive Reappraisal of Pain Scale, and Nonreactivity Subscale of the Five Facet Mindfulness Questionnaire, Opioid Cravings (COMM) | MORE group reduced opioid use while maintaining pain control and preventing mood disturbances. MORE group reduced opioid cravings, opioid cue reactivity, anhedonia, pain catastrophizing, and opioid attentional bias and increased positive affect more than the control group. |
DeBar, 2022 | Roland–Morris Disability Questionnaire (RMDQ) | CBT intervention sustained larger reductions in pain related disability. |
Gardiner, 2019 | Depression (PHQ-9), Patient Activation Measure, Health-related Quality of Life (short form 12 Health Survey version 2: SF-12), Opioid Misuse (COMM) | Significant differences between the intervention and control group for activation and opioid misuse. No differences in depression at any time point. At 21 weeks, the intervention group had higher quality of life compared with the control group. |
Wartko, 2023 | Pain Self-Efficacy (PSEQ), Depression (PHQ-8), Generalized Anxiety (GAD-7), Patient Global Impression of Change, Prescription Opioid Difficulties Scale, Prescription Opioid Misuse Index | No significant differences between intervention and usual care were found for any of the secondary outcomes. |
Groessl, 2017 * | Roland–Morris Disability Questionnaire (RMDQ) | Improvements in disability scores did not differ between the two groups at 12 weeks, but yoga showed greater reductions in disability scores than delayed treatment group at 6 months. |
Roseen, 2022 * | Post-Traumatic Stress Symptoms (PCL-C), Roland–Morris Disability Questionnaire (RMDQ) | No significant differences between intervention and education were found for secondary outcomes. |
Sandhu, 2023 | Patient-Reported Outcomes Measurement Information System (PROMIS-PI-SF-8a), Short Opioid Withdrawal Scale (SHOWS), Health-related Quality of Life (SF-12v2 health survey and EuroQol 5-dimension 5-level), Sleep Quality (Pittsburgh Sleep Quality Index), Emotional Wellbeing (HADS), Pain Self-Efficacy (PSEQ) | At 4-month follow-up, the education intervention showed significant improvements in mental health, pain self-efficacy, and health-related quality of life, but did not show improvements at any other data collection time point. No statistically significant between-group differences in opioid withdrawal symptoms, sleep quality, or pain interference were found. |
Does, 2024 | Depression (PHQ-9), Quality of Life, Health, and Functional Status (PROMIS), Patient Activation Measure (PAM-13) | The intervention demonstrated less moderate/severe depression symptoms and higher overall health and function status. The intervention had no effect on activation scores at 12 months. |
Naylor, 2010 | Function/Disability from the Treatment Outcomes in Pain Survey, Depression (BDI), Pain Coping (CSQ). | TIVR intervention group demonstrated improved coping, depression symptoms, function, and disability, compared to the standard follow-up group. |
Nielssen, 2019 | Depression (PHQ9), Anxiety (GAD-7) | Reduction in opioid consumption was strongly associated with decreases in anxiety and depression symptoms. |
Day, 2019 | Physical Function, Depression, and Pain Interference (PROMIS) | MBCT group improved significantly more than MM group on pain interference, physical function, and depression symptoms. MBCT and CT group did not differ significantly on any of the measures. |
Spangeus, 2023 | Health-related Quality of Life (EQ-5D-3L, RAND-36, Qualeffo-41), Static and Dynamic Balance Tests, Fall Risk and Physical Activity (FES-I), Theoretical Knowledge (open-ended questions) | Significant improvements were found for quality of life, balance, tandem walking backwards, and theoretical knowledge. These changes were maintained at the 1-year follow-up. |
Nelli, 2023 | NA | NA |
Moffat, 2023 * | NA | NA |
Zeliadt, 2022 * | NA | NA |
Huffman, 2019 | Pain-related Functional Impairment (PDI), Depression and Anxiety (DASS) | Intervention showed significant pre-post treatment improvements in functional impairment, depression, and anxiety symptoms. |
Townsend, 2008 | Health Status (SF-36), Pain Catastrophizing Scale (PCS), Depression (CES-D) | Significant improvements were found on health status, pain catastrophizing, and depression symptoms following treatment and six-month post-treatment irrespective of opioid status at admission. |
Ward, 2022 * | Depression (PHQ9), VA Stratification Tool for Opioid Risk Mitigation (STORM) | Reduced depression scores in the post-treatment year were found in the engaged group. EVP showed a 65% lower mortality risk compared to the untreated group. |
Van Der Merwe, 2021 * | Pain Interference (BPI), Pain Catastrophizing Scale (PCS), Mood (CORE), Post-traumatic Stress Symptoms (Impact of Events Scale: IES-6), Self-Efficacy and Confidence (PSEQ) | Pain management program significantly improved pain-related interference, mood, self-efficacy, and confidence, post-traumatic stress symptoms, and pain catastrophizing. |
Hooten, 2007 | Health Status (SF-36), Pain Coping (CSQ), Depression (CES-D) | Health status, coping, and depression scores demonstrated improvement with the intervention. |
Davis, 2018 | Pain Interference, Fatigue, Physical Function, Sleep Disturbance, Emotional Distress—Anxiety, Emotional Distress—Depression, and Social Isolation Short Forms (PROMIS) | Significant improvements were found in pain-related interference, physical function, fatigue, anxiety, depression, sleep disturbance, and social isolation. |
Schumann, 2020 | Pain Catastrophizing Scale (PCS), Depressive symptoms (CES-D, PHQ-9), Quality of Life (Medical Outcomes Study 36-Item Short Form Survey) | Significant treatment effects with large effect sizes were observed for all outcome measures at post-treatment and 6-month follow-up. |
Gibson, 2020 * | Pain Catastrophizing Scale (PCS), Current Opioid Misuse Measure (COMM), Patient Treatment Satisfaction Scale (PTSS) | Significant decrease in pain-related interference, pain catastrophizing, pain magnification, pain helplessness, and opioid misuse were found. |
Van Hooff, 2012 | Roland–Morris Disability Questionnaire (RMDQ), Short Form 36 Physical Component Scale (SF36 PCS), Short Form 36 Mental Component Scale (SF36 MCS), pain disturbance of ADLs (0–100 scale) | At the 1- and 2-year follow-ups, only pain disturbance of ADLs significantly improved: df (1,84), t = 2.57, p = 0.01. |
Gilliam, 2020 | PTSD Checklist with a brief Criterion A assessment (PCL-5), Pain Catastrophizing Scale (PCS), Depression (PHQ-9), Physical performance measures | Intervention showed significant improvements in PTSD, depression, physical performance, and pain outcomes. |
Trinh, 2023 | Depression (PHQ9), Anxiety (GAD-7), Pain Disability Questionnaire | Intervention showed a 24.4% reduction in depression, 31% reduction in anxiety, and significant improvement in function/disability. |
Passmore, 2022 | NA | NA |
Buchfuhrer, 2023 | NA | NA |
Barrett, 2021 | Pain Interference (BPI), Pain willingness and activity engagement (CPAQ), Depression (PHQ-9) | No significant changes in pain interference, but significant improvements in pain willingness, activity engagement, and depression were found. |
Matyac, 2022 | Opioid Risk (ORT), Pain Catastrophizing (PCS) | The program showed reductions in pain catastrophizing and pain scores. Combining opioid risk data with sleep apnea data, the results showed that 31% of participants were at high risk of opioid overdose. |
Nilsen, 2010 | Health-related Quality of Life (SF-36), Neurocognitive Tests | Neuropsychological functioning improved on some tests; others remained unchanged. Opioid use decreased without significant reduction in quality of life. |
McCrae, 2020 | NA | NA |
Miller-Matero, 2022 | Pain Interference (BPI), Pain Catastrophizing (PCS), Depressive Symptoms (HADS) | Intervention showed decreases in pain catastrophizing and depression symptoms. There were significant improvements in pain-related interferences. |
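The outcome tables above report standardized effect sizes (e.g., ES = 0.16 for Gibson 2020; d = 2.2–2.9 for Nilsen 2010) alongside t statistics. For paired pre/post designs, one common convention computes Cohen's d as the mean change divided by the standard deviation of the change scores. A minimal sketch, using hypothetical pain scores rather than data from any included study:

```python
import math

def cohens_d_paired(pre, post):
    """Paired-samples Cohen's d: mean change / SD of the change scores.
    (One common convention; some studies divide by the pre-treatment SD
    instead, so reported values are not always directly comparable.)
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var)

# Hypothetical 0-10 pain ratings before and after treatment; a negative
# d indicates a reduction in pain scores.
pre = [7, 6, 8, 5, 7, 6]
post = [5, 5, 6, 4, 6, 5]
print(round(cohens_d_paired(pre, post), 2))  # -2.58
```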
First Author, Year | Total Quality Index Score (Range, 0–29 Points) |
---|---|
Garcia, 2021 | 29 |
Jensen, 2020 | 29 |
Zheng, 2019 | 29 |
Garland, 2022 | 27 |
Hudak, 2021 | 27 |
Wilson, 2023 | 27 |
Garland, 2024 | 27 |
DeBar, 2022 | 27 |
Gardiner, 2019 | 26 |
Wartko, 2023 | 26 |
Groessl, 2017 | 26 |
Roseen, 2022 | 26 |
Sandhu, 2023 | 25 |
Does, 2024 | 24 |
Naylor, 2010 | 24 |
Nielssen, 2019 | 22 |
Day, 2019 | 24 |
Spangeus, 2023 | 23 |
Nelli, 2023 | 24 |
Moffat, 2023 | 21 |
Zeliadt, 2022 | 23 |
Huffman, 2019 | 23 |
Townsend, 2008 | 23 |
Ward, 2022 | 23 |
Van Der Merwe, 2021 | 23 |
Hooten, 2007 | 22 |
Davis, 2018 | 23 |
Schumann, 2020 | 23 |
Gibson, 2020 | 22 |
Van Hooff, 2012 | 17 |
Gilliam, 2020 | 21 |
Trinh, 2023 | 23 |
Passmore, 2022 | 23 |
Buchfuhrer, 2023 | 21 |
Barrett, 2021 | 23 |
Matyac, 2022 | 20 |
Nilsen, 2010 | 19 |
McCrae, 2020 | 21 |
Miller-Matero, 2022 | 23 |
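The Quality Index scores above lend themselves to direct descriptive summary. The sketch below recomputes basic statistics from the values as listed in the table (transcribed by hand, in table order):

```python
import statistics

# Quality Index scores (0-29 scale) transcribed from the table above.
scores = [29, 29, 29, 27, 27, 27, 27, 27, 26, 26, 26, 26, 25, 24, 24,
          22, 24, 23, 24, 21, 23, 23, 23, 23, 23, 22, 23, 23, 22, 17,
          21, 23, 23, 21, 23, 20, 19, 21, 23]

print(len(scores))                        # 39 included studies
print(min(scores), max(scores))           # range: 17 to 29
print(round(statistics.mean(scores), 1))  # mean ~= 23.8
print(statistics.median(scores))          # median = 23
```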
Coffee, Z.; Cheng, K.; Slebodnik, M.; Mulligan, K.; Yu, C.H.; Vanderah, T.W.; Gordon, J.S. The Impact of Nonpharmacological Interventions on Opioid Use for Chronic Noncancer Pain: A Scoping Review. Int. J. Environ. Res. Public Health 2024 , 21 , 794. https://doi.org/10.3390/ijerph21060794
Supplementary Material: ZIP-Document (ZIP, 279 KiB)