• Corpus ID: 19081372

Single case studies vs. multiple case studies: A comparative study

  • Johanna Gustafsson
  • Published 2017

Multiple Case Research Design

  • First Online: 10 November 2021

  • Stefan Hunziker &
  • Michael Blankenagel

This chapter addresses the peculiarities, characteristics, and major fallacies of multiple case research designs. The major advantage of multiple case research lies in cross-case analysis. A multiple case research design shifts the focus from understanding a single case to the differences and similarities between cases. Thus, it is not just conducting more (second, third, etc.) case studies. Rather, it is the next step in developing a theory about factors driving differences and similarities. Also, researchers find relevant information on how to write a multiple case research design paper and learn about typical methodologies used for this research design. The chapter closes with referring to overlapping and adjacent research designs.

Author information

Authors and affiliations.

Wirtschaft/IFZ – Campus Zug-Rotkreuz, Hochschule Luzern, Zug-Rotkreuz, Zug, Switzerland

Stefan Hunziker & Michael Blankenagel

Corresponding author

Correspondence to Stefan Hunziker.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

About this chapter

Hunziker, S., Blankenagel, M. (2021). Multiple Case Research Design. In: Research Design in Business and Management. Springer Gabler, Wiesbaden. https://doi.org/10.1007/978-3-658-34357-6_9

DOI: https://doi.org/10.1007/978-3-658-34357-6_9

Published: 10 November 2021

Publisher Name: Springer Gabler, Wiesbaden

Print ISBN: 978-3-658-34356-9

Online ISBN: 978-3-658-34357-6

eBook Packages: Business and Economics (German Language)

Book contents

  • The Case for Case Studies
  • Strategies for Social Inquiry
  • Copyright page
  • Contributors
  • Preface and Acknowledgments
  • 1 Using Case Studies to Enhance the Quality of Explanation and Implementation
  • Part I Internal and External Validity Issues in Case Study Research
  • Part II Ensuring High-Quality Case Studies
  • 6 Descriptive Accuracy in Interview-Based Case Studies
  • 7 Selecting Cases for Comparative Sequential Analysis
  • 8 The Transparency Revolution in Qualitative Social Science
  • Part III Putting Case Studies to Work: Applications to Development Practice

7 - Selecting Cases for Comparative Sequential Analysis

Novel Uses for Old Methods

from Part II - Ensuring High-Quality Case Studies

Published online by Cambridge University Press:  05 May 2022

Pavone analyzes how our evolving understanding of case-based causal inference via process-tracing should alter how we select cases for comparative inquiry. The chapter explicates perhaps the most influential and widely used means to conduct qualitative research involving two or more cases: Mill’s methods of agreement and difference. It then argues that the traditional use of Millian methods of case selection can lead us to treat cases as static units to be synchronically compared rather than as social processes unfolding over time. As a result, Millian methods risk prematurely rejecting and otherwise overlooking (1) ordered causal processes, (2) paced causal processes, and (3) equifinality, or the presence of multiple pathways that produce the same outcome. To address these issues, the chapter develops a set of recommendations to ensure the alignment of Millian methods of case selection with within-case sequential analysis.

7.1 Introduction

In the lead article of the first issue of Comparative Politics, Harold Lasswell posited that the “scientific approach” and the “comparative method” are one and the same (Lasswell 1968: 3). So important is comparative case study research to the modern social sciences that two disciplinary subfields – comparative politics in political science and comparative-historical sociology – crystallized in no small part because of their shared use of comparative case study research (Collier 1993; Adams, Clemens, and Orloff 2005: 22–26; Mahoney and Thelen 2015). As a result, a first-principles methodological debate emerged about the appropriate ways to select cases for causal inquiry. In particular, the diffusion of econometric methods in the social sciences exposed case study researchers to allegations that they were “selecting on the dependent variable” and that “selection bias” would hamper the “answers they get” (Geddes 1990). Lest they be pushed to randomly select cases or turn to statistical and experimental approaches, case study researchers had to develop a set of persuasive analytic tools for their enterprise.

It is unsurprising, therefore, that there has been a profusion of scholarship discussing case selection over the years.[Footnote 1] Gerring and Cojocaru (2015) synthesize this literature by deriving no fewer than five distinct types (representative, anomalous, most-similar, crucial, and most-different) and eighteen subtypes of cases, each with its own logic of case selection. It falls outside the scope of this chapter to provide a descriptive overview of each approach to case selection. Rather, the purpose of the present inquiry is to place the literature on case selection in constructive dialogue with the equally lively and burgeoning body of scholarship on process tracing (George and Bennett 2005; Brady and Collier 2010; Beach and Pedersen 2013; Bennett and Checkel 2015). I ask a simple question: Should our evolving understanding of causation and our toolkit for case-based causal inference courtesy of process-tracing scholars alter how scholars approach case selection? If so, why, and what may be the most fruitful paths forward?

To propose an answer, this chapter focuses on perhaps the most influential and widely used means to conduct qualitative research involving two or more cases: Mill’s methods of agreement and difference. Also known as the “most-different systems/cases” and “most-similar systems/cases” designs, these strategies have not escaped challenge – although, as we will see, many of these critiques were fallaciously premised on case study research serving as a weaker analogue to econometric analysis. Here, I take a different approach: I argue that the traditional use of Millian methods of case selection can indeed be flawed, but rather because it risks treating cases as static units to be synchronically compared rather than as social processes unfolding over time. As a result, Millian methods risk prematurely rejecting and otherwise overlooking (1) ordered causal processes, (2) paced causal processes, and (3) equifinality, or the presence of multiple pathways that produce the same outcome. While qualitative methodologists have stressed the importance of these processual dynamics, they have been less attentive to how these factors may problematize pairing Millian methods of case selection with within-case process tracing (e.g., Hall 2003; Tarrow 2010; Falleti and Mahoney 2015). This chapter begins to fill that gap.

Taking a more constructive and prescriptive turn, the chapter provides a set of recommendations for ensuring the alignment of Millian methods of case selection with within-case sequential analysis. It begins by outlining how the deductive use of processualist theories can help reformulate Millian case selection designs to accommodate ordered and paced processes (but not equifinal processes). More originally, the chapter concludes by proposing a new, alternative approach to comparative case study research: the method of inductive case selection. By making use of Millian methods to select cases for comparison after a causal process has been identified within a particular case, the method of inductive case selection enables researchers to assess (1) the generalizability of the causal sequences, (2) the logics of scope conditions on the causal argument, and (3) the presence of equifinal pathways to the same outcome. In so doing, scholars can convert the weaknesses of Millian approaches into strengths and better align comparative case study research with the advances of processualist researchers.

Organizationally, the chapter proceeds as follows. Section 7.2 provides an overview of Millian methods for case selection and articulates how the literature on process tracing fits within debates about the utility and shortcomings of the comparative method. Section 7.3 articulates why the traditional use of Millian methods risks blinding the researcher to ordered, paced, and equifinal causal processes, and describes how deductive, processualist theorizing helps attenuate some of these risks. Section 7.4 develops a new inductive method of case selection and provides a number of concrete examples from development practice to illustrate how it can be used by scholars and policy practitioners alike. Section 7.5 concludes.

7.2 Case Selection in Comparative Research

7.2.1 Case Selection before the Processual Turn

Before “process tracing” entered the lexicon of social scientists, the dominant case selection strategy in case study research sought to maximize causal leverage via comparison, particularly via the “methods of agreement and difference” of John Stuart Mill (1843 [1974]: 388–391).

In Mill’s method of difference, the researcher purposively chooses two (or more) cases that experience different outcomes, despite otherwise being very similar on a number of relevant dimensions. Put differently, the researcher seeks to maximize variation in the outcome variable while minimizing variation amongst a set of plausible explanatory variables. It is for this reason that the approach also came to be referred to as the ‘most-similar systems’ or ‘most-similar cases’ design – while Mill’s nomenclature highlights variation in the outcome of interest, the alternative terminology highlights minimal variation amongst a set of possible explanatory factors. The underlying logic of this case selection strategy is that because the cases are so similar, the researcher can subsequently probe for the explanatory factor that actually does exhibit cross-case variation and isolate it as a likely cause.

Mill’s method of agreement is the mirror image of the method of difference. Here, the researcher chooses two (or more) cases that experience similar outcomes despite being very different on a number of relevant dimensions. That is, the researcher seeks to minimize variation in the outcome variable while maximizing variation amongst a set of plausible explanatory variables. An alternative, independent variable-focused terminology for this approach was developed – the ‘most-different systems’ or ‘most-different cases’ design – breeding some confusion. The underlying logic of this case selection strategy is that it helps the researcher isolate the explanatory factor that is similar across the otherwise different cases as a likely cause.[Footnote 2]

Figure 7.1 Case selection setup under Mill’s methods of difference and agreement
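The selection logic of the two Millian designs can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure given in the chapter: the three cases, the factor names `x1` to `x4`, and the helper functions `split_factors` and `select_pair` are all hypothetical, and real case selection rests on substantive judgment rather than on counting binary attributes.

```python
from itertools import combinations

# Hypothetical cases (not from the chapter): four binary explanatory
# factors and one binary outcome per case.
CASES = {
    "A": {"factors": {"x1": 1, "x2": 1, "x3": 0, "x4": 1}, "outcome": 1},
    "B": {"factors": {"x1": 1, "x2": 1, "x3": 0, "x4": 0}, "outcome": 0},
    "C": {"factors": {"x1": 0, "x2": 0, "x3": 1, "x4": 1}, "outcome": 1},
}


def split_factors(a, b):
    """Return (shared, differing) factor names for two cases."""
    fa, fb = CASES[a]["factors"], CASES[b]["factors"]
    shared = sorted(k for k in fa if fa[k] == fb[k])
    differing = sorted(k for k in fa if fa[k] != fb[k])
    return shared, differing


def select_pair(same_outcome):
    """Pick a pair of cases and the candidate causal factors.

    same_outcome=False: method of difference (most-similar cases);
    outcomes differ and the number of differing factors is minimized.
    same_outcome=True: method of agreement (most-different cases);
    outcomes match and the number of shared factors is minimized.
    """
    candidates = [
        (a, b)
        for a, b in combinations(sorted(CASES), 2)
        if (CASES[a]["outcome"] == CASES[b]["outcome"]) == same_outcome
    ]
    index = 0 if same_outcome else 1  # 0 = shared, 1 = differing
    best = min(candidates, key=lambda pair: len(split_factors(*pair)[index]))
    return best, split_factors(*best)[index]


print(select_pair(same_outcome=False))  # (('A', 'B'), ['x4'])
print(select_pair(same_outcome=True))   # (('A', 'C'), ['x4'])
```

In both designs the hypothetical factor `x4` is singled out: under the method of difference it is the only factor that varies between the otherwise similar cases A and B, and under the method of agreement it is the only factor shared by the otherwise different cases A and C.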

Mill himself did not believe that such methods could yield causal inferences outside of the physical sciences (Mill 1843 [1974]: 452). Nevertheless, in the 1970s a number of comparative social scientists endorsed Millian methods as the cornerstones of the comparative method. For example, Przeworski and Teune (1970) advocated in favor of the most-different cases design, whereas Lijphart (1971) favored the most-similar cases approach. In so doing, scholars sought case selection techniques that would be as analogous as possible to regression analysis: focused on controlling for independent variables across cases, maximizing covariation between the outcome and a plausible explanatory variable, and treating cases as a qualitative equivalent to a row of dataset observations. It is not difficult to see why this contributed to the view that case study research serves as the “inherently flawed” version of econometrics (Adams, Clemens, and Orloff 2005: 25; Tarrow 2010). Indeed, despite his prominence as a case study researcher, Lijphart (1975: 165; 1971: 685) concluded that “because the comparative method must be considered the weaker method,” then “if at all possible one should generally use the statistical (or perhaps even the experimental) method instead.” As Hall (2003: 380; 396) brilliantly notes, case study research

was deeply influenced by [Lijphart’s] framing of it … [where] the only important observations to be drawn from the cases are taken on the values of the dependent variable and a few explanatory variables … From this perspective, because the number of pertinent observations available from small-N comparison is seriously limited, the analyst lacks the degrees of freedom to consider more than a few explanatory variables, and the value of small-N comparison for causal inference seems distinctly limited.

In other words, the predominant case selection approach through the 1990s sought to do its best to reproduce a regression framework in a small-N setting – hence Lijphart’s concern with the “many variables, small number of cases” problem, which he argued could only be partially mitigated if, inter alia, the researcher increases the number of cases and decreases the number of variables across said cases (1971: 685–686). Later works embraced Lijphart’s formulation of the problem even as they sought to address it: for example, Eckstein (1975: 85) argued that a “case” could actually be comprised of many “cases” if the unit of analysis shifted from being, say, the electoral system to, say, the voter. Predictably, such interventions invited retorts: Lieberson (1994), for example, claimed that Millian methods’ inability to accommodate probabilistic causation,[Footnote 3] interaction effects, and multivariate analysis would remain fatal flaws.

7.2.2 Enter Process Tracing

It is in this light that ‘process tracing’ – a term first used by Hobarth (1972) but popularized by George (1979) and particularly George and Bennett (2005), Brady and Collier (2010), Beach and Pedersen (2013), and Bennett and Checkel (2015) – proved revolutionary for the ways in which social scientists conceive of case study research. Cases have gradually been reconceptualized not as dataset observations but as concatenations of concrete historical events that produce a specific outcome (Goertz and Mahoney 2012). That is, cases are increasingly treated as social processes, where a process is defined as “a particular type of sequence in which the temporally ordered events belong to a single coherent pattern of activity” (Falleti and Mahoney 2015: 214). Although there exist multiple distinct conceptions of process tracing – from Bayesian approaches (Bennett 2015) to set-theoretic approaches (Mahoney et al. 2009) to mechanistic approaches (Beach and Pedersen 2013) to sequentialist approaches (Falleti and Mahoney 2015) – their overall esprit is the same: reconstructing the sequence of events and interlinking causal logics that produce an outcome – isolating the ‘causes of effects’ – rather than probing a variable’s mean impact across cases via an ‘effects of causes’ approach.[Footnote 4]

For this intellectual shift to occur, processualist social scientists had to show how a number of assumptions underlying Millian comparative methods – as well as frequentist approaches more generally – are usually inappropriate for case study research. For example, the correlational approach endorsed by Przeworski and Teune (1970), Lijphart (1971), and Eckstein (1975) treats observational units as homogeneous and independent (Hall 2003: 382; Goertz and Mahoney 2012). Unit homogeneity means that “different units are presumed to be fully identical to each other in all relevant respects except for the values of the main independent variable,” such that each observation contributes equally to the confidence we have in the accuracy and magnitude of our causal estimates (Brady and Collier 2010: 41–42). Given this assumption, more observations are better – hence, Lijphart’s (1971) dictum to “increase the number of cases” and, in its more recent variant, to “increase the number of observations” (King, Keohane, and Verba 1994: 208–230). By independence, we mean that “for each observation, the value of a particular variable is not influenced by its value in other observations”; thus, each observation contributes “new information about the phenomenon in question” (Brady and Collier 2010: 43).

By contrast, practitioners of process tracing have shown that treating cases as social processes implies that case study observations are often interdependent and derived from heterogeneous units (Goertz and Mahoney 2012). Unit heterogeneity means that not all historical events, and the observable evidence they generate, are created equal. Hence, some observations may better enable the reconstruction of a causal process because they are more proximate to the central events under study. Correlatively, this is why historians accord greater ‘weight’ to primary than to secondary sources, and why primary sources concerning actors central to a key event are more important than those for peripheral figures (Trachtenberg 2009; Tansey 2007). In short, while process tracing may yield a bounty of observable evidence, we seek not to necessarily increase the number, but rather the quality, of observations. Finally, by interdependence we mean that because time is “fateful” (Sewell 2005: 6), antecedent events in a sequence may influence subsequent events. This “fatefulness” has multiple sources. For instance, historical institutionalists have shown how social processes can exhibit path dependencies where the outcome of interest becomes a central driver of its own reproduction (Pierson 1996; Pierson 2000; Mahoney 2000; Hall 2003; Falleti and Mahoney 2015). At the individual level, processual sociologists have noted that causation in the social world is rarely a matter of one billiard ball hitting another, as in Hume’s (1738 [2003]) frequentist concept of “constant conjunction.” Rather, it hinges upon actors endowed with memory, such that the micro-foundations of social causation rest on individuals aware of their own historicality (Sewell 2005; Abbott 2001; 2016).

At its core, eschewing the independence and unit homogeneity assumptions simply means situating case study evidence within its spatiotemporal context (Hall 2003; Falleti and Lynch 2009). This commitment is showcased by the language which process-sensitive case study researchers use when making causal inferences. First, rather than relating ‘independent variables’ to ‘dependent variables’, they often privilege the contextualizing language of relating ‘events’ to ‘outcomes’ (Falleti and Mahoney 2015). Second, they prefer to speak not of ‘dataset observations’ evocative of cross-sectional analysis, but of ‘causal process observations’ evocative of sequential analysis (Brady and Collier 2010; Goertz and Mahoney 2012). Third, they may substitute the language of ‘causal inference via concatenation’ – a terminology implying that unobservable causal mechanisms are embedded within a sequence of observable events – for that of ‘causal inference via correlation’, evocative of the frequentist billiard-ball analogy (Waldner 2012: 68). The result is that case study research is increasingly hailed as a “distinctive approach that offers a much richer set of observations, especially about causal processes, than statistical analyses normally allow” (Hall 2003: 397).

7.3 Threats to Processual Inference and the Role of Theory

While scholars have shown how process-tracing methods have reconceived the utility of case studies for causal inference, there remains some ambiguity about the implications for case selection, particularly using Millian methods. While several works have touched upon this theme (e.g., Hall 2003; George and Bennett 2005; Levy 2008; Tarrow 2010), the contribution that most explicitly wrestles with this topic is Falleti and Mahoney (2015), who acknowledge that “the application of Millian methods for sequential arguments has not been systematically explored, although we believe it is commonly used in practice” (Falleti and Mahoney 2015: 226). Falleti and Mahoney argue that process tracing can remedy the weaknesses of Millian approaches: “When used in isolation, the methods of agreement and difference are weak instruments for small-N causal inference … small-N researchers thus normally must combine Millian methods with process tracing or other within-case methods to make a positive case for causality” (2015: 225–226). Their optimism about the synergy between Millian methods and process tracing leads them to conclude that “by fusing these two elements, the comparative sequential method merits the distinction of being the principal overarching methodology for [comparative-historical analysis] in general” (2015: 236).

Falleti and Mahoney’s contribution is the definitive statement of how comparative case study research has long abandoned its Lijphartian origins and fully embraced treating cases as social processes. It is certainly true that process-tracing advocates have shown that some past critiques of Millian methods may not have been as damning as they first appeared. For example, Lieberson’s (1994) critique that Millian case selection requires a deterministic understanding of causation has been countered by set-theoretic process tracers who note that causal processes can indeed be conceptualized as concatenations of necessary and sufficient conditions (Goertz and Mahoney 2012; Mahoney and Vanderpoel 2015). After all, “at the individual case level, the ex post (objective) probability of a specific outcome occurring is either 1 or 0” (Mahoney 2008: 415). Even for those who do not explicitly embrace set-theoretic approaches and prefer to perform a series of “process tracing tests” (such as straw-in-the-wind, hoop, smoking gun, and doubly-decisive tests), the objective remains to evaluate the deterministic causal relevance of a historical event on the next linkage in a sequence (Collier 2011; Mahoney 2012). In this light, Millian methods appear to have been thrown a much-needed lifeline.
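The four process tracing tests named above are conventionally distinguished by whether passing the test is necessary and/or sufficient for affirming a hypothesis. The chapter does not spell out this typology, so the following is a minimal sketch of the standard classification as it is usually presented in the process-tracing literature; the function names `classify_test` and `verdict` and the verdict wording are assumptions made for illustration.

```python
def classify_test(necessary: bool, sufficient: bool) -> str:
    """Name the test type, given whether passing it is necessary and/or
    sufficient for affirming the hypothesis under examination."""
    if necessary and sufficient:
        return "doubly decisive"
    if necessary:
        return "hoop"
    if sufficient:
        return "smoking gun"
    return "straw in the wind"


def verdict(necessary: bool, sufficient: bool, passed: bool) -> str:
    """Illustrative reading of what a passed or failed test implies."""
    if passed:
        # Only a sufficient test can confirm the hypothesis when passed.
        return "confirmed" if sufficient else "supported, not confirmed"
    # Only a necessary test can eliminate the hypothesis when failed.
    return "eliminated" if necessary else "weakened, not eliminated"


for nec in (False, True):
    for suf in (False, True):
        print(classify_test(nec, suf),
              "| pass:", verdict(nec, suf, True),
              "| fail:", verdict(nec, suf, False))
```

The deterministic flavor noted in the text shows up in the two decisive branches: failing a hoop test eliminates a hypothesis outright, and passing a smoking gun test confirms it outright.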

Yet processualist researchers have implicitly exposed new, and perhaps more damning, weaknesses in the traditional use of the comparative method. Here, Falleti and Mahoney (2015) are less engaged in highlighting how their focus on comparing within-case sequences should push scholars to revisit strategies for case selection premised on assumptions that process-tracing advocates have undermined. In this light, I begin by outlining three hitherto underappreciated threats to inference associated with the traditional use of Millian case selection: potentially ignoring (1) ordered and (2) paced causal processes, and ignoring (3) the possibility of equifinality. I then demonstrate how risks (1) and (2) can be attenuated deductively by formulating processualist theories and tweaking Millian designs for case selection.

Risk 1: Ignoring Ordered Processes

Process-sensitive social scientists have long noted that “the temporal order of the events in a sequence [can be] causally consequential for the outcome of interest” (Falleti and Mahoney 2015: 218; see also Pierson 2004: 54–78). For example, where individual acts of agency play a critical role – such as political elites’ response to a violent protest – “reordering can radically change [a] subject’s understanding of the meaning of particular events,” altering their response and the resulting outcomes (Abbott 1995: 97).

An evocative illustration is provided by Sewell’s (1996) analysis of how the storming of the Bastille in 1789 produced the modern concept of “revolution.” After overrunning the fortress, the crowd freed the few prisoners held within it; shot, stabbed, and beheaded the Bastille’s commander; and paraded his severed head through the streets of Paris (Sewell 1996: 850). When the French National Assembly heard of the taking of the Bastille, it first interpreted the contentious event as “disastrous news” and an “excess of fury”; yet, when the king subsequently responded by withdrawing his troops to their provincial barracks, the Assembly recognized that the storming of the Bastille had strengthened its hand, and proceeded to reinterpret the event as a patriotic act of protest in support of political change (Sewell 1996: 854–855). The king’s reaction to the Bastille thus bolstered the Assembly’s resolve to “invent” the modern concept of revolution as a “legitimate rising of the sovereign people that transformed the political system of a nation” (Sewell 1996: 854–858). Proceeding counterfactually, had the ordering of events been reversed – had the king withdrawn his troops before the Bastille had been stormed – the National Assembly would have had little reason to interpret the popular uprising as a patriotic act legitimating reform rather than a violent act of barbarism.

Temporal ordering may also alter a social process’s political outcomes through macro-level mechanisms. For example, consider Falleti’s (2005, 2010) analysis of the conditions under which state decentralization – the devolution of national powers to subnational administrative bodies – increases local political autonomy in Latin America. Through process tracing, Falleti demonstrates that when fiscal decentralization precedes electoral decentralization, local autonomy is increased, since this sequence endows local districts with the monetary resources necessary to subsequently administer an election effectively. However, when the reverse occurs, such that electoral decentralization precedes fiscal decentralization, local autonomy is compromised: although the district is being offered the opportunity to hold local elections, it lacks the monetary resources to administer them effectively, endowing the national government with added leverage to impose conditions upon the devolution of fiscal resources.

For our purposes, what is crucial to note is not simply that temporal ordering matters, but that in ordered processes it is not the presence or absence of events that is most consequential for the outcome of interest. For instance, in Falleti’s analysis both fiscal and electoral decentralization occur. This means that a traditional Millian framework risks dismissing some explanatory events as causally irrelevant on the grounds that their presence is insufficient for explicating the outcome of interest (see Figure 7.2).

Figure 7.2 How ordered processes risk being ignored by a Millian setup

The way to deductively attenuate the foregoing risk is to develop an ordered theory and then modify the traditional Millian setup to assess the effect of ordering on an outcome of interest. That is, deductive theorizing aimed at probing the causal effect of ordering can guide us in constructing an appropriate Millian case selection design, such as that in Figure 7.3. In this example, we redefine the fourth independent variable to measure not the presence or absence of a fourth event, but rather to measure the ordering of two previously defined events (in this case, events 1 and 2). This case selection setup would be appropriate if deductive theorizing predicts that the outcome of interest is produced when event 1 is followed by event 2 (such that, unless this specific ordering occurs, the presence of events 1 and 2 is insufficient to generate the outcome). In other words, if Millian methods are to be deductively used to select cases for comparison, the way to guard against prematurely dismissing the causal role of temporal ordering is to explicitly theorize said ordering a priori. If this proves difficult, or if the researcher lacks sufficient knowledge to develop such a theory, it is advisable to switch to the more inductive method for case selection outlined in the next section.

Figure 7.3 Deductively incorporating ordered processes within a Millian setup
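
As a rough illustration of the redefinition described above, the following Python sketch compares two hypothetical cases in which events 1 to 3 all occur but events 1 and 2 unfold in opposite order. Everything in it (the `Case` class, the event timings, the outcome values, and the helper names) is an assumption made for illustration, not data or notation from the studies discussed. Coding presence alone leaves the two cases indistinguishable, whereas adding an ordering variable isolates the one difference between them.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    event_times: dict  # event -> time it occurred; absent events are omitted
    outcome: bool


def presence_row(case, events):
    """Traditional Millian coding: was each event present?"""
    return {event: event in case.event_times for event in events}


def ordered_row(case, events, first, second):
    """Presence coding plus one variable for the ordering of two events."""
    row = presence_row(case, events)
    times = case.event_times
    row[f"{first}_before_{second}"] = (
        first in times and second in times and times[first] < times[second]
    )
    return row


def differing_factors(row_a, row_b):
    """Method of difference: factors on which the two cases diverge."""
    return sorted(k for k in row_a if row_a[k] != row_b[k])


EVENTS = ["event1", "event2", "event3"]
# Hypothetical cases: same events, opposite ordering of events 1 and 2.
case_a = Case("A", {"event1": 1, "event2": 2, "event3": 0}, outcome=True)
case_b = Case("B", {"event1": 2, "event2": 1, "event3": 0}, outcome=False)

by_presence = differing_factors(presence_row(case_a, EVENTS),
                                presence_row(case_b, EVENTS))
by_ordering = differing_factors(
    ordered_row(case_a, EVENTS, "event1", "event2"),
    ordered_row(case_b, EVENTS, "event1", "event2"),
)
print(by_presence)  # no presence variable separates the two cases
print(by_ordering)  # only the ordering variable separates them
```

The ordering variable is only as good as the theory behind it: the sketch presumes the researcher has already decided which pair of events to order.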

Risk 2: Ignoring Paced Processes

Processualist researchers have also emphasized that, beyond temporal order, “the speed or duration of events … is causally consequential” (Falleti and Mahoney 2015: 219). For example, social scientists have long distinguished an “eventful temporality” (Sewell 1996) from those “big, slow moving” incremental sequences devoid of rapid social change (Pierson 2003). For historical institutionalists, this distinction is illustrated by “critical junctures” – defined as “relatively short periods of time during which there is a substantially heightened probability that agents’ choices will affect the outcome of interest” (Capoccia and Kelemen 2007: 348; Capoccia 2015: 150–151) – on the one hand, and those “causal forces that develop over an extended period of time,” such as “cumulative” social processes, sequences involving “threshold effects,” and “extended causal chains” on the other hand (Pierson 2004: 82–90; Mahoney and Thelen 2010).

An excellent illustration is provided by Beissinger’s (2002) analysis of the contentious events that led to the collapse of the Soviet State. Descriptively, the sequence of events has its origins in the increasing transparency of Soviet institutions and freedom of expression accompanying Gorbachev’s Glasnost (Beissinger 2002: 47). As internal fissures within the Politburo began to emerge in 1987, Glasnost facilitated media coverage of the split within the Soviet leadership (Beissinger 2002: 64). In response, “interactive attempts to contest the state grew regularized and began to influence one another” (Beissinger 2002: 74). These challenging acts mobilized around previously dormant national identities, and for the first time – often out of state incompetence – these early protests were not shut down (Beissinger 2002: 67). Protests reached a boiling point in early 1989 as the first semicompetitive electoral campaign spurred challengers to mobilize the electorate and cultivate grievances in response to regime efforts to “control nominations and electoral outcomes” (Beissinger 2002: 86). By 1990 the Soviet State was crumbling, and “in many parts of the USSR demonstration activity … had become a normal means for dealing with political conflict” (Beissinger 2002: 90).

Crucially, Beissinger stresses that to understand the causal dynamics of the Soviet State’s collapse, highlighting the chronology of events is insufficient. The 1987–1990 period comprised a moment of “thickened history” wherein “what takes place … has the potential to move history onto tracks otherwise unimaginable … all within an extremely compressed period of time” (Beissinger 2002: 27). Information overload, the density of interaction between diverse social actors, and the diffusion of contention engendered “enormous confusion and division within Soviet institutions,” allowing the hypertrophy of challenging acts to play “an increasingly significant role in their own causal structure” (Beissinger 2002: 97, 27). In this light, the temporal compression of a sequence of events can bolster the causal role of human agency and erode the constraints of social structure. Proceeding counterfactually, had the exact same sequence of contentious events unfolded more slowly, it is doubtful that the Soviet State would have suddenly collapsed.

Many examples could be cited of how the prolongation of a sequence of events can render those events invisible, and thus produce different outcomes. Consider, for example, how global climate change – which is highlighted by Pierson (2004: 81) as a prototypical process with prolonged time horizons – conditions the psychological response of social actors. As a report from the American Psychological Association underscores, “climate change that is construed as rapid is more likely to be dreaded,” for “people often apply sharp discounts to costs or benefits that will occur in the future … relative to experiencing them immediately” (Swim et al. 2009: 24–25; Loewenstein and Elster 1992). This logic is captured by the metaphor of the “boiling frog”: “place a frog in a pot of cool water, and gradually raise the temperature to boiling, and the frog will remain in the water until it is cooked” (Boyatzis 2006: 614).

What is important to note is that, once more, paced processes are not premised on the absence or presence of their constitutive events being causally determinative; rather, they are premised on the duration of events (or their temporal separation) bearing explanatory significance. Hence the traditional approach to case selection risks neglecting the causal impact of temporal duration on the outcome of interest (see Figure 7.4).

Figure 7.4 Paced processes risk being ignored by a Millian setup

Here, too, the way to deductively assess the causal role of pacing on an outcome of interest is to explicitly develop a paced theory before selecting cases for empirical analysis. On the one hand, we might theorize that it is the duration of a given event that is causally consequential; on the other hand, we might theorize that it is the temporal separation of said event from other events that is significant. Figure 7.5 suggests how a researcher can assess both theories through a revised Millian design. In the first example, we define a fourth independent variable measuring not the presence of a fourth event, but rather the temporal duration of a previously defined event (in this case, event 1). This would be an appropriate case selection design to assess a theory predicting that the outcome of interest occurs when event 1 unfolds over a prolonged period of time (such that if event 1 unfolds more rapidly, its mere occurrence is insufficient for the outcome). In the second example, we define a fourth independent variable measuring the temporal separation between two previously defined events (in this case, events 1 and 2). This would be an appropriate case selection design for a theory predicting that the outcome of interest only occurs when event 1 is temporally distant from event 2 (such that events 1 and 2 are insufficient for the outcome if they are proximate). Again, if the researcher lacks a priori knowledge to theorize how a paced process may be generating the outcome, it is advisable to adopt the inductive method of case selection described in Section 7.4.

Figure 7.5 Deductively incorporating paced processes within a Millian setup
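
The two paced variables described above can be sketched in a few lines of Python. The sketch assumes each event is recorded as a hypothetical `(start, end)` span on an arbitrary time scale, and the thresholds `min_duration` and `min_gap` are illustrative placeholders that a paced theory would have to justify; none of these names or numbers come from the chapter's figures.

```python
def is_prolonged(spans, event, min_duration):
    """Duration variable: did the event unfold over a prolonged period?"""
    start, end = spans[event]
    return (end - start) >= min_duration


def is_distant(spans, first, second, min_gap):
    """Separation variable: did `second` begin long after `first` ended?"""
    gap = spans[second][0] - spans[first][1]
    return gap >= min_gap


def paced_row(spans, min_duration, min_gap):
    """Presence coding for every event plus the two pacing variables."""
    row = {event: True for event in spans}
    row["event1_prolonged"] = is_prolonged(spans, "event1", min_duration)
    row["event1_distant_from_event2"] = is_distant(
        spans, "event1", "event2", min_gap
    )
    return row


# Hypothetical cases: identical events, different tempo.
slow_case = {"event1": (0, 10), "event2": (15, 16), "event3": (1, 2)}
fast_case = {"event1": (0, 2), "event2": (3, 4), "event3": (1, 2)}

slow_row = paced_row(slow_case, min_duration=5, min_gap=4)
fast_row = paced_row(fast_case, min_duration=5, min_gap=4)
for name in slow_row:
    print(name, slow_row[name], fast_row[name])
```

In this toy comparison the presence variables are identical across the two cases, so only the pacing variables can register the difference in tempo.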

Risk 3: Ignoring Equifinal Causal Processes

Finally, researchers have noted that causal processes may be marked by equifinality: the fact that “multiple combinations of values … produce the same outcome” (Mahoney 2008: 424; see also George and Bennett 2005; Goertz and Mahoney 2012). More formally, set-theoretic process tracers account for equifinality by emphasizing that, in most circumstances, “necessary” conditions or events are actually INUS conditions – individually necessary components of an unnecessary but sufficient combination of factors (Mahoney and Vanderpoel 2015: 15–18).

One of the reasons why processualist social scientists increasingly take equifinality seriously is the recognition that causal mechanisms may be context-dependent. Sewell’s work stresses that “the consequences of a given act … are not intrinsic to the act but rather will depend on the nature of the social world within which it takes place” (Sewell 2005: 9–10). Similarly, Falleti and Lynch (2009: 2; 11) argue that “causal effects depend on the interaction of specific mechanisms with aspects of the context within which these mechanisms operate,” hence the necessity of imposing “scope conditions” on theory building. One implication is that the exact same sequence of events in two different settings may produce vastly different causal outcomes. The flip side of this conclusion is that we should not expect a given outcome to always be produced by the same sequence of events.

For example, consider Sewell’s critique of Skocpol’s (1979) States and Social Revolutions for embracing an “experimental temporality.” Skocpol deploys Millian methods of case selection to theorize that the great social revolutions – the French, Russian, and Chinese revolutions – were caused by a conjunction of three necessary conditions: “(1) military backwardness, (2) politically powerful landlord classes, and (3) autonomous peasant communities” (Sewell 2005: 93). Yet to permit comparison, Skocpol assumes that the outcomes of one revolution, and the processes of historical change more generally, have no effect on a subsequent revolution (Sewell 2005: 94–95). This approach amounts to “cutting up the congealed block of historical time into artificially interchangeable units,” ignoring the fatefulness of historical sequences (Sewell 2005). For example, the Industrial Revolution “intervened” between the French and Russian Revolutions, and consequently one could argue that “the revolt of the Petersburg and Moscow proletariat was a necessary condition for social revolution in Russia in 1917, even if it was not a condition for the French Revolution in 1789” (Sewell 2005: 94–95). What Sewell is emphasizing, in short, is that peasant rebellion is an INUS condition (as is a proletariat uprising), rather than a necessary condition.

Another prominent example of equifinality is outlined by Collier’s (1999: 5–11) review of the diverse pathways through which democratization occurs. In the elite-driven pathway, emphasized by O’Donnell and Schmitter (1986), an internal split amongst authoritarian incumbents emerges; this is followed by liberalizing efforts by some incumbents, which enables the resurrection of civil society and popular mobilization; finally, authoritarian incumbents negotiate a pacted transition with opposition leaders. By contrast, in the working-class-driven pathway, emphasized by Rueschemeyer, Stephens, and Stephens (1992), a shift in the material balance of power in favor of the democracy-demanding working class and against the democracy-resisting landed aristocracy causes the former to overpower the latter, and via a democratic revolution from below a regime transition occurs. Crucially, Collier (1999: 12) emphasizes that these two pathways need not be contradictory (or exhaustive): the elite-driven pathway appears more common in the Latin American context during the second wave of democratization, whereas the working-class-driven pathway appears more common in Europe during the first wave of democratization.

What is crucial is that Millian case selection is premised on there being a single cause underlying the outcome of interest. As a result, Millian methods risk dismissing a set of events as causally irrelevant ex ante in one case simply because that same set of events fails to produce the outcome in another case (see Figure 7.6). Unlike ordered and paced processes, there is no clear way to leverage deductive theorizing to reconfigure Millian methods for case selection and accommodate equifinality. However, I argue that the presence of equifinal pathways can be fruitfully probed if we embrace a more inductive approach to comparative case selection, as the next section outlines.

Figure 7.6 Equifinal causal processes risk being ignored by a Millian setup
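
The risk can be made concrete with a toy example in Python. The two cases, the four events, and the pathway labels below are invented for illustration and do not encode any of the studies cited above. The first function applies the method of agreement mechanically; the second checks cases against candidate pathways, which presupposes that such pathways have already been identified by other means, for instance through within-case process tracing.

```python
FACTORS = ["event1", "event2", "event3", "event4"]

# Hypothetical cases that reach the same outcome by different routes.
cases = [
    {"name": "Case 1", "event1": True, "event2": True,
     "event3": False, "event4": False, "outcome": True},
    {"name": "Case 2", "event1": False, "event2": False,
     "event3": True, "event4": True, "outcome": True},
]


def shared_by_all(cases, factors):
    """Method of agreement: keep factors present in every positive case."""
    positive = [c for c in cases if c["outcome"]]
    return [f for f in factors if all(c[f] for c in positive)]


def pathways_covering(cases, pathways):
    """For each positive case, list the pathways whose events all occurred."""
    return {
        c["name"]: [
            label for label, events in pathways.items()
            if all(c[e] for e in events)
        ]
        for c in cases if c["outcome"]
    }


# Assumed candidate pathways (illustrative labels only).
PATHWAYS = {
    "pathway_a": ["event1", "event2"],
    "pathway_b": ["event3", "event4"],
}

surviving = shared_by_all(cases, FACTORS)
coverage = pathways_covering(cases, PATHWAYS)
print(surviving)  # every event is eliminated as "unnecessary"
print(coverage)   # yet each case is fully covered by one pathway
```

Read this way, each event is an INUS condition: unnecessary across the pair of cases, yet a necessary part of the combination that is sufficient in its own case.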

7.4 A New Approach: The Method of Inductive Case Selection

If a researcher wishes to guard against ignoring consequential temporal dynamics but lacks the a priori knowledge necessary to develop a processual theory and tailor their case selection strategy, is there an alternative path forward? Yes, indeed: I suggest that researchers could wield most-similar or most-different cases designs to (1) probe causal generalizability, (2) reveal scope conditions, and (3) explore the presence of equifinality. [Footnote 5] To walk through this more inductive case selection approach, I engage some case studies from development practice to illustrate how researchers and practitioners alike could implement and benefit from the method.

7.4.1 Tempering the Deductive Use of Millian Methods

To begin, one means to ensure against a Millian case selection design overlooking an ordered, paced, or equifinal causal process (in the absence of deductive theorizing) is to be wary of leveraging the methods of agreement and difference to eliminate potential explanatory factors (Falleti and Mahoney 2015: 225–226). That is, the decision to discard an explanatory variable or historical event as causally unnecessary (via the method of agreement) or insufficient (via the method of difference) may be remanded to the process-tracing stage, rather than being made ex ante at the case selection stage.

Notice how this recommendation is particularly intuitive in light of the advances in process-tracing methods. Before this burgeoning literature existed, Millian methods were called upon to accomplish two things at once: (1) provide a justification for selecting two or more cases for social inquiry, and (2) yield causal leverage via comparison and the elimination of potential explanatory factors as unnecessary or insufficient. But process-tracing methodologists have showcased how the analysis of temporal variation disciplined via counterfactual analysis, congruence testing, and process-tracing tests renders within-case causal inference possible even in the absence of an empirical comparative case (George and Bennett 2005; Gerring 2007; Collier 2011; Mahoney 2012; Beach and Pedersen 2013; Bennett and Checkel 2015; Levy 2015). That is, the ability to make causal inferences need not be primarily determined at the case selection stage.

The foregoing implies that if a researcher does not take temporal dynamics into account when developing their theory, the use of Millian methods should do no more than to provisionally discount the explanatory purchase of a given explanatory factor. The researcher should then bear in mind that as the causal process is reconstructed from a given outcome, the provisionally discounted factor may nonetheless be shown to be of causal relevance – particularly if the underlying process is ordered or paced, or if equifinal pathways are possible.

Despite these limitations, Millian methods might fruitfully serve additional functions from the standpoint of case selection, particularly if researchers shift (1) when and (2) why they make use of them. First, Millian methods may be as – if not more – useful after process tracing of a particular case is completed rather than to set the stage for within-case analysis. Such a chronological reversal – process tracing followed by Millian case selection, instead of Millian case selection followed by process tracing – inherently embraces a more inductive, theory-building approach to case study research (Falleti and Mahoney 2015: 229–231) which, I suspect, is far more commonly used in practice than is acknowledged. I refer to this approach as the method of inductive case selection, wherein “theory-building process tracing” (Beach and Pedersen 2013: 16–18) of a single case is subsequently followed by the use of a most-similar or most-different cases design.

7.4.2 Getting Started: Selecting the Initial Case

The method of inductive case selection begins by assuming that the researcher has justifiable reasons for picking a particular case for process tracing and is subsequently looking to contextualize the findings or build a theory outwards. Hence, the first step involves picking an initial case. Qualitative methodologists have already supplied a number of plausible logics for selecting a single case, and I describe three nonexhaustive possibilities here: (1) theoretical or historical importance; (2) policy relevance and salience; and (3) empirically puzzling nature.

First, an initial case may be selected due to its theoretical or historical importance. Eckstein (1975), for example, defines an idiographic case study as a case where the specific empirical events/outcome serve as a central referent for a scholarly literature. As an illustration, Gerring and Cojocaru (2015: 11) point to North and Weingast’s (1989) influential study of how the Glorious Revolution in seventeenth-century Britain favorably shifted the constitutional balance of power for the government to make credible commitments to protecting property rights (paving the way for the financial revolution of the early eighteenth century). Given that so much of the scholarly debate amongst economic historians centers on the institutional foundations of economic growth, North and Weingast’s case study was “chosen (it would appear) because of its central importance in the [historical political economy] literature on the topic, and because it is … a prominent and much-studied case” (Gerring and Cojocaru 2015: 11). In other words, North and Weingast’s (1989) study is idiographic in that it “aim[s] to explain and/or interpret a single historical episode,” but it remains “theory-guided” in that it “focuses attention on some theoretically specified aspects of reality and neglects others” (Levy 2008: 4).

While the causes of the Glorious Revolution are a much-debated topic amongst economic historians, they have less relevance to researchers and practitioners focused on assessing the effects of contemporary public policy interventions. Hence, a second logic for picking a first case for process tracing is its policy relevance and salience. George and Bennett (2005: 263–286) define a policy-relevant case study as one where the outcome is of interest to policy-makers and its causes are at least partially amenable to policy manipulation. For example, one recent World Bank case study (El-Saharty and Nagaraj 2015) analyzes how HIV/AIDS prevalence amongst vulnerable subpopulations – particularly female sex workers – can be reduced via targeted service delivery. To study this outcome, two states in India – Andhra Pradesh and Karnataka – were selected for process tracing. There are three reasons why this constitutes an appropriate policy-relevant case selection choice. First, the outcome of interest – a decline in HIV/AIDS prevalence amongst female sex workers – was present in both Indian states. Second, because India accounts for almost 17.5 percent of the world population and has a large population of female sex workers, this outcome was salient to the government (El-Saharty and Nagaraj 2015: 3). Third, the Indian government had created a four-phase National AIDS Control Program (NACP) spanning from 1986 through 2017, meaning that at least one set of possible explanatory factors for the decline in HIV/AIDS prevalence comprised policy interventions that could be manipulated. [Footnote 6]

A third logic for picking an initial case for process tracing is its puzzling empirical nature. One obvious instantiation is when an exogenous shock or otherwise significant event/policy intervention yields a different outcome from the one scholars and practitioners expected. [Footnote 7] For example, in 2004 the federal government of Nigeria partnered with the World Bank to improve the share of Nigeria’s urban population with access to piped drinking water. This partnership – the National Urban Water Sector Reform Project (NUWSRP1) – aimed to “increase access to piped water supply in selected urban areas by improving the reliability and financial viability of selected urban water utilities” and by shifting resources away from “infrastructure rehabilitation” that had failed in the past (Hima and Santibanez 2015: 2). Despite $200 million worth of investments, ultimately the NUWSRP1 “did not perform as strongly on the institutional reforms needed to ensure sustainability” (Hima and Santibanez 2015). Given this puzzling outcome, the World Bank conducted an intensive case study to ask why the program did “not fully meet its essential objective of achieving a sustainable water delivery service” (Hima and Santibanez 2015). [Footnote 8]

The common thread of these three logics for selecting an initial case is that the case itself is theoretically or substantively important and that its empirical dynamics – underlying either the outcome itself or its relationship to some explanatory events – are not well understood. That being said, the method of inductive case selection merely presumes that there is some theoretical, policy-related, empirical, or normative justification to pick the initial case.

7.4.3 Probing Generalizability Via a Most-Similar Cases Design

It is after picking an initial case that the method of inductive case selection contributes novel guidelines for case study researchers by reconfiguring how Millian methods are used. Namely, how should one (or more) additional cases be selected for comparison, and why? This question presumes that the researcher wishes to move beyond an idiographic, single-case study for the purposes of generating inferences that can travel. Yet in this effort, we should take seriously process-tracing scholars’ argument that causal mechanisms are often context-dependent. As a result, the selection of one or more comparative cases is not meant to uncover universally generalizable abstractions; rather, it is meant to contextualize the initial case within a set or family of cases that are spatiotemporally bounded.

That being said, the first logical step is to understand whether the causal inferences yielded by the process-traced case can indeed travel to other contexts (Goertz 2017: 239). This constitutes the first reconfiguration of Millian methods: the use of comparative case studies to assess generalizability. Specifically, after within-case process tracing reveals a factor or sequence of factors as causally important to an outcome of interest, the logic is to select a case that is as contextually analogous as possible such that there is a higher probability that the causal process will operate similarly in the second case. This approach exploits the context-dependence of causal mechanisms to the researcher’s advantage: Similarity of context increases the probability that a causal mechanism will operate similarly across both cases. By “context,” it is useful to follow Falleti and Lynch (2009: 14) and to be

concerned with a variety of contextual layers: those that are quite proximate to the input (e.g., in a study of the emergence of radical right-wing parties, one such layer might be the electoral system); exogenous shocks quite distant from the input that might nevertheless effect the functioning of the mechanism and, hence, the outcome (e.g., a rise in the price of oil that slows the economy and makes voters more sensitive to higher taxes); and the middle-range context that is neither completely exogenous nor tightly coupled to the input and so may include other relevant institutions and structures (the tax system, social solidarity) as well as more atmospheric conditions, such as rates of economic growth, flows of immigrants, trends in partisan identification, and the like.

For this approach to yield valuable insights, the researcher focuses on ‘controlling’ for as many of these contextual explanatory factors (crudely put, for as many independent variables) as possible. In other words, the researcher selects a most-similar case: if the causal chain similarly operates in the second case, this would support the conclusion that the causal process is likely at work across the constellation of cases bearing ‘family resemblances’ to the process-traced case (Soifer 2020). Figure 7.7 displays the logic of this design:

Figure 7.7 Probing generalizability by selecting a most-similar case

As in Figure 7.7, suppose that process tracing of Case 1 reveals that some sequence of events (in this example, event 4 followed by event 5) caused the outcome of interest. The researcher would then select a most-similar case, that is, a case with similar values/occurrences of other independent variables/events (here, IV1–IV3) that might also influence the outcome. The researcher would then scout whether the sequence in Case 1 (event 4 followed by event 5) also occurs in the comparative case. If it does, the expectation for a minimally generalizable theory is that it would produce a similar outcome in Case 2 as in Case 1. Correlatively, if the sequence does not occur in Case 2, the expectation is that it would not experience the same outcome as Case 1. These findings would provide evidence that the explanatory sequence (event 4 followed by event 5) has causal power that is generalizable across a set of cases bearing family resemblances.
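
The selection logic just described can be sketched as a small Python routine. The candidate cases, their codings on IV1–IV3, and the function names are all hypothetical; counting matching background variables is only one crude way to operationalize contextual similarity, and a researcher would normally weigh the contextual layers qualitatively.

```python
BACKGROUND_IVS = ["iv1", "iv2", "iv3"]
SEQUENCE = "event4_then_event5"

# Hypothetical process-traced case and candidate comparison cases.
case_1 = {"name": "Case 1", "iv1": 1, "iv2": 0, "iv3": 1,
          SEQUENCE: True, "outcome": True}
candidates = [
    {"name": "Case X", "iv1": 1, "iv2": 0, "iv3": 1,
     SEQUENCE: True, "outcome": True},
    {"name": "Case Y", "iv1": 0, "iv2": 1, "iv3": 1,
     SEQUENCE: False, "outcome": False},
    {"name": "Case Z", "iv1": 1, "iv2": 0, "iv3": 0,
     SEQUENCE: True, "outcome": False},
]


def similarity(base, candidate, ivs=BACKGROUND_IVS):
    """Number of background variables on which the two cases match."""
    return sum(base[iv] == candidate[iv] for iv in ivs)


def most_similar(base, candidates, ivs=BACKGROUND_IVS):
    """Candidate sharing the most background conditions with `base`."""
    return max(candidates, key=lambda c: similarity(base, c, ivs))


def supports_generalizability(base, comparison, sequence=SEQUENCE):
    """True when the outcome co-varies with the sequence across both cases:
    same sequence and same outcome, or different sequence and different outcome.
    """
    same_sequence = base[sequence] == comparison[sequence]
    same_outcome = base["outcome"] == comparison["outcome"]
    return same_sequence == same_outcome


case_2 = most_similar(case_1, candidates)
print(case_2["name"], supports_generalizability(case_1, case_2))
```

A result of `False` for the selected case would correspond to the scenario in which the causal chain operates differently in a most-similar context.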

For example, suppose a researcher studying democratization in Country A finds evidence congruent with the elite-centric theory of democratization of O’Donnell and Schmitter (1986) described previously. To assess causal generalizability, the researcher would subsequently select a case – Country B – that is similar in the background conditions that the literature has shown to be conducive to democratization, such as level of GDP per capita (Przeworski and Limongi 1997; Boix and Stokes 2003) or belonging to the same “wave” of democratization via spatial and temporal proximity (Collier 1991; Huntington 1993). Notice that these background conditions in Country B have to be at least partially exogenous to the causal process whose generalizability is being probed – that is, they cannot constitute the events that directly comprise the causal chain revealed in Country A. One way to think about them is as factors that in Country A appear to have been necessary, but less proximate and important, conditions for the outcome. Here, importance is determined by the “extent that they are [logically/counterfactually] present only when the outcome is present” (Mahoney et al. 2009: 119), whereas proximity is determined by the degree to which the condition is “tightly coupled” with the chain of events directly producing the outcome (Falleti and Mahoney 2015: 233).

An example related to the impact of service delivery in developmental contexts can be drawn from the World Bank’s case study of HIV/AIDS interventions in India. Recall that this case study actually spans across two states: Andhra Pradesh and Karnataka. In a traditional comparative case study setup, the selection of both cases would seem to yield limited insights. After all, they are contextually similar: “Andhra Pradesh and Karnataka … represent the epicenter of the HIV/AIDS epidemic in India. In addition, they were early adopters of the targeted interventions”; and they also experience a similar outcome: “HIV/AIDS prevalence among female sex workers declined from 20 percent to 7 percent in Andhra Pradesh and from 15 percent to 5 percent in Karnataka between 2003 and 2011” (El-Saharty and Nagaraj 2015: 7; 3). In truth, this comparative case study design makes substantial sense: had the researchers focused on the impact of the Indian government’s NACP program only in Andhra Pradesh or only in Karnataka, one might have argued that there was something unique about either state that rendered it impossible to generalize the causal inferences. By instead demonstrating that favorable public health outcomes can be traced to the NACP program in both states, the researchers can support the argument that the intervention would likely prove successful in other contexts to the extent that they are similar to Andhra Pradesh and Karnataka.

One risk of the foregoing approach is highlighted by Sewell (2005: 95–96): contextual similarity may suggest cross-case interactions that hamper the ability to treat the second, most-similar case as if it were independent of the process-traced case. For example, an extensive body of research has underscored how protests often diffuse across proximate spatiotemporal contexts through mimicry and the modularity of repertoires of contention (Tilly 1995; Tarrow 1998). And, returning to the World Bank case study of HIV/AIDS interventions in Andhra Pradesh and Karnataka, one concern is that because these states share a common border, cross-state learning or other interactions might limit the value-added of a comparative design over a single case study, since the second case may not constitute truly new data. The researcher should be highly sensitive to this possibility when selecting and subsequently process tracing the most-similar case: the greater the likelihood of cross-case interactions, the lesser the likelihood that it is a case-specific causal process – as opposed to a cross-case diffusion mechanism – that is doing most of the explanatory work.

Conversely, if the causal chain is found to operate differently in the second, most-similar case, then the researcher can make an argument for rejecting the generalizability of the causal explanation with some confidence. The conclusion would be that the causal process is sui generis and requires the “localization” of the theoretical explanation for the outcome of interest (Tarrow 2010: 251–252). In short, this would suggest that the process-traced case is an exceptional or deviant case, given a lack of causal generalizability even to cases bearing strong family resemblances. Here, we are using the ‘strong’ notion of ‘deviant’: the inability of a causal process to generalize to similar contexts substantially decreases the likelihood that “other cases” could be explained with reference to (or even in opposition to) the process-traced case.

There is, of course, the risk that by getting mired in the weeds of the first case, the researcher is unable to recognize how the overall chronology of events and causal logics in the most-similar case strongly resembles the process-traced case. That is, a null finding of generalizability in a most-similar context calls on the researcher to probe whether they have descended too far down the “ladder of generality,” requiring more abstract conceptual categories to compare effectively (Sartori 1970; Collier and Levitsky 1997).

7.4.4 Probing Scope Conditions and Equifinality Via a Most-Different Cases Design

A researcher who has process-traced a given case and revealed a factor or sequence of factors as causally relevant may also benefit from leveraging a most-different cases approach. This case selection technique yields complementary insights to the most-similar cases design described in the previous section, but its focus is altogether different: instead of uncovering the degree to which an identified causal process travels, the objective is to try to understand where and why it fails to travel and whether alternative pathways to the same outcome may be possible.

More precisely, by selecting a case that differs substantially from the process-traced case in background characteristics, the researcher maximizes contextual heterogeneity and the likelihood that the causal process will not generalize to the second case (Soifer 2020). Put differently, the scholar would be selecting a least-likely case for generalizability, because the context-dependence of causal mechanisms renders it unlikely that the same sequence of events will generate the same outcome in the second case. This would offer a first cut at establishing “scope conditions” upon the generalizability of the theory (Tarrow 2010: 251) by isolating which contextual factors prevented the process from producing the outcome in the most-different case.

Figure 7.8 provides a visual illustration of what this design could look like. Suppose, once more, that process tracing in Case 1 has revealed that some event 4 followed by event 5 generated the outcome of interest. To maximize the probability that we will be able to place scope conditions on this finding, we would select a comparative case that is most different to the process-traced case (a case with different values/occurrences of other independent variables/events [denoted as IV1–IV3 in Figure 7.8] that might also influence the outcome) but which also experienced the sequence of event 4 followed by event 5. Given the contextual differences between these two cases, the likelihood that the same sequence will produce the same outcome in both is low, which then opens up opportunities for the researcher to probe the logic of scope conditions. In this endeavor, temporality can serve as a useful guide: a means for restricting the set of potential contextual factors that prevented the causal process from reproducing the outcome in Case 2 is to identify at what chronological point the linkages between events 4 and 5 on the one hand and the outcome of interest on the other hand branched off from the way they unfolded in Case 1. The researcher can then scout which contextual factors exerted the greatest influence at that temporal location and identify them as central to the scope conditions to be placed upon the findings.

Figure 7.8 Probing scope conditions by selecting a most-different case
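
The temporal heuristic described above, locating the point at which Case 2 branches off from Case 1, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each case is reduced to an ordered list of event labels, and the labels used here ("mechanism A", "mechanism B") are hypothetical placeholders rather than anything drawn from the figure.

```python
from typing import Optional, Sequence


def first_divergence(reference: Sequence[str], comparison: Sequence[str]) -> Optional[int]:
    """Return the index of the first step at which two chronologies differ.

    Returns None when the two sequences are identical.
    """
    for i, (ref_event, cmp_event) in enumerate(zip(reference, comparison)):
        if ref_event != cmp_event:
            return i
    # One chronology is a strict prefix of the other: they diverge where the shorter ends.
    if len(reference) != len(comparison):
        return min(len(reference), len(comparison))
    return None


# Hypothetical chronologies: both cases experience event 4 followed by event 5,
# but only Case 1 goes on to reach the outcome of interest.
case_1 = ["event 4", "event 5", "mechanism A", "outcome"]
case_2 = ["event 4", "event 5", "mechanism B", "no outcome"]

branch = first_divergence(case_1, case_2)
print(branch)          # 2
print(case_2[branch])  # mechanism B
```

The returned index only tells the researcher where to look; deciding which contextual factors mattered at that point remains a matter of within-case evidence.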

To provide an example of how this logic of inquiry can work, consider a recent case study focused on understanding the effectiveness of Mexico’s conditional cash transfer program – Oportunidades, the first program of its kind – in providing monetary support to the female heads of Indigenous households (Alva Estrabridis and Ortega Nieto 2015). The program suffered from the fact that Indigenous beneficiaries dropped out at higher rates than their non-Indigenous counterparts. In 2009 the World Bank spearheaded an Indigenous Peoples Plan (IPP) to bolster service delivery of cash transfers to Indigenous populations, which crucially included “catering to indigenous peoples in their native languages and disseminating information in their languages” (Alva Estrabridis and Ortega Nieto 2015: 2). A subsequent impact evaluation found that “[w]hen program messages were offered in beneficiaries’ mother tongues, they were more convincing, and beneficiaries tended to participate and express themselves more actively” (Alva Estrabridis and Ortega Nieto 2015; Mir et al. 2011).

Researchers might well be interested in the portability of the foregoing finding, in which case the previously described most-similar cases design is appropriate – for example, a comparison with the Familias en Accion program in Colombia may be undertaken (Attanasio et al. 2005). But they might also be interested in the limits of the policy intervention – in understanding where and why it is unlikely to yield similar outcomes. To assess the scope conditions upon the “bilingualism” effect of cash transfer programs, a most-different cases design is appropriate. Thankfully, conditional cash transfer programs are increasingly common even in historical, cultural, and linguistic contexts markedly different from Mexico, most prominently in sub-Saharan Africa (Lagarde et al. 2007; Garcia and Moore 2012). Selecting a comparative case from sub-Saharan Africa should prove effective for probing scope conditions: the more divergent the contextual factors, the less likely it is that the policy intervention will produce the same outcome in both contexts.

On the flip side, in the unlikely event that part or all of the causal process is nonetheless reproduced in the most-different case, the researcher would obtain a strong signal that they have identified one of those rare causal explanations of general scope. In coming to this conclusion, however, the researcher should be wary of “conceptual stretching” (Sartori 1970: 1034), such that there is confidence that the similarity in the causal chain across the most-different cases lies at the empirical level and is not an artificial by-product of imprecise conceptual categories (Bennett and Checkel 2015: 10–11). Here process tracing, by pushing researchers to not only specify a sequence of “tightly-coupled” events (Falleti and Mahoney 2015: 233), but also to collect observable implications about the causal mechanisms concatenating these events, can guard against conceptual stretching. By opening the “black box” of causation through detailed within-case analysis, process tracing limits the researcher’s ability to posit “pseudo-equivalences” across contexts (Sartori 1970: 1035).

Selecting a most-different case vis-à-vis the process-traced case is also an excellent strategy for probing equifinality – for maximizing the likelihood that the scholar will be able to probe multiple causal pathways to the same outcome. To do so, it is not sufficient to merely ensure divergence in background conditions; it is equally necessary to follow Mill’s method of agreement by ensuring that the outcome in the process-traced case is also present in the second, most-different case. By ensuring minimal variation in outcome, the scholar guarantees that process tracing the second case will lead to the desired destination; by ensuring maximal variation in background conditions, the scholar substantially increases the likelihood that process tracing will reveal a slightly or significantly different causal pathway to said destination. Should an alternative route to the outcome be found, then its generalizability could be assessed using the most-similar cases approach described previously.

Figure 7.9 visualizes what this case selection design might look like. Here, as in previous examples, suppose process tracing in Case 1 provides evidence that event 4 followed by event 5 produced the outcome of interest. The researcher then selects a case with the same outcome, but with different values/occurrences of some independent variables/events (in this case, IV1–IV3) that may influence the outcome. Working backwards from the outcome to reconstruct the causal chain that produced it, the researcher then probes (i) whether the sequence (event 4 followed by event 5) also occurred in Case 2, and (ii) whether the outcome of interest can be retraced to said sequence. Given the contextual dissimilarities between these most-different cases, such a finding is rather unlikely, which would subsequently enable the researcher to probe whether some other factor (perhaps IV2/event 2 in the example of Figure 7.9) produced the outcome in the comparative case instead, which would comprise clear evidence of equifinality.

Figure 7.9 Probing equifinality by selecting a most-different case with the same outcome

To return to the concrete example of Mexico’s conditional cash transfer program’s successful outreach to marginalized populations via bilingual service provision, an alternative route to the same outcome might be unearthed if a cash transfer program without bilingual outreach implemented in a country characterized by different linguistic, gender, and financial decision-making norms proves similarly successful in targeting marginalized populations. Several factors – including recruitment procedures, the size of the cash transfers, the requirements for participation, and the supply of other benefits ( Reference Lagarde, Haines and Palmer Lagarde et al. 2007 : 1902) – could interact with the different setting to produce similar intervention outcomes, regardless of whether multilingual services are provided. Such a finding would suggest that these policy interventions can be designed in multiple ways and still prove effective.

To conclude, the method of inductive case selection complements within-case analysis by supplying a coherent logic for probing generalizability, scope conditions, and equifinality. To summarize, Figure 7.10 provides a roadmap of this approach to comparative case selection.

Figure 7.10 Case selection roadmap to assess generalizability, scope conditions, equifinality

In short, if the researcher has the requisite time and resources, a multistage use of Millian methods to conduct four comparative case studies could prove very fertile. The researcher would begin by selecting a second, most-similar case to assess causal generalizability to a family of cases similar to the process-traced case; subsequently, a third, most-different case would be selected to surface possible scope conditions blocking the portability of the theory to divergent contexts; and a fourth, most-different case experiencing the same outcome would be picked to probe equifinal pathways. This sequential, four-case comparison would substantially improve the researcher’s ability to map the portability and contours of both their empirical analysis and their theoretical claims. Footnote 9
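
The sequential logic just described can be sketched as a small selection routine. The code below is a minimal illustration, not the chapter's procedure: it assumes background conditions can be coded as simple categorical values, measures contextual similarity by counting mismatches, and uses an invented pool of candidate cases. The names `Case`, `distance`, and `select_cases` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Case:
    name: str
    background: Dict[str, int]  # coded background conditions, e.g. IV1-IV3
    has_sequence: bool          # did event 4 followed by event 5 occur?
    outcome: bool               # did the outcome of interest occur?


def distance(a: Case, b: Case) -> int:
    """Count the background conditions on which two cases differ."""
    return sum(a.background[k] != b.background[k] for k in a.background)


def select_cases(traced: Case, pool: List[Case]) -> Dict[str, Optional[str]]:
    """Pick one comparison case per stage of the roadmap (None if no candidate)."""
    def by_distance(c: Case) -> int:
        return distance(traced, c)

    # Stage 2: most-similar case, to assess causal generalizability.
    similar = min(pool, key=by_distance, default=None)
    # Stage 3: most-different case that shares the sequence, to probe scope conditions.
    scope = max((c for c in pool if c.has_sequence), key=by_distance, default=None)
    # Stage 4: most-different case that shares the outcome, to probe equifinality.
    equifinal = max((c for c in pool if c.outcome), key=by_distance, default=None)
    return {
        "generalizability": similar.name if similar else None,
        "scope_conditions": scope.name if scope else None,
        "equifinality": equifinal.name if equifinal else None,
    }


# Invented example data; Case 1 is the process-traced case.
traced = Case("Case 1", {"IV1": 1, "IV2": 0, "IV3": 1}, True, True)
pool = [
    Case("A", {"IV1": 1, "IV2": 0, "IV3": 1}, True, True),
    Case("B", {"IV1": 0, "IV2": 1, "IV3": 0}, True, False),
    Case("C", {"IV1": 0, "IV2": 1, "IV3": 0}, False, True),
    Case("D", {"IV1": 1, "IV2": 1, "IV3": 1}, True, True),
]
selection = select_cases(traced, pool)
print(selection)
```

A mismatch count is a deliberately crude stand-in for contextual similarity; in practice the choice of background conditions and how to weigh them is a substantive judgment that no distance function settles.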

7.5 Conclusion

The method of inductive case selection converts process tracing meant to simply “craft a minimally sufficient explanation of a particular outcome” into a methodology used to build and refine a causal theory – a form of “theory-building process-tracing” (Beach and Pedersen 2013: 16–18). Millian methods are called upon to probe the portability of a particular causal process or causal mechanism and to specify the logics of its relative contextual-dependence. In so doing, they enable theory-building without presuming that the case study researcher holds the a priori knowledge necessary to account for complex temporal dynamics at the deductive theorizing stage. Both of these approaches – deductive, processualist theorizing on the one hand, and the method of inductive case selection on the other hand – provide some insurance against Millian methods leading the researcher into ignoring the ordered, paced, or equifinal structure that may underlie the pathway(s) to the outcome of interest. But, I would argue, the more inductive approach is uniquely suited for research that is not only process-sensitive, but also open to novel insights supplied by the empirical world that may not be captured by existing theories.

Furthermore, case study research often does (and should!) proceed with the scholar outlining why an outcome is of interest, and then seeking ways to not only make inferences about what produced said outcome (via process tracing) but situating it within a broader empirical and theoretical landscape (via the method of inductive case selection). This approach pushes scholars to answer that pesky yet fundamental question – why should we care or be interested in this case/outcome? – before disciplining their drive for generalizable causal inferences. After all, the deductive use of Millian methods tells us nothing about why we should care about the cases selected, yet arguably this is an essential component of any case selection justification. By deploying a most-similar or most-different cases design after an initial case has been justifiably selected due to its theoretical or historical importance, policy relevance, or puzzling empirical nature, the researcher is nudged toward undertaking case study research yielding causal theories that are not only comparatively engaged, but also substantively interesting.

The method of inductive case selection is most useful when the foregoing approach constitutes the esprit of the case study researcher. Undoubtedly, deductively oriented case study research (see Lieberman 2005, 2015) and traditional uses of Millian methods will continue to contribute to social scientific understanding. Nevertheless, the perils of ignoring important sequential causal dynamics – particularly in the absence of good, processualist theories – should caution researchers to proceed with the greatest of care. In particular, researchers should be willing to revise both theory building and research design to their more inductive variants should process tracing reveal temporal sequences that eschew the analytic possibilities of the traditional comparative method.

I would like to thank Jennifer Widner and Michael Woolcock for the invitation to write this chapter, and Daniel Ortega Nieto for pointing me to case studies conducted by the World Bank’s Global Delivery Initiative that I use as illustrative examples, as well as Jack Levy, Hillel Soifer, Andrew Moravcsik, Cassandra Emmons, Rory Truex, Dan Tavana, Manuel Vogt, and Killian Clarke for constructive feedback.

1 See, for example, Przeworski and Teune (1970), Lijphart (1971), Eckstein (1975), Yin (1984), Geddes (1990), Collier (1993), Faure (1994), George and Bennett (2005), Flyvbjerg (2006), Levy (2008), Seawright and Gerring (2008), Gerring (2007), Brady and Collier (2010), and Tarrow (2010).

2 Some scholars, such as Faure (1994), distinguish Mill’s dependent-variable driven methods of agreement and difference from the independent-variable driven most-similar and most-different systems designs, suggesting they are distinct. But because, as Figure 7.1 shows, Mill’s dependent-variable driven methods also impose requirements on the array of independent variables to permit causal inference via exclusion, this distinction is not particularly fertile.

3 In Mill’s method of difference, factors present in both cases are eliminated for being insufficient for the outcome (in the method of agreement, factors that vary across the cases are eliminated for being unnecessary).

4 Note that Mill himself distinguished between deductively assessing the average “effect of causes” and inductively retracing the “causes of effects” using the methods of agreement and difference (Mill 1843 [1974], pp. 449, 764).

5 The proposed approach bears several similarities to Soifer’s (2020) fertile analysis of how “shadow cases” in comparative research can contribute to theory-building and empirical analysis.

6 This study found that the expansion of clinical services into government facilities embedded in the public health system, the introduction of peer educators, and the harmonization of large quantities of public health data underlay the timing and breadth of the decline in HIV/AIDS amongst female sex workers.

7 What Levy (2008: 13) calls a “deviant” case – which “focus[es] on observed empirical anomalies in existing theoretical propositions” – would also fit within the category of a puzzling case.

8 Process tracing revealed that a conjunction of factors – management turnover and a lackluster culture of staff performance at the state level, inadequate coordination at the federal level, premature disbursement of funds, and citizen aversion to the commercialization of the public water supply – underlay the initially perplexing underperformance of the urban water delivery project.

9 Many thanks to Rory Truex for highlighting this implication of the roadmap in Figure 7.10.

  • Selecting Cases for Comparative Sequential Analysis
  • By Tommaso Pavone
  • Edited by Jennifer Widner, Princeton University, New Jersey, Michael Woolcock, Daniel Ortega Nieto
  • Book: The Case for Case Studies
  • Online publication: 05 May 2022
  • Chapter DOI: https://doi.org/10.1017/9781108688253.008

  • Open access
  • Published: 25 February 2020

Writing impact case studies: a comparative study of high-scoring and low-scoring case studies from REF2014

  • Bella Reichard   ORCID: orcid.org/0000-0001-5057-4019 1 ,
  • Mark S Reed 1 ,
  • Jenn Chubb 2 ,
  • Ged Hall   ORCID: orcid.org/0000-0003-0815-2925 3 ,
  • Lucy Jowett   ORCID: orcid.org/0000-0001-7536-3429 4 ,
  • Alisha Peart 4 &
  • Andrea Whittle 1  

Palgrave Communications volume 6, Article number: 31 (2020)

  • Language and linguistics

This paper reports on two studies that used qualitative thematic and quantitative linguistic analysis, respectively, to assess the content and language of the largest ever sample of graded research impact case studies, from the UK Research Excellence Framework 2014 (REF). The paper provides the first empirical evidence across disciplinary main panels of statistically significant linguistic differences between high- versus low-scoring case studies, suggesting that implicit rules linked to written style may have contributed to scores alongside the published criteria on the significance, reach and attribution of impact. High-scoring case studies were more likely to provide specific and high-magnitude articulations of significance and reach than low-scoring cases. High-scoring case studies contained attributional phrases which were more likely to attribute research and/or pathways to impact, and they were written more coherently (containing more explicit causal connections between ideas and more logical connectives) than low-scoring cases. High-scoring case studies appear to have conformed to a distinctive new genre of writing, which was clear and direct, and often simplified in its representation of causality between research and impact, and less likely to contain expressions of uncertainty than typically associated with academic writing. High-scoring case studies in two Main Panels were significantly easier to read than low-scoring cases on the Flesch Reading Ease measure, although both high-scoring and low-scoring cases tended to be of “graduate” reading difficulty. The findings of our work enable impact case study authors to better understand the genre and make content and language choices that communicate their impact as effectively as possible. 
While directly relevant to the assessment of impact in the UK’s Research Excellence Framework, the work also provides insights of relevance to institutions internationally who are designing evaluation frameworks for research impact.

Introduction

Academics are under increasing pressure to engage with non-academic actors to generate “usable” knowledge that benefits society and addresses global challenges (Clark et al., 2016; Lemos, 2015; Rau et al., 2018). This is largely driven by funders and governments that seek to justify the societal value of public funding for research (Reed et al., 2020; Smith et al., 2011), often characterised as ‘impact’. While this has sometimes been defined narrowly as reflective of the need to demonstrate a return on public investment in research (Mårtensson et al., 2016; Tsey et al., 2016; Warry, 2006), there is also a growing interest in the evaluation of “broader impacts” from research (cf. Bozeman and Youtie, 2017; National Science Foundation, 2014), including less tangible but arguably equally relevant benefits for society and culture. This shift is exemplified by the assessment of impact in the UK’s Research Excellence Framework (REF) in 2014 and 2021, the system for assessing the quality of research in UK higher education institutions, and in the rise of similar policies and evaluation systems in Australia, Hong Kong, the United States, Horizon Europe, The Netherlands, Sweden, Italy, Spain and elsewhere (Reed et al., 2020).

The evaluation of research impact in the UK has been criticised by scholars largely for its association with a ‘market logic’ (Olssen and Peters, 2005; Rhoads and Torres, 2005). Critics argue that a focus on academic performativity can be seen to “destabilise” professional identities (Chubb and Watermeyer, 2017), which in the context of research impact evaluation can further “dehumanise and deprofessionalise” academic performance (Watermeyer, 2019), whilst leading to negative unintended consequences (which Derrick et al., 2018, called “grimpact”). MacDonald (2017), Chubb and Reed (2018) and Weinstein et al. (2019) reported concerns from researchers that the impact agenda may be distorting research priorities, “encourag[ing] less discovery-led research” (Weinstein et al., 2019, p. 94), though these concerns were questioned by university managers in the same study who were reported to “not have enough evidence to support that REF was driving specific research agendas in either direction” (p. 94), and further questioned by Hill (2016).

Responses to this critique have been varied. Some have called for civil disobedience (Watermeyer, 2019) and organised resistance (Back, 2015; MacDonald, 2017) against the impact agenda. In a review of Watermeyer (2019), Reed (2019) suggested that attitudes towards the neoliberal political roots of the impact agenda may vary according to the (political) values and beliefs of researchers, leading them to pursue impacts that either support or oppose neoliberal political and corporate interests. Some have defended the benefits of research impact evaluation. For example, Weinstein et al. (2019) found that “a focus on changing the culture outside of academia is broadly valued” by academics and managers. The impact agenda might enhance stakeholder engagement (Hill, 2016) and give “new currency” to applied research (Chubb, 2017; Watermeyer, 2019). Others have highlighted the long-term benefits for society of incentivising research impact, including increased public support and funding for a more accountable, outward-facing research system (Chubb and Reed, 2017; Hill, 2016; Nesta, 2018; Oancea, 2010, 2014; Wilsdon et al., 2015).

In the UK REF, research outputs and impact are peer reviewed at disciplinary level in ‘Units of Assessment’ (36 in 2014, 34 in 2021), grouped into four ‘Main Panels’. Impact is assessed through case studies that describe the effects of academic research and are given a score between 1* (“recognised but modest”) and 4* (“outstanding”). The case studies follow a set structure of five sections: 1—Summary of the impact; 2—Underpinning research; 3—References to the research; 4—Details of the impact; 5—Sources to corroborate the impact (HEFCE, 2011 ). The publication of over 6000 impact case studies in 2014 Footnote 1 by Research England (formerly Higher Education Funding Council for England, HEFCE) was unique in terms of its size, and unlike the recent selective publication of high-scoring case studies from Australia’s 2018 Engagement and Impact Assessment, both high-scoring and low-scoring case studies were published. This provides a unique opportunity to evaluate the construction of case studies that were perceived by evaluation panels to have successfully demonstrated impact, as evidenced by a 4* rating, and to compare these to case studies that were judged as less successful.

The analysis of case studies included in this research is based on the definition of impact used in REF2014, as “an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia” (HEFCE, 2011, p. 26). According to REF2014 guidance, the primary functions of an impact case study were to articulate and evidence the significance and reach of impacts arising from research beyond academia, clearly demonstrating the contribution that research from a given institution made to those impacts (HEFCE, 2011).

In addition to these explicit criteria driving the evaluation of impact in REF2014, a number of analyses have emphasised the role of implicit criteria and subjectivity in shaping the evaluation of impact. For example, Pidd and Broadbent (2015) emphasised the implicit role a “strong narrative” plays in high-scoring case studies (p. 575). This was echoed by the fears of one REF2014 panellist interviewed by Watermeyer and Chubb (2018) who said, “I think with impact it is literally so many words of persuasive narrative” as opposed to “giving any kind of substance” (p. 9). Similarly, Watermeyer and Hedgecoe (2016), reporting on an internal exercise at Cardiff University to evaluate case studies prior to submission, emphasised that “style and structure” were essential to “sell impact”, and that “case studies that best sold impact were those rewarded with the highest evaluative scores” (p. 651).

Recent research based on interviews with REF2014 panellists has also emphasised the subjectivity of the peer-review process used to evaluate impact. Drawing on panellist interviews and participant observation of REF2014 sub-panels, Derrick (2018) argued that scores were strongly influenced by who the evaluators were and how the group assessed impact together. Indeed, a panellist interviewed by Watermeyer and Chubb (2018) concurred that “the panel had quite an influence on the criteria” (p. 7), including an admission that some types of (more intangible) evidence were more likely to be overlooked than other (more concrete) forms of evidence, “privileg[ing] certain kinds of impact”. Other panellists interviewed spoke of their emotional and intellectual vulnerability in making judgements about an impact criterion that they had little prior experience of assessing (Watermeyer and Chubb, 2018). Derrick (2018) argued that this led many evaluators to base their assessments on more familiar proxies for excellence linked to scientific excellence, which led to biased interpretations and shortcuts that mimicked “groupthink” (p. 193).

This paper will for the first time empirically assess the content and language of the largest possible sample of research impact case studies that received high versus low scores from assessment panels in REF2014. Combining qualitative thematic and quantitative linguistic analysis, we ask:

How do high-scoring versus low-scoring case studies articulate and evidence impacts linked to underpinning research?

Do high-scoring and low-scoring case studies have differences in their linguistic features or styles?

Do high-scoring and low-scoring case studies have lexical differences (words and phrases that are statistically more likely to occur in high- or low-scoring cases) or text-level differences (including reading ease, narrative clarity, use of cohesive devices)?

By answering these questions, our goal is to provide evidence for impact case study authors and their institutions to reflect on in order to optimally balance the content and to use language that communicates their impact as effectively as possible. While directly relevant to the assessment of impact in the UK’s REF, the work also provides insights of relevance to institutions internationally who are designing evaluation frameworks for research impact.

Research design and sample

The datasets were generated by using published institutional REF2014 impact scores to deduce the scores of some impact case studies themselves. Although scores for individual case studies were not made public, we were able to identify case studies that received the top mark of 4* based on the distribution of scores received by some institutions, where the whole submission by an institution in a given Unit of Assessment was awarded the same score. In those 20 Units of Assessment (henceforth UoA) where high-scoring case studies could be identified in this way, we also accessed all case studies known to have scored either 1* or 2* in order to compare the features of high-scoring case studies to those of low-scoring case studies.

We approached our research questions with two separate studies, using quantitative linguistic and qualitative thematic analysis respectively. The thematic analysis, explained in more detail in the section “Qualitative thematic analysis” below, allowed us to find answers to research question 1 (see above). The quantitative linguistic analysis was used to extract and compare typical word combinations for high-scoring and low-scoring case studies, as well as assessing their readability. It mainly addressed research questions 2 and 3.

The quantitative linguistic analysis was based on a sample of all identifiable high-scoring case studies in any UoA ( n  = 124) and all identifiable low-scoring impact case studies in those UoAs where high-scoring case studies could be identified ( n  = 93). As the linguistic analysis focused on identifying characteristic language choices in running text, only those sections designed to contain predominantly text were included (1—Summary of the impact; 2—Underpinning research; 4—Details of the impact). Figure 1 shows the distribution of case studies across Main Panels in the quantitative analysis. Table 1 summarises the number of words included in the analysis.

Figure 1. Distribution of case studies across Main Panels used for the linguistic analysis sample.

In order to detect patterns of content in high-scoring and low-scoring case studies across all four Main Panels, a sub-sample of case studies was selected for a qualitative thematic analysis. This included 60% of high-scoring case studies and 97% of low-scoring case studies from the quantitative analysis, such that only UoAs were included where both high-scoring and low-scoring case studies are available (as opposed to the quantitative sample, which includes all available high-scoring case studies). Further selection criteria were then designed to create a greater balance in the number of high-scoring and low-scoring case studies across Main Panels. Main Panels A (high) and C (low) were particularly over-represented, so a lower proportion of those case studies were selected and 10 additional high-scoring case studies were considered in Panel B, including institutions where at least 85% of the case studies scored 4* and the remaining scores were 3*. As this added a further UoA, we could also include 14 more low-scoring case studies in Main Panel B. This resulted in a total of 85 high-scoring and 90 low-scoring case studies. Figure 2 shows the distribution of case studies across Main Panels in the thematic analysis, illustrating the greater balance compared to the sample used in the quantitative analysis. The majority (75%) of the case studies analysed are included in both samples (Table 2 ).

Figure 2. Distribution of case studies across Main Panels used for the thematic analysis sample.

Quantitative linguistic analysis

Quantitative linguistic analysis can be used to make recurring patterns in language use visible and to assess their significance. We treated the dataset of impact case studies as a text collection (the ‘corpus’) divided into two sections, namely high-scoring and low-scoring case studies (the two ‘sub-corpora’), in order to explore the lexical profile and the readability of the case studies.

One way to explore the lexical profile of groups of texts is to generate frequency-based word lists and compare these to word lists from a reference corpus to determine which words are characteristic of the corpus of interest (“keywords”, cf. Scott, 1997 ). Another way is to extract word combinations that are particularly frequent. Such word combinations, called “lexical bundles”, are “extended collocations” (Hyland, 2008 , p. 41) that appear across a set range of texts (Esfandiari and Barbary, 2017 ). We merged these two approaches in order to uncover meanings that could not be made visible through the analysis of single-word frequencies, comparing lexical bundles from each sub-corpus to the other. Lexical bundles of 2–4 words were extracted with AntConc (specialist software developed by Anthony, 2014 ), first from the corpus of all high-scoring case studies and then separately from the sub-corpora of high-scoring case studies in Main Panels A, C and D (see Footnote 2). The corresponding lists were extracted from low-scoring case studies overall and separated by panel. The lists of lexical bundles for each of the high-scoring corpus parts were then compared to the corresponding low-scoring parts (High-Overall vs. Low-Overall, High-Main Panel A vs. Low-Main Panel A, etc.) to detect statistically significant over-use and under-use in one set of texts relative to another.

Two statistical measures were used in the analysis of lexical bundles. Log Likelihood was used as a measure of the statistical significance of frequency differences (Rayson and Garside, 2000 ), with a value of >3.84 corresponding to p  < 0.05. This measure had the advantage, compared to the more frequently used chi-square test, of not assuming a normal distribution of data (McEnery et al., 2006 ). The Log Ratio (Hardie, 2014 ) was used as a measure of effect size, which quantifies the scale, rather than the statistical significance, of frequency differences between two datasets. The Log Ratio is technically the binary log of the relative risk, and a value of >0.5 or <−0.5 is considered meaningful in corpus linguistics (Hardie, 2014 ), with values further removed from 0 reflecting a bigger difference in the relative frequencies found in each corpus. There is currently no agreed standard effect size measure for keywords (Brezina, 2018 , p. 85) and the Log Ratio was chosen because it is straightforward to interpret. Each lexical bundle that met the ‘keyness’ threshold (Log Likelihood > 3.84 in the case of expected values > 12, with higher significance levels needed for expected values < 13—see Rayson et al., 2004 , p. 8) was then assigned a code according to its predominant meaning in the texts, as reflected in the contexts captured in the concordance lines extracted from the corpus.
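The two measures can be illustrated with a minimal Python sketch. It assumes the standard two-corpus form of Log Likelihood described by Rayson and Garside (2000) and treats the Log Ratio as the binary log of the ratio of relative frequencies, as stated above. The function names, the example counts and the 0.5 substitution for zero frequencies are illustrative assumptions and are not taken from the study.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus Log Likelihood for one lexical bundle.

    freq_*: occurrences of the bundle in each sub-corpus
    size_*: total number of words in each sub-corpus
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def log_ratio(freq_a, size_a, freq_b, size_b, zero_substitute=0.5):
    """Binary log of the ratio of relative frequencies (effect size).

    zero_substitute is an assumption: it avoids division by zero when a
    bundle is absent from one sub-corpus.
    """
    fa = freq_a if freq_a > 0 else zero_substitute
    fb = freq_b if freq_b > 0 else zero_substitute
    return math.log2((fa / size_a) / (fb / size_b))

# Hypothetical counts: 40 hits in 100,000 words vs. 10 hits in 100,000 words.
ll = log_likelihood(40, 100_000, 10, 100_000)
lr = log_ratio(40, 100_000, 10, 100_000)
is_key = ll > 3.84 and abs(lr) > 0.5  # thresholds quoted in the text
print(round(ll, 2), round(lr, 2), is_key)
```

In this hypothetical case the bundle is four times as frequent in the first sub-corpus, so the Log Ratio is about 2, and the Log Likelihood is well above the 3.84 cut-off.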

In the thematic analysis, it appeared that high-scoring case studies were easier to read. In order to quantify the readability of the texts, we therefore analysed them using the Coh-Metrix online tool (www.cohmetrix.com, v3.0) developed by McNamara et al. ( 2014 ). This tool provides 106 descriptive indices of language features, including 8 principal component scores developed from combinations of the other indices (Graesser et al., 2011 ). We selected these principal component scores as comprehensive measures of “reading ease” because they assess multiple characteristics of the text, up to whole-text discourse level (McNamara et al., 2014 , p. 78). This was supplemented by the traditional and more widespread Flesch Reading Ease score, a readability measure based on the lengths of words and sentences, which are highly correlated with reading speed (Haberlandt and Graesser, 1985 ). The selected measures were compared across corpus sections using t-tests to evaluate significance. The effect size was measured using Cohen’s D, following Brezina ( 2018 , p. 190), where D > 0.3 indicates a small, D > 0.5 a medium, and D > 0.8 a high effect size. As with the analysis of lexical bundles, comparisons were made between high- and low-scoring case studies in each of Main Panels A, C and D, as well as between all high-scoring and all low-scoring case studies across Main Panels.
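As a rough illustration of the effect size calculation, the following Python sketch computes Cohen’s D with a pooled standard deviation and labels it using the thresholds quoted above. The pooled-variance form is an assumption (the text does not say which variant was used), and the function names and scores are hypothetical.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's D using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def effect_label(d):
    """Label the magnitude using the thresholds quoted in the text."""
    magnitude = abs(d)
    if magnitude > 0.8:
        return "high"
    if magnitude > 0.5:
        return "medium"
    if magnitude > 0.3:
        return "small"
    return "negligible"

# Hypothetical readability scores for two small groups of case studies.
high_scoring = [2, 4, 6]
low_scoring = [1, 3, 5]
d = cohens_d(high_scoring, low_scoring)
print(d, effect_label(d))
```

With real data, each list would hold one readability score per case study in the relevant corpus section.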

Qualitative thematic analysis

While a quantitative analysis as described above can make differences in the use of certain words visible, it does not capture the narrative or content of the texts under investigation. In order to identify common features of high-scoring and low-scoring case studies, thematic analysis was chosen to complement the quantitative analysis by identifying patterns and inferring meaning from qualitative data (Auerbach and Silverstein, 2003 ; Braun and Clarke, 2006 ; Saldana, 2009 ). To familiarise themselves with the data and for inter-coder reliability, two research team members read a selection of REF2014 impact case studies from different Main Panels, before generating initial codes for each of the five sections of the impact case study template. These were discussed with the full research team, comprising three academic and three professional services staff who had all read multiple case studies themselves. They were piloted prior to defining a final set of themes and questions against which the data was coded (based on the six-step process outlined by Braun and Clarke, 2006 ) (Table 3 ). An additional category was used to code stylistic features, to triangulate elements of the quantitative analysis (e.g. readability) and to include additional stylistic features difficult to assess in quantitative terms (e.g. effective use of testimonials). In addition to this, 10 different types of impact were coded for, based on Reed’s ( 2018 ) typology: capacity and preparedness, awareness and understanding, policy, attitudinal change, behaviour change and other forms of decision-making, other social, economic, environmental, health and wellbeing, and cultural impacts. There was room for coders to include additional insights arising in each section of the case study that had not been captured in the coding system; and there was room to summarise other key factors they thought might account for high or low scores.

Coders summarised case study content pertaining to each code, for example by listing examples of effective or poor use of structure and formatting as they arose in each case study. Coders also quoted the original material next to their summaries so that their interpretation could be assessed during subsequent analysis. This initial coding of case study text was conducted by six coders, with intercoder reliability (based on 10% of the sample) assessed at over 90%. Subsequent thematic analysis within the codes was conducted by two of the co-authors. This involved categorising coded material into themes as a way of assigning meaning to features that occurred across multiple case studies (e.g. categorising types of corroborating evidence typically used in high-scoring versus low-scoring case studies).
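The text does not state how intercoder reliability was calculated. Purely as an illustration, the sketch below assumes simple percent agreement, averaged over all pairs of coders. The function names, coder labels and codes are hypothetical.

```python
from itertools import combinations

def percent_agreement(codes_a, codes_b):
    """Share of items (in %) to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

def mean_pairwise_agreement(codings):
    """Average percent agreement over every pair of coders.

    codings: dict mapping a coder name to that coder's list of codes.
    """
    pairs = list(combinations(codings.values(), 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)

# Hypothetical codes for ten case studies from two coders.
coder_1 = ["policy", "health", "policy", "economic", "cultural",
           "policy", "health", "economic", "policy", "health"]
coder_2 = ["policy", "health", "policy", "economic", "cultural",
           "policy", "health", "economic", "policy", "cultural"]
print(percent_agreement(coder_1, coder_2))
```

Percent agreement does not correct for chance agreement; a chance-corrected coefficient would be a stricter alternative.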

Results and discussion

In this section, we integrate findings from the quantitative linguistic study and the qualitative analysis of low-scoring versus high-scoring case studies. The results are discussed under four headings based on the key findings that emerged from both analyses. Taken together, these findings provide the most comprehensive evidence to date of the characteristics of a top-rated (4*) impact case study in REF2014.

Highly-rated case studies provided specific, high-magnitude and well-evidenced articulations of significance and reach

One finding from our qualitative thematic analysis was that 84% of high-scoring cases articulated benefits to specific groups and provided evidence of their significance and reach, compared to 32% of low-scoring cases which typically focused instead on the pathway to impact, for example describing dissemination of research findings and engagement with stakeholders and publics without citing the benefits arising from dissemination or engagement. One way of conceptualising this difference is using the content/process distinction: whereas low-scoring cases tended to focus on the process through which impact was sought (i.e. the pathway used), the high-scoring cases tended to focus on the content of the impact itself (i.e. what change or improvement occurred as a result of the research).

Examples of global reach were evidenced across high-scoring case studies from all panels (including Panel D for Arts and Humanities research), but were less often claimed or evidenced in low-scoring case studies. Where reach was more limited geographically, many high-scoring case studies used context to create robust arguments that their reach was impressive in that context, describing reach for example in social or cultural terms or arguing for the importance of reaching a narrow but hard-to-reach or otherwise important target group.

Table 4 provides examples of evidence from high-scoring cases and low-scoring cases that were used to show significance and reach of impacts in REF2014.

Findings from the quantitative linguistic analysis in Table 5 show how high-scoring impact case studies contained more phrases that specified reach (e.g. “in England and”, “in the US”), compared to low-scoring case studies that used the more generic term “international”, leaving the reader in doubt about the actual reach. They also include more phrases that implicitly specified the significance of the impact (e.g. “the government’s” or “to the House of Commons”), compared to low-scoring cases which provided more generic phrases, such as “policy and practice”, rather than detailing specific policies or practices that had been changed.

The quantitative linguistic analysis also identified a number of words and phrases pertaining to engagement and pathways, which were intended to deliver impact but did not actually specify impact (Table 6 ). A number of phrases contained the word “dissemination”, and there were several words and phrases specifying types of engagement that could be considered more one-way dissemination than consultative or co-productive (cf. Reed et al.’s ( 2018 ) engagement typology), e.g. “the book” and “the event”. The focus on dissemination supports the finding from the qualitative thematic analysis that low-scoring cases tended to focus more on pathways or routes than on impact. Although it is not possible to infer this directly from the data, this may represent a deeper epistemological position underpinning some case studies, where impact generation was seen as one-way knowledge or technology transfer, and research findings were perceived as something that could be given unchanged to publics and stakeholders through dissemination activities, with the assumption that this would be understood as intended and lead to impact.

It is worth noting that none of the four UK countries appear significantly more often in either high-scoring or low-scoring case studies (outside of the phrase “in England and”). Wales ( n  = 50), Scotland ( n  = 71) and Northern Ireland ( n  = 32) appear slightly more often in high-scoring case studies, but the difference is not significant (England: n  = 162). An additional factor to take into account is that our dataset includes only submissions that are either high-scoring or low-scoring, and the geographical spread of the submitting institutions was not a factor in selecting texts. There was a balanced number of high-scoring and low-scoring case studies in the sample from English, Scottish and Welsh universities, but no guaranteed low-scoring submissions from Northern Irish institutions. The REF2014 guidance made it clear that impacts in each UK country would be evaluated equally in comparison to each other, the UK and other countries. While the quantitative analysis of case studies from our sample only found a statistically significant difference for the phrase “in England and”, this, combined with the slightly higher number of phrases containing the other countries of the UK in high-scoring case studies, might indicate that this panel guidance was implemented as instructed.

Figures 3–5 show which types of impact could be identified in high-scoring and low-scoring case studies in the qualitative thematic analysis (based on Reed’s ( 2018 ) typology of impacts). Note that percentages do not add up to 100% because it was possible for each case study to claim more than one type of impact (high-scoring impact case studies described on average 2.8 impacts, compared to an average of 1.8 impacts described by low-scoring case studies; see Footnote 3). Figure 3 shows the number of impacts per type as a percentage of the total number of impacts claimed in high-scoring versus low-scoring case studies. This shows that high-scoring case studies were more likely to claim health/wellbeing and policy impacts, whereas low-scoring case studies were more likely to claim understanding/awareness impacts. Looking at this by Main Panel, over 50% of high-scoring case studies in Main Panel A claimed health/wellbeing, policy and understanding/awareness impacts (Fig. 4 ), whereas over 50% of low-scoring case studies in Main Panel A claimed capacity building impacts (Fig. 5 ). There were relatively high numbers of economic and policy impacts claimed in both high-scoring and low-scoring case studies in Main Panels B and C, respectively, with no impact type dominating strongly in Main Panel D (Figs. 4 and 5 ).

Figure 3. Number of impacts claimed in high- versus low-scoring case studies by impact type.

Figure 4. Percentage of high-scoring case studies that claimed different types of impact.

Figure 5. Percentage of low-scoring case studies that claimed different types of impact.

Highly-rated case studies used distinct features to establish links between research (cause) and impact (effect)

Findings from the quantitative linguistic analysis show that high-scoring case studies were significantly more likely to include attributional phrases like “cited in”, “used to” and “resulting in”, compared to low-scoring case studies (Table 7 provides examples for some of the 12 phrases more frequent in high-scoring case studies). However, there were some attributional phrases that were more likely to be found in low-scoring case studies (e.g. “from the”, “of the research” and “this work has”—total of 9 different phrases).

To investigate this further, all instances of attributional phrases in high-scoring ( n = 564) and low-scoring ( n = 601) case studies (see Footnote 4) were analysed to categorise the context in which they were used, and so to establish the extent to which these phrases in each corpus were being used to establish attribution to impacts. The first word or phrase preceding or succeeding the attributional content was coded. For example, if the attributional content was “used the”, followed by “research to generate impact”, the first word succeeding the attributional content (in this case “research”) was coded rather than the phrase it subsequently led to (“generate impact”). According to a Pearson Chi Square test, high-scoring case studies were significantly more likely to establish attribution to impact than low-scoring cases ( p < 0.0001, but with a small effect size based on Cramér’s V = 0.22; bold in Table 8 ). 18% ( n = 106) of phrases in the low-scoring corpus established attribution to impact, compared to 37% ( n = 210) in the high-scoring corpus, for example by stating that research, pathway or something else led to impact. Instead, low-scoring case studies were more likely to establish attribution to research (40%; n = 241) compared to high-scoring cases (28%; n = 156; p < 0.0001, but with a small effect size based on Cramér’s V = 0.135). Both high- and low-scoring case studies were similarly likely to establish attribution to pathways (low: 32%; n = 194; high: 31%; n = 176).
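The attribution-to-impact comparison can be approximated with a short Python sketch. It assumes the counts are collapsed into a 2×2 table (phrases attributing to impact versus all other phrases, by corpus) and that no continuity correction is applied; the text does not specify the layout of the table that was tested, so both are assumptions. The counts are those reported above and the function name is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

                    attributes to impact   other
        high-scoring         a               b
        low-scoring          c               d

    Returns (chi2, p_value, cramers_v). With one degree of freedom the
    p-value is erfc(sqrt(chi2 / 2)), and for a 2x2 table Cramér's V
    reduces to sqrt(chi2 / n).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    cramers_v = math.sqrt(chi2 / n)
    return chi2, p_value, cramers_v

# Reported counts: 210 of 564 high-scoring and 106 of 601 low-scoring
# attributional phrases established attribution to impact.
chi2, p_value, cramers_v = chi_square_2x2(210, 564 - 210, 106, 601 - 106)
print(round(chi2, 1), p_value < 0.0001, round(cramers_v, 2))
```

Under these assumptions the sketch gives a Cramér’s V of about 0.22, in line with the reported value. Applying the same collapse to the attribution-to-research counts (156 of 564 versus 241 of 601) gives a value of roughly 0.13.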

Moreover, low-scoring case studies were more likely to include ambiguous or uncertain phrases. For example, the phrase “a number of” can be read to imply that it is not known how many instances there were. This occurred in all sections of the impact case studies, for example in the underpinning research section as “The research explores a number of themes” or in the summary or details of the impact section as “The work has also resulted in a number of other national and international impacts”, or “has influenced approaches and practices of a number of partner organisations”. Similarly, “an impact on” could give the impression that the nature of the impact is not known. This phrase occurred only in summary and details of the impact sections, for example, “These activities have had an impact on the professional development”, “the research has had an impact on the legal arguments”, or “there has also been an impact on the work of regional agency”.

In the qualitative thematic analysis, we found that only 50% of low-scoring case studies clearly linked the underpinning research to claimed impacts (compared to 97% of high-scoring cases). This gave the impression of over-claimed impacts in some low-scoring submissions. For example, one case study claimed “significant impacts on [a country’s] society” based on enhancing the security of a new IT system in the department responsible for publishing and archiving legislation. Another claimed “economic impact on a worldwide scale” based on billions of pounds of benefits, calculated using an undisclosed method by an undisclosed evaluator in an unpublished final report by the research team. One case study claimed attribution for impact based on similarities between a prototype developed by the researchers and a product subsequently launched by a major corporation, without any evidence that the product as launched was based on the prototype. Similar assumptions were made in a number of other case studies that appeared to conflate correlation with causation in their attempts to infer attribution between research and impact. Table 9 provides examples of different ways in which links between research and impact were evidenced in the details of the research section.

Table 10 shows how corroborating sources were used to support these claims. 82% of high-scoring case studies compared to 7% of low-scoring cases were identified in the qualitative thematic analysis as having generally high-quality corroborating evidence. In contrast, 11% of high-scoring case studies, compared to 71% of low-scoring cases, were identified as having corroborating evidence that was vague and/or poorly linked to claimed impacts. Looking only at case studies that claimed policy impact, 11 out of 26 high-scoring case studies in the sample described both policy and implementation (42%), compared to just 5 out of 29 low-scoring case studies that included both policy and implementation (17%; the remainder described policy impacts only with no evidence of benefits arising from implementation). High-scoring case studies were more likely to cite evidence of impacts rather than just citing evidence pertaining to the pathway (which was more common in low-scoring cases). High-scoring policy case studies also provided evidence pertaining to the pathway, but because they typically also included evidence of policy change, this evidence helped attribute policy impacts to research.

Highly-rated case studies were easy to understand and well written

In preparation for the REF, many universities invested heavily in writing assistance (Coleman, 2019 ) to ensure that impact case studies were “easy to understand and evaluation-friendly” (Watermeyer and Chubb, 2018 ) for the assessment panels, which comprised academics and experts from other sectors (HEFCE, 2011 , p. 6). With this in mind, we investigated readability and style, both in the quantitative linguistic and in the qualitative thematic analysis.

High-scoring impact case studies scored more highly on the Flesch Reading Ease score, a readability measure based on the length of words and sentences. The scores in Table 11 are reported out of 100, with a higher score indicating that a text is easier to read. While the scores reveal a significant difference between 4* and 1*/2* impact case studies, they also indicate that impact case studies are generally on the verge of “graduate” difficulty (Hartley, 2016 , p. 1524). As such, our analysis should not be understood as suggesting that these technical documents should be adjusted to the readability of a newspaper article, but rather that they should be kept at a level suitable for an interested and educated non-specialist.
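For reference, the standard Flesch Reading Ease formula combines average sentence length and average syllables per word. The Python sketch below applies it to pre-computed counts; how words, sentences and syllables are counted depends on the tool used, which is not specified here, and the function name and example counts are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease score from raw counts.

    Higher scores indicate easier text. Longer sentences and longer
    words (more syllables per word) both lower the score.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 1))
```

Because the formula uses only these two ratios, it cannot capture whether a text makes its argument easy to follow.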

Interestingly, there were differences between the Main Panels (see Footnote 5). In Social Science and Humanities case studies (Main Panels C and D), high-scoring impact case studies scored significantly higher on reading ease than low-scoring ones. There was no significant difference in Main Panel A between 4* and 1*/2* cases. However, all Main Panel A case studies showed, on average, lower reading ease scores than the low-scoring cases in Main Panels C and D. This means that their authors used longer words and sentences, which may be explained in part by more and longer technical terms needed in Main Panel A disciplines; the difference between high- and low-scoring case studies in Main Panels C and D may be explained by the use of more technical jargon (confirmed in the qualitative analysis).

The Flesch Reading Ease measure assesses the sentence- and word-level, rather than capturing higher-level text-processing difficulty. While this is recognised as a reliable indicator of comparative reading ease, and the underlying measures of sentence-length and word-length are highly correlated with reading speed (Haberlandt and Graesser, 1985 ), Hartley ( 2016 ) is right in his criticism that the tool takes neither the meaning of the words nor the wider text into account. The Coh-Metrix tool (McNamara et al., 2014 ) provides further measures for reading ease based on textual cohesion in these texts compared to a set of general English texts. Of the eight principal component scores computed by the tool, most did not reveal a significant difference between high- and low-scoring case studies or between different Main Panels. Moreover, in most measures, impact case studies overall were fairly homogenous compared to the baseline of general English texts. However, there were significant differences between high- and low-scoring impact case studies in two of the measures: “deep cohesion” and “connectivity” (Table 12 ).

“Deep cohesion” shows whether a text makes causal connections between ideas explicit (e.g. “because”, “so”) or leaves them for the reader to infer. High-scoring case studies had a higher level of deep cohesion compared to general English texts (Graesser et al., 2011 ), while low-scoring case studies tended to sit below the general English average. In addition, Main Panel A case studies (Life Sciences), which received the lowest scores in Flesch Reading Ease, on average scored higher on deep cohesion than case studies in more discursive disciplines (Main Panel C—Social Sciences and Main Panel D—Arts and Humanities). “Connectivity” measures the level of explicit logical connectives (e.g. “and”, “or” and “but”) to show relations in the text. Impact case studies were low in connectivity compared to general English texts, but within each of the Main Panels, high-scoring case studies had more explicit connectivity than low-scoring case studies. This means that Main Panel A case studies, while using on average longer words and sentences as indicated by the Flesch Reading Ease scores, compensated for this by making causal and logical relationships more explicit in the texts. In Main Panels C and D, which on average scored lower on these measures, there was a clearer difference between high- and low-scoring case studies than in Main Panel A, with high-scoring case studies being easier to read.

Linked to this, low-scoring case studies across panels were more likely than high-scoring case studies to contain phrases linked to the research process (suggesting an over-emphasis on the research rather than the impact, and a focus on process over findings or quality; Table 18 ) and filler-phrases (Table 13 ).

High-scoring case studies were more likely to clearly identify individual impacts via subheadings and paragraph headings ( p  < 0.0001, with effect size measure Log Ratio 0.54). The difference is especially pronounced in Main Panel D (Log Ratio 1.53), with a small difference in Main Panel C and no significant difference in Main Panel A. In Units of Assessment combined in Main Panel D, a more discursive academic writing style is prevalent (see e.g. Hyland, 2002 ) using fewer visual/typographical distinctions such as headings. The difference in the number of headings used in case studies from those disciplines suggests that high-scoring case studies showed greater divergence from disciplinary norms than low-scoring case studies. This may have allowed them to adapt the presentation of their research impact to the audience of panel members to a greater extent than low-scoring case studies.

The qualitative thematic analysis of Impact Case Studies indicates that it is not simply the number of subheadings that matters, although this comparison is interesting especially in the context of the larger discrepancy in Main Panel D. Table 14 summarises formatting that was considered helpful and unhelpful from the qualitative analysis.

The observations in Tables 11 – 13 stem from quantitative linguistic analysis, which, while enabling statistical testing, does not show directly the effect of a text on the reader. When conducting the qualitative thematic analysis, we collected examples of formatting and stylistic features from the writing and presentation of high and low-scoring case studies that might have affected clarity of the texts (Tables 14 and 15 ). Specifically, 38% of low-scoring case studies made inappropriate use of adjectives to describe impacts (compared to 20% of high-scoring; Table 16 ). Inappropriate use of adjectives may have given an impression of over-claiming or created a less factual impression than case studies that used adjectives more sparingly to describe impacts. Some included adjectives to describe impacts in testimonial quotes, giving third-party endorsement to the claims rather than using these adjectives directly in the case study text.

Highly-rated case studies were more likely to describe underpinning research findings, rather than research processes

To be eligible, case studies in REF2014 had to be based on underpinning research that was “recognised internationally in terms of originality, significance and rigour” (denoted by a 2* quality profile, HEFCE, 2011 , p. 29). Ineligible case studies were excluded from our sample (i.e. those in the “unclassifiable” quality profile), so all the case studies should have been based on strong research. Once this research quality threshold had been passed, scores were based on the significance and reach of impact, so case studies with higher-rated research should not, in theory, get better scores on the basis of their underpinning research. However, there is evidence that units whose research outputs scored well in REF2014 also performed well on impact (unpublished Research England analysis cited in Hill, 2016 ). This observation only shows that high-quality research and impact were co-located, rather than demonstrating a causal relationship between high-quality research and highly rated impacts. However, our qualitative thematic analysis suggests that weaker descriptions of research (underpinning research was not evaluated directly) may have been more likely to be co-located with lower-rated impacts at the level of individual case studies. We know that the majority of underpinning research in the sample was graded 2* or above (because we excluded unclassifiable case studies from the analysis) but individual ratings for outputs in the underpinning research section are not provided in REF2014. Therefore, the qualitative analysis looked for a range of indicators of strong or weak research in four categories: (i) indicators of publication quality; (ii) quality of funding sources; (iii) narrative descriptions of research quality; and (iv) the extent to which the submitting unit (versus collaborators outside the institution) had contributed to the underpinning research. 
As would be expected (given that all cases had passed the 2* threshold), only a small minority of cases in the sample gave grounds to doubt the quality of the underpinning research. However, both our qualitative and quantitative analyses identified research-related differences between high- and low-scoring impact case studies.

Based on our qualitative thematic analysis of indicators of research quality, a number of low-scoring cases contained indications that underpinning research may have been weak. This was very rare in high-scoring cases. In the most extreme case, one case study was not able to submit any published research to underpin the impact, relying instead on having secured grant funding and having a manuscript under review. Table 17 describes indicators that underpinning research may have been weaker (presumably closer to the 2* quality threshold for eligibility). It also describes the indications of higher quality research (which were likely to have exceeded the 2* threshold) that were found in the rest of the sample. High-scoring case studies demonstrated the quality of the research using a range of direct and indirect approaches. Direct approaches included the construction of arguments that articulated the originality, significance and rigour of the research in the “underpinning research” section of the case study (sometimes with reference to outputs that were being assessed elsewhere in the exercise to provide a quick and robust check on quality ratings). In addition to this, a wide range of indirect proxies were used to infer quality, including publication venue, funding sources, reviews and awards.

These indicators are of particular interest given the stipulation in REF2021 that case studies must provide evidence of research quality, with the only official guidance suggesting that this is done via the use of indicators. The indicators identified in Table 17 overlap significantly with example indicators proposed by panels in the REF2021 guidance. However, there are also a number of additional indicators, which may be of use for demonstrating the quality of research in REF2021 case studies. In common with proposed REF2021 research quality indicators, many of the indicators in Table 17 are highly context dependent, based on subjective disciplinary norms that are used as short-cuts to assessments of quality by peers within a given context. Funding sources, publication venues and reviews that are considered prestigious in one disciplinary context are often perceived very differently in other disciplinary contexts. While REF2021 does not allow the use of certain indicators (e.g. journal impact factors), no comment is given on the appropriateness of the suggested indicators. While this may be problematic, given that an indicator by definition sign-posts, suggests or indicates by proxy rather than representing the outcome of any rigorous assessment, we make no comment on whether it is appropriate to judge research quality via such proxies. Instead, Table 17 presents a subjective, qualitative identification of indicators of high or low research quality, which were as far as possible considered within the context of disciplinary norms in the Units of Assessment to which the case studies belonged.

The quantitative linguistic analysis also found differences between the high-scoring and low-scoring case studies relating to underpinning research. Low-scoring case studies contained significantly more words and phrases than high-scoring cases relating to research outputs (e.g. “the paper”, “peer-reviewed”, “journal of”, “et al”), the research process (e.g. “research project”, “the research”, “his work”, “research team”) and descriptions of research (“relationship between”, “research into”, “the research”) (Table 18). The word “research” itself appears frequently in both (high: 91× per 10,000 words; low: 110× per 10,000 words), but the difference still amounts to a small yet significant over-use in the low-scoring case studies (effect size measure log ratio = 0.27, p < 0.0001).
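
As an illustration of the effect size reported above, the log ratio can be sketched as the binary logarithm of the ratio of two relative frequencies. This assumes the usual corpus-linguistics definition of the measure; the function name and argument layout are hypothetical and are not taken from the study. The only inputs drawn from the text are the normalised frequencies of “research” (110 and 91 per 10,000 words).

```python
from math import log2


def log_ratio(freq_a, size_a, freq_b, size_b):
    """Binary log of the ratio of two relative frequencies (assumed definition)."""
    rel_a = freq_a / size_a  # relative frequency in corpus A
    rel_b = freq_b / size_b  # relative frequency in corpus B
    return log2(rel_a / rel_b)


# Frequencies of "research" per 10,000 words, as reported in the text:
# low-scoring corpus = 110, high-scoring corpus = 91
lr = log_ratio(110, 10_000, 91, 10_000)
print(round(lr, 2))  # 0.27
```

A value of 0 would mean the word is equally frequent in both corpora, and each additional point corresponds to a doubling of relative frequency, so 0.27 reflects a ratio of roughly 1.2 to 1.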

There are two alternative ways to interpret these findings. First, the qualitative research appears to suggest a link between higher-quality underpinning research and higher impact scores. However, the causal mechanism is not clear. An independent review of REF2014 commissioned by the UK Government (Stern, 2016) proposed that underpinning research should only have to meet the 2* threshold for rigour, as the academic significance and novelty of the research is not in theory a necessary precursor to significant and far-reaching impact. However, a number of the indications of weaker research in Table 17 relate to academic significance and originality, and many of the indicators that suggested research exceeded the 2* threshold imply academic significance and originality (e.g. more prestigious publication venues often demand stronger evidence of academic significance and originality in addition to rigour). As such, it may be possible to posit two potential causal mechanisms related to the originality and/or significance of research. One is that major new academic breakthroughs may be more likely to lead to impacts, whether directly in the case of applied research that addresses societal challenges in new and important ways leading to breakthrough impacts, or indirectly in the case of major new methodological or theoretical breakthroughs that make new work possible that addresses previously intractable challenges. The other is that the highest quality research may have sub-consciously biased reviewers to view associated impacts more favourably. Further research would be necessary to test either mechanism.

However, these mechanisms do not explain the higher frequency of words and phrases relating to research outputs and process in low-scoring case studies. Both high-scoring and low-scoring cases described the underpinning research, and none of the phrases that emerged from the analysis imply higher or lower quality of research. We hypothesised that this may be explained by low-scoring case studies devoting more space to underpinning research at the expense of other sections that may have been more likely to contribute towards scores. Word limits were “indicative”, and the real limit of “four pages” in REF2014 (extended to five pages in REF2021) was operationalised in various ways. However, a t-test found no significant difference between the underpinning research word counts (mean of 579 and 537 words in high and low-scoring case studies, respectively; p = 0.11). Instead, we note that words and phrases relating to research in the low-scoring case studies focused more on descriptions of research outputs and processes rather than descriptions of research findings or the quality of research, as requested in REF2014 guidelines. Given that eligibility evidenced in this section is based on whether the research findings underpin the impacts and the quality of the research (HEFCE, 2011), we hypothesise that the focus of low-scoring case studies on research outputs and processes was unnecessary (at best) or replaced or obscured research findings (at worst). This could be conceptualised as another instance of the content/process distinction, whereby high-scoring case studies focused on what the research found and low-scoring case studies focused on the process through which the research was conducted and disseminated. 
This tendency may have contributed towards lower scores if unnecessary descriptions of research outputs and process, which would not themselves have contributed towards scores, used up space that could otherwise have been used for material that may have done so.
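
The word-count comparison above can be illustrated with a short sketch of a two-sample t statistic. The text does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the word counts in the example are invented purely for illustration (they are not the study's data). Only the statistic and its approximate degrees of freedom are computed; converting them into a p-value requires the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard error of each sample mean
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical word counts for the "underpinning research" section
high_scoring = [612, 540, 598, 575, 630, 520]
low_scoring = [505, 560, 530, 498, 590, 541]

t_stat, dof = welch_t(high_scoring, low_scoring)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive t here only indicates that the first sample has the larger mean; whether the difference is significant depends on the p-value, which in the study was 0.11.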

Limitations

These findings may be useful in guiding the construction and writing of case studies for REF2021 but it is important to recognise that our analyses are retrospective, showing examples of what was judged to be ‘good’ and ‘poor’ practice in the authorship of case studies for REF2014. Importantly, the findings of this study should not be used to infer a causal relationship between the linguistic features we have identified and the judgements of the REF evaluation panel. Our quantitative analysis has identified similarities and differences in the linguistic features of high- and low-scoring case studies, but there are undoubtedly a range of considerations taken into account by evaluation panels. It is also not possible to anticipate how REF2021 panels will interpret guidance and evaluate case studies, and there is already evidence that practice is changing significantly across the sector. This shift in expectations regarding impact is especially likely in research concerned with public policy, where requirements increasingly include policy implementation as well as design, and in research involving public engagement, which is increasingly expected to provide longitudinal evidence of benefits and evidence of cause and effect. We are unable to say anything conclusive from our sample about case studies that focused primarily on public engagement and pedagogy because neither of these types of impact was common enough in either the high-scoring or low-scoring sample to infer reliable findings. While this is the largest sample of known high-scoring versus low-scoring case studies ever analysed, it is important to note that this represents <3% of the total case studies submitted to REF2014. 
Although the number of case studies was fairly evenly balanced between Main Panels in the thematic analysis, the sample only included a selection of Units of Assessment from each Main Panel, where sufficient numbers of high and low-scoring cases could be identified (14 and 20 out of 36 Units of Assessment in the qualitative and quantitative studies, respectively). As such, caution should be taken when generalising from these findings.

Conclusion

This paper provides empirical insights into the linguistic differences in high-scoring and low-scoring impact case studies in REF2014. Higher-scoring case studies were more likely to have articulated evidence of significant and far-reaching impacts (rather than just presenting the activities used to reach intended future impacts), and they articulated clear evidence of causal links between the underpinning research and claimed impacts. While a cause and effect relationship between linguistic features, styles and the panel’s evaluation cannot be claimed, we have provided a granularity of analysis that shows how high-scoring versus low-scoring case studies attempted to meet REF criteria. Knowledge of these features may provide useful lessons for future case study authors, submitting institutions and others developing impact assessments internationally. Specifically, we show that high-scoring case studies were more likely to provide specific and high-magnitude articulations of significance and reach, compared to low-scoring cases, which were more likely to provide less specific and lower-magnitude articulations of significance and reach. Lower-scoring case studies were more likely to focus on pathways to impact rather than articulating clear impact claims, with a particular focus on one-way modes of knowledge transfer. High-scoring case studies were more likely to provide clear links between underpinning research and impacts, supported by high-quality corroborating evidence, compared to low-scoring cases that often had missing links between research and impact and were more likely to be underpinned by corroborating evidence that was vague and/or not clearly linked to impact claims. 
Linked to this, high-scoring case studies were more likely to contain attributional phrases, and these phrases were more likely to attribute research and/or pathways to impact, compared to low-scoring cases, which contained fewer attributional phrases, which were more likely to provide attribution to pathways rather than impact. Furthermore, there is evidence that high-scoring case studies had more explicit causal connections between ideas and more logical connective words (and, or, but) than low-scoring cases.

However, in addition to the explicit REF2014 rules, which appear to have been enacted effectively by sub-panels, there is evidence that implicit rules, particularly linked to written style, may also have played a role. High-scoring case studies appear to have conformed to a distinctive new genre of writing, which was clear and direct, often simplified in its representation of causality between research and impact, and less likely to contain expressions of uncertainty than might be normally expected in academic writing (cf. e.g. Vold, 2006; Yang et al., 2015). Low-scoring case studies were more likely to contain filler phrases that could be described as “academese” (Biber and Gray, 2019, p. 1), more likely to use unsubstantiated or vague adjectives to describe impacts, and were less likely to signpost readers to key points using sub-headings and paragraph headings. High-scoring case studies in two Main Panels (out of the three that could be analysed in this way) were significantly easier to read, although both high- and low-scoring case studies tended to be of “graduate” (Hartley, 2016) difficulty.

These findings suggest that aspects of written style may have contributed towards or compromised the scores of some case studies in REF2014, in line with previous research emphasising the role of implicit and subjective factors in determining the outcomes of impact evaluation (Derrick, 2018; Watermeyer and Chubb, 2018). If this were the case, it may raise questions about whether case studies are an appropriate way to evaluate impact. However, metric-based approaches have many other limitations and are widely regarded as inappropriate for evaluating societal impact (Bornmann et al., 2018; Pollitt et al., 2016; Ravenscroft et al., 2017; Wilsdon et al., 2015). Comparing research output evaluation systems across different countries, Sivertsen (2017) presents the peer-review-based UK REF as “best practice” compared to the metrics-based systems elsewhere. Comparing the evaluation of impact in the UK to impact evaluations in the USA, the Netherlands, Italy and Finland, Derrick (2019) describes REF2014 and REF2021 as “the world’s most developed agenda for evaluating the wider benefits of research and its success has influenced the way many other countries define and approach the assessment of impact”.

We cannot be certain about the extent to which linguistic features or style shaped the judgement of REF evaluators, nor can such influences easily be identified or even consciously recognised when they are at work (cf. research on sub-conscious bias and tacit knowledge; the idea that “we know more than we can say”, Polanyi, 1958 cited in Goodman, 2003, p. 142). Nonetheless, we hope that the granularity of our findings proves useful in informing decisions about presenting case studies, both for case study authors (in REF2021 and other research impact evaluations around the world) and those designing such evaluation processes. In publishing this evidence, we hope to create a more “level playing field” between institutions with and without significant resources available to hire dedicated staff or consultants to help write their impact case studies.

Data availability

The dataset analysed during the current study corresponds to the publicly available impact case studies defined through the method explained in Section “Research design and sample” and Table 2. A full list of case studies included can be obtained from the corresponding author upon request.

Notes

https://impact.ref.ac.uk/casestudies/search1.aspx

For Main Panel B, only six high-scoring and two low-scoring case studies are clearly identifiable and available to the public (cf. Fig. 1). The Main Panel B dataset is therefore too small for separate statistical analysis, and no generalisations should be made on the basis of only one high-scoring and one low-scoring submission.

However, in the qualitative analysis, there were a similar number of high-scoring case studies that were considered to have reached this score due to a clear focus on one single, highly impressive impact, compared to those that were singled out for their impressive range of different impacts.

Note that there were more instances of the smaller number of attributional phrases in the low-scoring corpus.

References

Anthony L (2014) AntConc, 3.4.4 edn. Waseda University, Tokyo

Auerbach CF, Silverstein LB (2003) Qualitative data: an introduction to coding and analyzing data in qualitative research. New York University Press, New York, NY

Back L (2015) On the side of the powerful: the ‘impact agenda’ and sociology in public. https://www.thesociologicalreview.com/on-the-side-of-the-powerful-the-impact-agenda-sociology-in-public/. Last Accessed 24 Jan 2020

Biber D, Gray B (2019) Grammatical complexity in academic English: linguistic change in writing. Cambridge University Press, Cambridge

Bornmann L, Haunschild R, Adams J (2018) Do altmetrics assess societal impact in the same way as case studies? An empirical analysis testing the convergent validity of altmetrics based on data from the UK Research Excellence Framework (REF). J Informetr 13(1):325–340

Bozeman B, Youtie J (2017) Socio-economic impacts and public value of government-funded research: lessons from four US National Science Foundation initiatives. Res Policy 46(8):1387–1398

Braun V, Clarke V (2006) Using thematic analysis in psychology. Qual Res Psychol 3(2):77–101

Brezina V (2018) Statistics in corpus linguistics: a practical guide. Cambridge University Press, Cambridge

Chubb J (2017) Instrumentalism and epistemic responsibility: researchers and the impact agenda in the UK and Australia. University of York

Chubb J, Watermeyer R (2017) Artifice or integrity in the marketization of research impact? Investigating the moral economy of (pathways to) impact statements within research funding proposals in the UK and Australia. Stud High Educ 42(2):2360–2372

Chubb J, Reed MS (2017) Epistemic responsibility as an edifying force in academic research: investigating the moral challenges and opportunities of an impact agenda in the UK and Australia. Palgrave Commun 3:20

Chubb J, Reed MS (2018) The politics of research impact: academic perceptions of the implications for research funding, motivation and quality. Br Politics 13(3):295–311

Clark WC et al. (2016) Crafting usable knowledge for sustainable development. Proc Natl Acad Sci USA 113(17):4570–4578

Coleman I (2019) The evolution of impact support in UK universities. Cactus Communications Pvt. Ltd

Derrick G (2018) The evaluators’ eye: impact assessment and academic peer review. Palgrave Macmillan

Derrick G (2019) Cultural impact of the impact agenda: implications for social sciences and humanities (SSH) research. In: Bueno D et al. (eds.), Higher education in the world, vol. 7. Humanities and higher education: synergies between science, technology and humanities. Global University Network for Innovation (GUNi)

Derrick G et al. (2018) Towards characterising negative impact: introducing Grimpact. In: Proceedings of the 23rd international conference on Science and Technology Indicators (STI 2018). Centre for Science and Technology Studies (CWTS), Leiden, The Netherlands

Esfandiari R, Barbary F (2017) A contrastive corpus-driven study of lexical bundles between English writers and Persian writers in psychology research articles. J Engl Academic Purp 29:21–42

Goodman CP (2003) The tacit dimension. Polanyiana 2(1):133–157

Graesser AC, McNamara DS, Kulikowich J (2011) Coh-Metrix: providing multi-level analyses of text characteristics. Educ Res 40:223–234

Haberlandt KF, Graesser AC (1985) Component processes in text comprehension and some of their interactions. J Exp Psychol: Gen 114(3):357–374

Hardie A (2014) Statistical identification of keywords, lockwords and collocations as a two-step procedure. ICAME 35, Nottingham

Hartley J (2016) Is time up for the Flesch measure of reading ease? Scientometrics 107(3):1523–1526

HEFCE (2011) Assessment framework and guidance on submissions. Ref. 02.2011

Hill S (2016) Assessing (for) impact: future assessment of the societal impact of research. Palgrave Commun 2:16073

Hyland K (2002) Directives: argument and engagement in academic writing. Appl Linguist 23(2):215–238

Hyland K (2008) As can be seen: lexical bundles and disciplinary variation. Engl Specif Purp 27(1):4–21

Lemos MC (2015) Usable climate knowledge for adaptive and co-managed water governance. Curr Opin Environ Sustain 12:48–52

MacDonald R (2017) “Impact”, research and slaying Zombies: the pressures and possibilities of the REF. Int J Sociol Soc Policy 37(11–12):696–710

Mårtensson P et al. (2016) Evaluating research: a multidisciplinary approach to assessing research practice and quality. Res Policy 45(3):593–603

McEnery T, Xiao R, Tono Y (2006) Corpus-based language studies: an advanced resource book. Routledge, Abingdon

McNamara DS et al. (2014) Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press, New York, NY

National Science Foundation (2014) Perspectives on broader impacts

Nesta (2018) Seven principles for public engagement in research and innovation policymaking. https://www.nesta.org.uk/documents/955/Seven_principles_HlLwdow.pdf. Last Accessed 12 Dec 2019

Oancea A (2010) The BERA/UCET review of the impacts of RAE 2008 on education research in UK higher education institutions. BERA/UCET, Macclesfield

Oancea A (2014) Research assessment as governance technology in the United Kingdom: findings from a survey of RAE 2008 impacts. Z Erziehungswis 17(S6):83–110

Olssen M, Peters MA (2005) Neoliberalism, higher education and the knowledge economy: from the free market to knowledge capitalism. J Educ Policy 20(3):313–345

Pidd M, Broadbent J (2015) Business and management studies in the 2014 Research Excellence Framework. Br J Manag 26:569–581

Pollitt A et al. (2016) Understanding the relative valuation of research impact: a best–worst scaling experiment of the general public and biomedical and health researchers. BMJ Open 6(8):e010916

Rau H, Goggins G, Fahy F (2018) From invisibility to impact: recognising the scientific and societal relevance of interdisciplinary sustainability research. Res Policy 47(1):266–276

Ravenscroft J et al. (2017) Measuring scientific impact beyond academia: an assessment of existing impact metrics and proposed improvements. PLoS ONE 12(3):e0173152

Rayson P, Garside R (2000) Comparing corpora using frequency profiling, Workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong, pp. 1–6

Rayson P, Berridge D, Francis B (2004) Extending the Cochran rule for the comparison of word frequencies between corpora. In: Purnelle G, Fairon C, Dister A (eds.), Le poids des mots: Proceedings of the 7th international conference on statistical analysis of textual data (JADT 2004) (II). Presses universitaires de Louvain, Louvain-la-Neuve, Belgium, pp. 926–936

Reed MS (2018) The research impact handbook, 2nd edn. Fast Track Impact, Huntly, Aberdeenshire

Reed MS (2019) Book review: new book calls for civil disobedience to fight “dehumanising” impact agenda. Fast Track Impact

Reed MS et al. (under review) Evaluating research impact: a methodological framework. Res Policy

Rhoads R, Torres CA (2005) The University, State, and Market: The Political Economy of Globalization in the Americas. Stanford University Press, Stanford

Saldana J (2009) The Coding Manual for Qualitative Researchers. Sage, Thousand Oaks

Scott M (1997) PC analysis of key words—and key key words. System 25(2):233–245

Sivertsen G (2017) Unique, but still best practice? The Research Excellence Framework (REF) from an international perspective. Palgrave Commun 3:17078

Smith S, Ward V, House A (2011) ‘Impact’ in the proposals for the UK’s Research Excellence Framework: shifting the boundaries of academic autonomy. Res Policy 40(10):1369–1379

Stern LN (2016) Building on success and learning from experience: an independent review of the Research Excellence Framework

Tsey K et al. (2016) Evaluating research impact: the development of a research for impact tool. Front Public Health 4:160

Vold ET (2006) Epistemic modality markers in research articles: a cross-linguistic and cross-disciplinary study. Int J Appl Linguist 16(1):61–87

Warry P (2006) Increasing the economic impact of the Research Councils (the Warry report). Research Council UK, Swindon

Watermeyer R (2019) Competitive accountability in academic life: the struggle for social impact and public legitimacy. Edward Elgar, Cheltenham

Watermeyer R, Hedgecoe A (2016) Selling ‘impact’: peer reviewer projections of what is needed and what counts in REF impact case studies. A retrospective analysis. J Educ Policy 31:651–665

Watermeyer R, Chubb J (2018) Evaluating ‘impact’ in the UK’s Research Excellence Framework (REF): liminality, looseness and new modalities of scholarly distinction. Stud Higher Educ 44(9):1–13

Weinstein N et al. (2019) The real-time REF review: a pilot study to examine the feasibility of a longitudinal evaluation of perceptions and attitudes towards REF 2021

Wilsdon J et al. (2015) Metric tide: report of the independent review of the role of metrics in research assessment and management

Yang A, Zheng S, Ge G (2015) Epistemic modality in English-medium medical research articles: a systemic functional perspective. Engl Specif Purp 38:1–10

Acknowledgements

Thanks to Dr. Adam Mearns, School of English Literature, Language & Linguistics at Newcastle University for help with statistics and wider input to research design as a co-supervisor on the Ph.D. research upon which this article is based.

Author information

Authors and affiliations

Newcastle University, Newcastle, UK

Bella Reichard, Mark S Reed & Andrea Whittle

University of York, York, UK

University of Leeds, Leeds, UK

Northumbria University, Newcastle, UK

Lucy Jowett & Alisha Peart

Corresponding author

Correspondence to Mark S Reed.

Ethics declarations

Competing interests

MR is CEO of Fast Track Impact Ltd, providing impact training to researchers internationally. JC worked with Research England as part of the Real-Time REF Review in parallel with the writing of this article. BR offers consultancy services reviewing REF impact case studies.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Reichard, B., Reed, M.S., Chubb, J. et al. Writing impact case studies: a comparative study of high-scoring and low-scoring case studies from REF2014. Palgrave Commun 6, 31 (2020). https://doi.org/10.1057/s41599-020-0394-7

Received: 10 July 2019

Accepted: 09 January 2020

Published: 25 February 2020

DOI: https://doi.org/10.1057/s41599-020-0394-7
