Sample size for qualitative research

Qualitative Market Research

ISSN : 1352-2752

Article publication date: 12 September 2016

Purpose

Qualitative researchers have been criticised for not justifying sample size decisions in their research. This short paper addresses the issue of which sample sizes are appropriate and valid within different approaches to qualitative research.

Design/methodology/approach

The sparse literature on sample sizes in qualitative research is reviewed and discussed. This examination is informed by the personal experience of the author in terms of assessing, as an editor, reviewer comments as they relate to sample size in qualitative research. Also, the discussion is informed by the author’s own experience of undertaking commercial and academic qualitative research over the last 31 years.

Findings

In qualitative research, the determination of sample size is contextual and partially dependent upon the scientific paradigm under which investigation is taking place. For example, qualitative research that is oriented towards positivism will require larger samples than in-depth qualitative research does, so that a representative picture of the whole population under review can be gained. Nonetheless, the paper also concludes that sample sizes involving one single case can be highly informative and meaningful, as demonstrated in examples from management and medical research. Unique examples of research using a single sample or case, but involving new areas or findings that are potentially highly relevant, can be worthy of publication. Theoretical saturation can also be useful as a guide in designing qualitative research, with practical research illustrating that data saturation may occur in samples as small as 12 when the population studied is relatively homogeneous.

Practical implications

Sample sizes as low as one can be justified. Researchers and reviewers may find the discussion in this paper to be a useful guide to determining and critiquing sample size in qualitative research.

Originality/value

Sample size in qualitative research is always mentioned by reviewers of qualitative papers, but discussion tends to be simplistic and relatively uninformed. The current paper draws attention to how sample sizes, at both ends of the size continuum, can be justified by researchers. This will also aid reviewers in commenting on the appropriateness of sample sizes in qualitative research.

Keywords

  • Qualitative research
  • Qualitative methodology
  • Case studies
  • Sample size

Boddy, C.R. (2016), "Sample size for qualitative research", Qualitative Market Research, Vol. 19 No. 4, pp. 426-432. https://doi.org/10.1108/QMR-06-2016-0053


Copyright © 2016, Emerald Group Publishing Limited


Sample Size Policy for Qualitative Studies Using In-Depth Interviews

  • Published: 12 September 2012
  • Volume 41, pages 1319–1320 (2012)


  • Shari L. Dworkin


In recent years, there has been an increase in submissions to the Journal that draw on qualitative research methods. This increase is welcome and indicates not only the interdisciplinarity embraced by the Journal (Zucker, 2002) but also its commitment to a wide array of methodologies.

Those who do select qualitative methods, and use grounded theory and in-depth interviews in particular, appear to have many questions about how to write a rigorous Method section. This topic will be addressed in a subsequent Editorial. At this time, however, the most common question we receive is: “How large does my sample size have to be?” I would therefore like to take this opportunity to answer this question by discussing the relevant debates and then the policy of the Archives of Sexual Behavior.

The sample size used in qualitative research methods is often smaller than that used in quantitative research methods. This is because qualitative research methods are often concerned with garnering an in-depth understanding of a phenomenon or are focused on meaning (and heterogeneities in meaning), which are often centered on the how and why of a particular issue, process, situation, subculture, scene, or set of social interactions. In-depth interview work is not as concerned with making generalizations to a larger population of interest and does not tend to rely on hypothesis testing but rather is more inductive and emergent in its process. As such, the aim of grounded theory and in-depth interviews is to create “categories from the data and then to analyze relationships between categories” while attending to how the “lived experience” of research participants can be understood (Charmaz, 1990, p. 1162).

There are several debates concerning what sample size is the right size for such endeavors. Most scholars argue that the concept of saturation is the most important factor to think about when mulling over sample size decisions in qualitative research (Mason, 2010). Saturation is defined by many as the point at which the data collection process no longer offers any new or relevant data. Another way to state this is that conceptual categories in a research project can be considered saturated “when gathering fresh data no longer sparks new theoretical insights, nor reveals new properties of your core theoretical categories” (Charmaz, 2006, p. 113). Saturation depends on many factors, and not all of them are under the researcher’s control. Some of these include: How homogeneous or heterogeneous is the population being studied? What are the selection criteria? How much money is in the budget to carry out the study? Are there key stratifiers (e.g., conceptual, demographic) that are critical for an in-depth understanding of the topic being examined? What is the timeline that the researcher faces? How experienced is the researcher in being able to determine when she or he has actually reached saturation (Charmaz, 2006)? Is the author carrying out theoretical sampling and, therefore, concerned with ensuring depth on relevant concepts and examining a range of concepts and characteristics that are deemed critical for emergent findings (Glaser & Strauss, 1967; Strauss & Corbin, 1994, 2007)?

While some experts in qualitative research avoid the topic of “how many” interviews “are enough,” there is indeed variability in what is suggested as a minimum. An extremely large number of articles, book chapters, and books offer guidance and suggest anywhere from 5 to 50 participants as adequate. All of these works engage in nuanced debates when responding to the question of “how many” and frequently respond with a vague (and, actually, reasonable) “it depends.” Numerous factors are said to be important, including “the quality of data, the scope of the study, the nature of the topic, the amount of useful information obtained from each participant, the use of shadowed data, and the qualitative method and study design used” (Morse, 2000, p. 1). Others argue that “how many” can be the wrong question and that the rigor of the method “depends upon developing the range of relevant conceptual categories, saturating (filling, supporting, and providing repeated evidence for) those categories,” and fully explaining the data (Charmaz, 1990). Indeed, there have been countless conferences and conference sessions on these debates, as well as reports and myriad publications (for a compilation of debates, see Baker & Edwards, 2012).

Taking all of these perspectives into account, the Archives of Sexual Behavior is putting forward a policy for authors in order to provide more clarity on what is expected in terms of sample size for studies drawing on grounded theory and in-depth interviews. The Archives of Sexual Behavior will adhere to the recommendation that 25–30 participants is the minimum sample size required to reach saturation and redundancy in grounded theory studies that use in-depth interviews. This number is considered adequate for publication because it (1) may allow for thorough examination of the characteristics that address the research questions and help distinguish conceptual categories of interest, (2) maximizes the possibility that enough data have been collected to clarify relationships between conceptual categories and identify variation in processes, and (3) maximizes the chances that negative cases and hypothetical negative cases have been explored in the data (Charmaz, 2006; Morse, 1994, 1995).

The Journal does not want to paradoxically and rigidly quantify sample size when the endeavor at hand is qualitative in nature and the debates on this matter are complex. However, we are providing this practical guidance. We want to ensure that more of our submissions have an adequate sample size so as to get closer to reaching the goal of saturation and redundancy across relevant characteristics and concepts. The current recommendation that is being put forward does not include any comment on other qualitative methodologies, such as content and textual analysis, participant observation, focus groups, case studies, clinical cases or mixed quantitative–qualitative methods. The current recommendation also does not apply to phenomenological studies or life history approaches. The current guidance is intended to offer one clear and consistent standard for research projects that use grounded theory and draw on in-depth interviews.

Editor’s note: Dr. Dworkin is an Associate Editor of the Journal and is responsible for qualitative submissions.

References

Baker, S. E., & Edwards, R. (2012). How many qualitative interviews is enough? National Centre for Research Methods. Available at: http://eprints.ncrm.ac.uk/2273/.

Charmaz, K. (1990). ‘Discovering’ chronic illness: Using grounded theory. Social Science and Medicine, 30 , 1161–1172.


Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis . London: Sage Publications.


Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research . Chicago: Aldine Publishing Co.

Mason, M. (2010). Sample size and saturation in PhD studies using qualitative interviews. Forum: Qualitative Social Research, 11 (3) [Article No. 8].

Morse, J. M. (1994). Designing funded qualitative research. In N. Denzin & Y. Lincoln (Eds.), Handbook of qualitative research (pp. 220–235). Thousand Oaks, CA: Sage Publications.

Morse, J. M. (1995). The significance of saturation. Qualitative Health Research, 5 , 147–149.


Morse, J. M. (2000). Determining sample size. Qualitative Health Research, 10 , 3–5.

Strauss, A. L., & Corbin, J. M. (1994). Grounded theory methodology. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 273–285). Thousand Oaks, CA: Sage Publications.

Strauss, A. L., & Corbin, J. M. (2007). Basics of qualitative research: Techniques and procedures for developing grounded theory . Thousand Oaks, CA: Sage Publications.

Zucker, K. J. (2002). From the Editor’s desk: Receiving the torch in the era of sexology’s renaissance. Archives of Sexual Behavior, 31 , 1–6.

Author information

Shari L. Dworkin, Department of Social and Behavioral Sciences, University of California at San Francisco, 3333 California St., LHTS #455, San Francisco, CA 94118, USA.

Dworkin, S.L. Sample Size Policy for Qualitative Studies Using In-Depth Interviews. Arch Sex Behav 41, 1319–1320 (2012). https://doi.org/10.1007/s10508-012-0016-6



(I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research

Frank J. van Rijnsoever, Innovation Studies, Copernicus Institute of Sustainable Development, Utrecht University, Utrecht, The Netherlands; INGENIO (CSIC-UPV), Universitat Politècnica de València, Valencia, Spain. E-mail: [email protected]

  • Published: July 26, 2017
  • https://doi.org/10.1371/journal.pone.0181689

Abstract

I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximum information,” which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimal information scenario.

Citation: van Rijnsoever FJ (2017) (I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research. PLoS ONE 12(7): e0181689. https://doi.org/10.1371/journal.pone.0181689

Editor: Gemma Elizabeth Derrick, Lancaster University, UNITED KINGDOM

Received: February 20, 2017; Accepted: July 4, 2017; Published: July 26, 2017

Copyright: © 2017 Frank J. van Rijnsoever. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The author has declared that no competing interests exist.

Introduction

Qualitative research is becoming an increasingly prominent way to conduct scientific research in business, management, and organization studies [1]. In the first decade of the twenty-first century, more qualitative research was published in top American management journals than in the preceding 20 years [2]. Qualitative research is seen as crucial in the process of building new theories [2–4] and allows researchers to describe how change processes unfold over time [5, 6]. Moreover, it gives close-up and in-depth insights into various organizational phenomena [7, 8], perspectives, and motivations for actions [1, 8]. However, despite the explicit attention of journal editors to what qualitative research is and how it could or should be conducted [8–10], it is not always transparent how particular research was actually conducted [2, 11]. A typical topic of debate is what the size of a sample should be for inductive qualitative research to be credible and dependable [9, 12]. (Note that in this paper, I refer to qualitative research in an inductive context. I recognize that there are more deductive-oriented forms of qualitative research.)

A general statement from inductive qualitative research about sample size is that data collection and analysis should continue until the point at which no new codes or concepts emerge [13, 14]. This means not only that no new stories emerge, but also that no new codes signifying new properties of uncovered patterns emerge [15]. At this point, “theoretical saturation” is reached: all the relevant information that is needed to gain complete insight into a topic has been found [1, 13]. (Note that to prevent confusion, I use the term ‘code’ in this article to refer to information uncovered in qualitative research. I reserve the term ‘concept’ for the concepts in the theoretical framework.)

Most qualitative researchers who aim for theoretical saturation do not rely on probability sampling. Rather, the sampling procedure is purposive [ 14 , 16 ]. It aims “to select information-rich cases whose study will illuminate the questions under study” [ 12 ]. The researcher decides which cases to include in the sample based on prior information like theory or insights gained during the data collection.

However, the minimum size of a purposive sample needed to reach theoretical saturation is difficult to estimate [ 9 , 17 – 22 ].

There are two reasons why the minimum size of a purposive sample deserves more attention. First, theoretical saturation seems to call for a “more is better” sampling approach, as this minimizes the chances of codes being missed. However, the coding process in qualitative research is laborious and time consuming, so researchers, especially those with scarce resources, do not want to oversample too much. Some scholars give tentative indications of sample sizes that often lie between 20 and 30 and are usually below 50 [23, 24], but the theoretical mechanism on which these estimates are based is unknown.

Second, most research argues that determining whether theoretical saturation has been reached remains at the discretion of the researcher, who uses her or his own judgment and experience [ 9 , 22 , 25 , 26 ]. Patton [ 12 ] even states that “there are no rules for sample size in qualitative inquiry” (p. 184). As such, the guidelines for judging the sample size are often implicit. The reason for this is that most qualitative research is largely an interpretivistic endeavor [ 27 ] that requires flexible creative thinking, experience, and tacit knowledge [ 9 ]. However, researchers from the field of management [ 8 , 11 , 28 ], information sciences [ 24 , 29 ], health [ 30 , 31 ] and the social sciences in general [ 12 , 13 , 32 , 33 ], acknowledge the need for transparency in the process of qualitative research. Moreover, not all researchers have the required experience to assess intuitively whether theoretical saturation has been reached. For them, articulating the assessment criteria in a set of guidelines can be helpful [ 33 ].

In this paper I explore the sample size that is required to reach theoretical saturation in various scenarios and I use these insights to formulate guidelines about purposive sampling. Following a simulation approach, I assess experimentally the effects of different population parameters on the minimum sample size. I first generate a series of systematically varying hypothetical populations. For each population, I assess the minimum sample sizes required to reach theoretical saturation for three different sampling scenarios: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximum information,” which yields the largest number of new codes per sampling step. The latter two are purposive sampling scenarios.

The results demonstrate that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, when the mean probability of observing codes is low, the minimal information and maximum information scenarios are much more efficient in reaching theoretical saturation than the random chance scenario. However, the purposive scenarios yield significantly fewer multiple observations per code that can be used to validate the findings.

By using simulations, this study adds to earlier studies that base their sample size estimates on empirical data [ 16 , 17 ], or own experience [ 22 ]. Simulating the factors that influence the minimum purposive sample size gives these estimates a theoretical basis [ 34 ]. Moreover, the simulations show that the earlier empirical estimates for theoretical saturation are reasonable under most purposive sampling conditions. To my knowledge, there is one earlier study that uses simulations to predict minimum sample size in qualitative research based on random sampling [ 35 ]. The present study extends this work by taking into account the process of purposive sampling, using different sampling scenarios.

Based on my analyses, I offer a set of guidelines that researchers can use to estimate whether theoretical saturation has been reached. These guidelines help to make more informed choices for sampling and add to the transparency of the research, but are by no means intended as mechanistic rules that reduce the flexibility of the researcher [ 10 ].

In section 2, I discuss the theoretical concepts about purposive sampling. Section 3 describes the simulation, and the results are presented in section 4. In section 5, I draw conclusions, discuss the limitations, and offer recommendations.

Theoretical concepts

I base this section largely on the existing literature on purposive sampling. I also introduce some new ideas that are sometimes implied by the literature, but that were never conceptualized. Table 1 summarizes the main concepts in this paper, and the symbols used to denote them.

[Table 1. The main concepts used in this paper and the symbols used to denote them. https://doi.org/10.1371/journal.pone.0181689.t001]

Populations, information sources, and sampling steps

A population is the “universe of units of analysis” from which a sample can be drawn [ 36 ]. However, in qualitative research, the unit of analysis does not have to be the same as the unit from which information is gathered. I call the latter “information sources.” In the context of interviews, information sources are often referred to as informants [ 16 , 37 ], but they can be any source that informs the researcher: other examples are sites to collect observational data, existing documents, or archival data. I refer to the total set of information sources that are potentially relevant to answering the research question as the population.

From this population, one or multiple information sources are sampled as part of an iterative process that includes data collection, analysis, and interpretation. At each iteration the researcher has the opportunity to adjust the sampling procedure and to select a new information source to be sampled. I assume in this paper that at each iteration only one source is sampled; this assumption has no further consequences for the remainder of the paper. Moreover, I use the term “sampling steps” rather than iterations, as this excludes analysis and interpretation. Finally, contrary to formal quantitative sampling terminology, I count as sampling steps only sources that actually participated in the research, thus excluding non-response and the inability to access sources. This eases interpretation.

Sub-populations

A population of information sources is usually not homogeneous. Multiple sub-populations can often be distinguished, for example the difference between interviewees, documents, or focus groups. This is important as the researcher can choose different sampling procedures and data collection methods for each sub-population. The exact delineation of sub-populations depends on the judgment of the researcher. However, I argue there are a number of restrictions on the delineation of sub-populations.

  • First, if there are differences in the type of information source, sampling strategy, type of data, data collection, or methods of analysis, then there are sub-populations. The reason for this criterion is that different methods are needed. These different methods need to be accounted for [32], as they can explain differences in outcomes.
  • Second, information sources should be interchangeable at the sub-population level. Within a sub-population, no single information source may be critical for reaching theoretical saturation. Hence, no single information source in a sub-population can contain information that is not found in other information sources in that sub-population. The reason for this criterion is that if a particular information source is critical for theoretical saturation, it should by definition be included in the research. Observing critical information is not guaranteed if its inclusion depends on a particular sampling strategy. A critical information source should then be treated as a separate sub-population of size one.
  • Third, if cases or groups are compared, it is important to treat these as sub-populations. For example, distinguishing between sub-populations is a condition for data triangulation, because the researcher effectively compares the results from one sub-population (for example interviews with managers) with the results from another (for example annual reports). Furthermore, comparative case studies [4, 38] involve the comparison of sub-populations.

The concept of sub-populations implies that theoretical saturation can be reached at the level of the overall population or at the level of the sub-population. Reaching theoretical saturation in all the sub-populations is not a condition for reaching theoretical saturation at the level of the population, since sub-populations can have an overlap in information. However, it is necessary to reach theoretical saturation in each sub-population in comparative research or when triangulating results, as this is the only way to make a valid comparison.

Codes and theoretical saturation

In most cases of inductive qualitative research, information is extracted from information sources, interpreted and translated into codes. I refer to codes here in the context of inductive qualitative data analysis, which means that they can be seen as “tags” or “labels” on unique pieces of information [ 13 ]. Codes can represent any sort of information and may be related to each other (for example phenomena, explanations or contextualization). The only conditions that I impose are that each code represents only one piece of information and that two different codes are not allowed to represent the same information. In practice, this means that synonyms are removed during qualitative data analysis. Thereby, codes can be interpreted as unique “bits” of information.

The population contains all the codes that can be potentially observed. At the start of a study, the codes in the population are unobserved and the exact number of codes in the population is unknown. Consulting information sources sampled from the population allows codes to become observed. Theoretical saturation is reached when each code in the population has been observed at least once.
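To make this definition concrete, the sketch below represents a population as a binary matrix (rows are information sources, columns are codes) and tests for saturation. This is an illustrative sketch under that representation, not the paper's own code (which is in S1 File); the population size, number of codes, and observation probability are arbitrary choices.

```r
# Illustrative sketch: a population as a binary source-by-code matrix.
# An entry of 1 means the source contains (and would reveal) that code.
set.seed(1)
n_sources <- 50
k <- 10
pop <- matrix(rbinom(n_sources * k, 1, 0.3), nrow = n_sources)

# Theoretical saturation: every code observed at least once in the sample.
is_saturated <- function(pop, sampled) {
  all(colSums(pop[sampled, , drop = FALSE]) > 0)
}

is_saturated(pop, sampled = 1:5)  # do the first five sources cover all codes?
```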

Number of codes and mean probability of observing codes

I let the number of sampling steps required to reach theoretical saturation depend on two population characteristics. First, the larger the number of codes distinguished in the population, the more sampling steps are required to observe them all. The number of codes can vary greatly per study, depending on the complexity of the research question and the amount of theory in the literature. A number of around 100 is common. Second, the more often a code is present in the population, the greater the chance that it will be observed. As theoretical saturation takes place at the population level, the distribution of codes in the population is important. For example, interviews can vary in length, or some documents can contain more relevant information than others. In general, one would expect that the higher the “mean probability of observing codes” in a population, the fewer sampling steps are required to reach theoretical saturation. By definition, these probabilities vary between 0 and 1. A mean probability of observing codes of 0.5 means that, on average, a code is observed at 50% of the information sources.

Purposive sampling allows the researcher to make an informed estimation of the probability of observing a given code at each sampling step, using (theoretical) prior information such as sampling frames [39] or insights gained during the data analysis. (This conceptualization of purposive sampling is also consistent with the notion of theoretical sampling. Both terms are often used interchangeably; theoretical sampling can be seen as a special case of purposive sampling [14].) However, when the number of codes is large, it is easier simply to estimate the mean probability of observing all the codes in the population. To make such estimations, it is important to consider what the probability of observing codes actually represents. The probability of a code being observed depends on at least: the likelihood of an information source actually containing the code, the willingness and ability of the source (or its authors) to let the code be uncovered, and the ability of the researcher to observe the code. These probability estimations are based on the characteristics of the information source and the researcher. The probability of observing a certain code can decrease when the information source (for example an interviewee) has strategic reasons not to share information. The strategic behavior of actors can also lead to the discovery of other additional codes about the motivations and actions of these actors. The relevance of these codes depends on the research question. In addition, if the researcher has less experience with the technique used to uncover codes from a source or with correctly interpreting information during the data analysis, the probability of observing codes decreases. Having multiple independent coders, on the other hand, can increase the probability of observing a code.

Repetitive codes

Some researchers consider codes that are observed more than once as redundant, since they do not add new information to the data [ 12 , 26 , 32 ]. I refer to codes that are observed more than once more neutrally as “repetitive codes.” Repetitive codes are important for a methodological purpose: they can help guard against misinformation. That is, information sources may have given false codes, for reasons of social desirability, strategy, or accidental errors.

To guard against misinformation and to enhance the credibility of the research, it can be advisable to aim for a sample in which each code is observed multiple times (this also follows from the logic behind triangulation). One could argue that if a code, after a substantial number of sampling steps, is still observed only once while almost all other codes have a higher incidence, a critical examination of the code is warranted. In many cases, the researcher may already be suspicious of such a code during the analysis. A frequency of one does not mean that the code is wrong by definition; it is possible that the code is just rare or that the low frequency is just a coincidence. However, it is relatively easy to make an argumentative judgment about the plausibility of rare codes (for example based on theory).
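A simple way to operationalize this check, continuing the matrix representation sketched above, is to count how often each code has been observed and flag singletons for critical examination (again an illustration, not the paper's code):

```r
# Sketch: flag codes observed exactly once after a number of sampling steps.
set.seed(1)
pop <- matrix(rbinom(50 * 10, 1, 0.3), nrow = 50)  # 50 sources x 10 codes
code_counts <- colSums(pop[1:20, , drop = FALSE])  # after 20 sampling steps
which(code_counts == 1)  # singletons: rare, coincidental, or possibly suspect
```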

Sampling strategies, sampling scenarios, and efficiency

A sampling strategy describes how the researcher selects the information sources. The most elaborate inventory of sampling strategies comes from Patton [12], who identifies 15 purposive sampling strategies for qualitative research. Examples include “maximum variation sampling,” “typical case sampling,” and “snowball sampling.” These strategies are based strongly on research practices, but the underlying theoretical criteria for distinguishing between the strategies are left implicit. For example, a criterion that can explain the difference between “maximum variation sampling,” “typical case sampling,” and “extreme case sampling” is the focus of the research question. “Snowball sampling” and “opportunistic sampling” differ in the way in which they obtain information about the next information source that is to be sampled. “Confirming or disconfirming sampling” and “including politically sensitive cases” as strategies are motivated by a delineation of the population. Overall, Patton [12] acknowledges that purposive sampling in qualitative research can be a mixture of the strategies identified and that some of these strategies overlap. These strategies also make implicit assumptions regarding the researcher’s prior knowledge about the population. For example, “extreme case sampling” implicitly assumes that the researcher has knowledge about the full population; otherwise, he or she would not be able to identify the extreme cases. “Snowball sampling” assumes that the researcher does not have full knowledge of the population, as relevant leads are identified only at each sampling step.

I use the concepts described above to formulate three generic sampling scenarios. I refer to sampling scenarios to avoid confusion with the sampling strategies; the term “scenario” signifies that they are based on theoretical notions instead of empirical data or observed practices. The three sampling scenarios are based on the number of newly observed codes that a sampled information source adds. This criterion is motivated by the premise of purposive sampling: based on the expected information, the researcher makes an informed decision about the next information source to be sampled at each sampling step. This informed decision implies that the researcher can reasonably foresee whether, and perhaps how many, new codes will be observed at the next sampling step. The fewer sampling steps that a scenario requires to reach theoretical saturation, the more efficient it is.

The three scenarios that I identify are “random chance,” “minimal information,” and “maximal information.”

  • Random chance assumes that the researcher does not use prior information during each sampling step. The researcher randomly samples an information source from the population and adds it to the sample. This scenario is solely based on probability and is considered to be inappropriate for most qualitative studies [ 14 , 16 ]. However, there are good reasons to include this scenario. First, there are conditions under which random chance is an appropriate scenario for sampling. One of these is when no information is gained about the population during the sampling steps, such as when documents or websites are analyzed. Second, random chance can be seen as a worst-case scenario. If a researcher is uncertain about how a sampling process actually worked, it is always possible to explore whether theoretical saturation would have been reached under the conservative conditions of random chance. Third, random chance is the only scenario for which the number of sampling steps can be calculated mathematically. Finally, the random chance scenario can serve as a benchmark to which the number of sampling steps in the other scenarios can be compared.
  • Minimal information is a purposive scenario that works in the same way as random chance, but adds as extra condition that at least one new code must be observed at each sampling step. This is equivalent to a situation in which the researcher actively seeks information sources that reveal new codes, for example by making enquiries about the source beforehand. It is not uncommon for a researcher to discuss topics with a potential interviewee prior to the actual interview to assess whether the interview will be worthwhile. The minimal information scenario captures these kinds of enquiries. Similarly, researchers may be referred to a next source that adds new codes as part of a snowball strategy. Overall, the criterion of observing at least one new code per sampling step seems to be relatively easy to achieve as long as the researcher has some information about the population at each step. This makes the scenario broadly applicable and more efficient than random chance.
  • Maximal information is a purposive scenario that assumes that the researcher has almost full knowledge of the codes that exist in the population and the information sources in which they are present. At each sampling step, an information source is added to the sample that leads to the largest possible increase in observed codes. This scenario is in line with the theoretical aim of purposive sampling. However, it makes strong assumptions about the researcher’s prior knowledge of the population and does not reflect settings in which the population is unknown or very large. An example of when this scenario might be realistic occurs when the researcher is extremely familiar with the field and the specific setting that he or she is investigating. (A sketch of all three scenarios follows this list.)
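The sketch below implements all three scenarios against a simulated binary population. It is an illustrative reimplementation under assumed parameters, not the code in S1 File; note that under minimal information, sources that would add no new code are skipped and do not count as sampling steps, matching the algorithm description in the simulation section.

```r
# Illustrative sketch of the three sampling scenarios (not the S1 File code).
set.seed(42)
pop <- matrix(rbinom(500 * 50, 1, 0.15), nrow = 500)  # 500 sources, 50 codes

steps_to_saturation <- function(pop, scenario) {
  observed  <- rep(FALSE, ncol(pop))
  available <- seq_len(nrow(pop))  # sampling without replacement
  steps <- 0
  while (!all(observed)) {
    # number of not-yet-observed codes each remaining source would add
    gain <- rowSums(pop[available, !observed, drop = FALSE])
    cand <- switch(scenario,
      random  = seq_along(available),      # no prior information used
      minimal = which(gain >= 1),          # must yield at least one new code
      maximal = which(gain == max(gain)))  # largest possible gain
    pick <- cand[sample.int(length(cand), 1)]
    observed  <- observed | (pop[available[pick], ] == 1)
    available <- available[-pick]
    steps <- steps + 1
  }
  steps
}

sapply(c("random", "minimal", "maximal"),
       function(s) steps_to_saturation(pop, s))
```

On a typical run, random chance needs the most steps and maximal information the fewest, mirroring the efficiency ordering reported in the results.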

I use simulations as they allow me to assess the effects of the three scenarios for a series of hypothetical populations that vary systematically regarding (1) the number of codes in the population and (2) the mean probability of observing codes. The controlled setting allows me to assess the relative influence of each of these factors on reaching theoretical saturation. In an empirical setting this would not be possible: the researcher generally cannot control the characteristics of the population under study, the number of populations that can be studied is limited, and it is never entirely certain whether theoretical saturation has been reached [27].

To keep the paper readable for audiences with either a quantitative or qualitative background, I minimize the mathematical details in the main text as much as possible. The full technical details of the simulation are in S1 Appendix , which can be read instead of sections 3.1 and 3.2. To relate sections 3.1 and 3.2 to S1 Appendix , I assign symbols to the most important concepts in the main text, and refer to the appropriate sections of S1 Appendix .


Simulation of scenarios

Using R [40], I generate 1100 hypothetical populations of 5000 information sources. The populations vary systematically by the number of codes (k) from 1 to 101 with increments of 10. I let the mean probability of observing codes vary between 0.09 (1/11) and 0.91 (10/11) (see S1 Appendix Section F: Simulation). Further, in line with my earlier argument about the interchangeability of information sources, I impose the condition that each code must actually be present in at least two information sources in the population.
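A sketch of such a generator is shown below. The grid here yields 110 populations (11 values of k times 10 probabilities); the paper's 1100 populations and the exact distribution of per-code probabilities come from S1 Appendix Section F and are not reproduced here, so treat these parameter choices as assumptions for illustration.

```r
# Sketch of a population generator (the parameter grid is an assumption
# for illustration; the paper's exact setup is in S1 Appendix Section F).
make_population <- function(n_sources = 5000, k, p) {
  pop <- matrix(rbinom(n_sources * k, 1, p), nrow = n_sources)
  # interchangeability condition: every code present in at least two sources
  for (j in seq_len(k)) {
    if (sum(pop[, j]) < 2) pop[sample.int(n_sources, 2), j] <- 1
  }
  pop
}

grid <- expand.grid(k = seq(1, 101, by = 10), p = (1:10) / 11)
pops <- Map(function(k, p) make_population(k = k, p = p), grid$k, grid$p)
```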

For each hypothetical population, I simulate the number of sampling steps necessary to reach theoretical saturation under the three scenarios from section 2.5. Fig 1 gives a schematic overview of how the algorithms for each scenario operate. The full R code is available as S1 File: R-code for the simulations; the resulting data are available as S2 File: Simulated data.

[Fig 1. Schematic overview of how the algorithms for each scenario operate. https://doi.org/10.1371/journal.pone.0181689.g001]

All three scenarios operate in a similar manner. After generating a population, an information source is selected:

  • Random chance selects information sources based on probability.
  • Minimal information works in the same way as random chance, but adds as extra condition that at least one new code must be observed per sampling step. Otherwise the information source is discarded, and does not count towards the number of sampling steps.
  • Maximal information first identifies a set of information sources that contain the largest number of unobserved codes. From this set, which often consists of a small number of information sources, it randomly selects an information source.

If the source has not been selected before, it is added to the sample. After each sampling step, the model evaluates whether theoretical saturation has been reached. If so, the process stops and the number of sampling steps (n_s) is reported. Otherwise, the next sampling step takes place and a new information source is selected from the population.

As there are multiple combinations of information sources that allow theoretical saturation to be reached per population, I apply each of the three sampling scenarios to each population 500 times. This produces, for each scenario, a distribution of the number of sampling steps needed to reach theoretical saturation. From this distribution, I report as the main outcome the value that leads to theoretical saturation in 95% of the 500 simulations of a population. The value of 95% is in line with statistical conventions and makes my results more robust.
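In code, this criterion is simply the 0.95 quantile of the simulated step counts; the sketch below reuses pop and steps_to_saturation() from the scenario sketch earlier.

```r
# The 95% criterion as the 0.95 quantile of 500 simulated runs
# (reuses pop and steps_to_saturation() from the earlier sketch).
runs <- replicate(500, steps_to_saturation(pop, "minimal"))
quantile(runs, probs = 0.95)  # steps sufficient in 95% of simulations
```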

[Fig 2. Number of sampling steps needed to reach theoretical saturation in 95% of simulations, per scenario. The y-axis is logarithmic. The solid black line indicates the calculated random-chance value of n based on F11. The blue dots represent random chance, the green diamonds represent minimal information, and the red triangles represent maximal information. https://doi.org/10.1371/journal.pone.0181689.g002]

Fig 2 shows that in the random chance scenario, a low mean probability of observing codes leads to over 4000 sampling steps to reach theoretical saturation, regardless of the number of codes. As the mean probability of observing codes increases, the number of sampling steps declines rapidly, with a decreasing trend to below 10 for all numbers of codes. This implies that the mean probability of observing codes is more important than the number of codes for reaching theoretical saturation. The figure also shows that both purposive scenarios are more efficient than random chance. For a low mean probability of observing codes, the differences between scenarios are largest. With the random chance scenario, 101 codes in the population, and a mean probability of observing codes smaller than 0.1, it generally requires more than 1000 sampling steps to reach theoretical saturation in 95% of the cases. Under the same conditions, this number is reduced to about 46 information sources in the minimal information scenario and to about 20 in the maximal information scenario. As the mean probability of observing codes becomes larger, the random chance and minimal information scenarios require about the same number of sampling steps for theoretical saturation, while the maximal information scenario requires fewer. Notably, the numbers for both purposive scenarios fall within the range of common indications of sample size from the literature (below 50). This result confirms that this indication is not far from accurate. Finally, in the maximal information scenario, the number of sampling steps for reaching theoretical saturation has little variance for different values of the mean probability of observing codes. This is because the high level of efficiency gives little room for variation.
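F11 itself is in S1 Appendix and is not reproduced here, but the shape of the random-chance curve can be understood from a standard with-replacement approximation, which is an assumption of this sketch: if code j appears in a randomly drawn source with probability p_j, the chance that every code is seen within n draws is the product over codes of (1 − (1 − p_j)^n). Because the rarest codes dominate that product, a low mean probability (which implies that some codes are very rare) pushes the required n into the thousands even though the mean alone would suggest far fewer; sampling without replacement from 5000 sources caps the requirement, consistent with the "over 4000" figure above.

```r
# With-replacement approximation (an assumption; the simulation samples
# without replacement, which caps the requirement at the population size).
p_saturation <- function(n, p) prod(1 - (1 - p)^n)  # p: per-code probabilities

# A single rare code present in only 2 of 5000 sources dominates the curve:
log(0.05) / log(1 - 2 / 5000)  # ~7488 draws for a 95% chance on that code alone
```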

[Fig 3. Mean number of observations per code, per scenario. The blue dots represent random chance, the green diamonds represent minimal information, and the red triangles represent maximal information. https://doi.org/10.1371/journal.pone.0181689.g003]

In line with the result above, the mean probability of observing codes has a greater influence than the number of codes on the mean number of observations per code in the random chance scenario. Second, the random chance scenario gives the largest number of repetitive codes (over 400) at low mean probabilities of observing codes. This is explained by the fact that this scenario has the most sampling steps on average. However, for higher mean probabilities of observing codes, the random chance scenario yields about the same number of observations per code as minimal information, which is between 3 and 5. Finally, the maximal information scenario yields only between 1 and 3 observations per code. This low number makes the use of repetitive codes under maximal information very limited.

Overall, the results show that there is a clear trade-off between the efficiency of the scenario and the number of repetitive codes. To increase the credibility of one’s research, it is possible to aim for a minimum number of observations of each code (ν). For reasons of space, I do not simulate the various scenarios for different minimum numbers of observations of each code, but a calculation (see S1 Appendix Section D: Repetitive codes: F14) reveals that it is relatively easy to aim for observing codes multiple times. On average, to obtain a repetition of one (ν = 2) based on the calculated random chance, 2.3 extra sampling steps are required, which is an increase of about 10%. For ν = 3, 3.66 extra steps are required (about 17%), and for ν = 4, 4.66 extra steps are required (about 21%). The number of extra steps required is even smaller for both purposive scenarios, as these are more efficient.

Conclusions

The results for the purposive scenarios produced the same range of minimum sample sizes (below 50 information sources) as tentatively indicated in the literature. The simulations also uncovered mechanisms that give key insights into the estimation of the minimum size of a qualitative sample. The mean probability of observing codes is more important than the number of codes in the population for reaching theoretical saturation. Furthermore, when the probability of observing codes is low, the purposive scenarios are much more efficient than the random chance scenario. When this probability is high, the differences between scenarios are small. Finally, the more efficient a scenario is, the lower the mean number of observations per code, but only a few sampling steps are required to increase the minimum number of observations of all the codes.

Limitations and further research

This paper has two potential limitations that deserve discussion. First, critics could claim that the scenarios are mechanistic and do not represent real-world sampling procedures. I used ideal-typical scenarios that capture the full range of possible empirical sampling procedures. Researchers who view their research through the lens of these scenarios are likely to observe that their sampling procedure shares characteristics with at least one of the three scenarios or that it is a mixture of two scenarios. Future researchers can also simulate other scenarios that they conceive and even include different sampling strategies in their simulations, like snowball sampling or sampling for maximal variation [12, 13].

Second, I simulated a broad range of scenarios for the purpose of this paper, but other simulations are also possible. For example, I simulated only one population per combination of mean probability of observing codes and number of codes. This lack of variation could cast doubt on the robustness of my results. However, there was large variation among the 1100 populations, and neither the number of codes nor the variance around the mean probability of observing codes proved important for the minimum sample size. By letting the mean probability of observing codes vary between 0.09 and 0.91, I considered only a range of probabilities that is realistic in an empirical setting. I also did not vary the population sizes. Instead, I chose a large number that produced conservative estimates of the minimum sample size. It would be empirically interesting to vary the population sizes in the simulations; for computational reasons and to reduce the complexity of this paper, I leave this challenge to future researchers. Finally, I did not simulate different minimum observations per code, as the formula based on random chance gave sufficient insight into this issue.

Guidelines for purposive sampling

Based on these insights, I formulate a set of guidelines for sampling in qualitative research. I am aware that such guidelines are contested by many qualitative researchers, but Tracy [ 33 ] rightfully argues that criteria or guidelines are useful to represent the core of the craft of doing research, and help improve quality. Some of these guidelines are already implemented by many scholars, but for completeness I mention them here. The guidelines are not intended as formal mechanistic rules, but rather as an aid to making informed choices about the sampling and how to report it.

The guidelines for sampling in qualitative research are as follows:

  • Describe the population and its sub-populations. Report:
      • The basis for distinguishing sub-populations.
      • Whether the sources are interchangeable in a sub-population.
      • Whether the sub-populations serve a comparative purpose or are used for other means.
      • The process of data collection, sampling, and analysis per sub-population.
      • Other criteria that are deemed important by the researcher.
  • Estimate the number of codes in each (sub-)population, based on:
      • The complexity and scope of the research question.
      • The existing theory and information available about the sub-population.
      • Other possible factors that are deemed to be of influence.
  • Estimate the mean probability of observing codes, based on:
      • The likelihood of an information source actually containing codes (is required information rare in the population and what are the chances of non-response?).
      • The willingness and ability of the source (or its authors) to let the code be uncovered (are there strategic interests?).
      • The probability that the researcher is able to observe the code (based on the researcher’s prior research experience and familiarity with the topic).
  • Choose a sampling scenario:
      • Random chance is only appropriate if, after a substantial number of sampling steps, the researcher still has little or no idea about the characteristics of the sub-population and where codes can be found. In that case, random chance serves as a fallback scenario. If theoretical saturation is reached under random chance, then it is also reached in the other two scenarios. With conservative estimates of the mean probability of observing codes, the minimum sample size is over 4000 information sources, while for higher means the minimum sample size rapidly drops to below 100 at probabilities of around 0.3 and below 50 at probabilities of 0.4.
      • Choosing a minimal information scenario requires some argumentation. Most important is that the researcher makes it plausible that a new code will be observed at each sampling step. This is something that the researcher will experience as the research progresses. If at a sampling step an information source does not yield any new codes, the researcher can opt to increase the number of sampling steps by one. Usually there is little need to aim deliberately for multiple observations per code, because the scenario delivers sufficient repetitive codes. Under low estimates of the mean probability of observing codes, the minimum sample size for minimal information is around 50, while for higher means the minimum sample size is below 25.
      • The researcher can only choose maximum information when there is already a full overview of all the information sources in the (sub-)population and of how information-rich these sources are (e.g., how many codes they contain). However, as maximum information makes very strong assumptions, the choice needs proper argumentation. The benefit of the maximum information scenario is that even under low estimates of the mean probability of observing codes, the minimum sample size is only 20 information sources. For higher means, the minimum sample size drops below 10. However, unless strong theory is already present, I advise aiming for multiple observations of each code to guard against misinformation.
  • Choose a fitting sampling strategy. The researcher should take into account that the sampling strategy (see [12, 13]) needs to lead to a sufficiently broad reach across information sources in the population to be able to cover all the codes relevant to answering the research question.
  • Account for these steps when reporting the research. State why a scenario, with its associated minimum sample size, is appropriate. The researcher can choose to report the total number of unique codes observed after a given number of sampling steps (for example: 4–5). This can help assess the plausibility of the scenario. The researcher can further report the number of times each code was observed and whether there were reasons to suspect that some codes were not credible. Finally, the researcher can assess whether theoretical saturation at the population level was reached.

Following these recommendations does not guarantee that the overall quality of the research is good. The recommendations can only help to improve the sampling, which is but one aspect of the entire process. In addition, in many instances the codes are not yet fixed at the start of the research; rather, they become better known as the research progresses. I suggest that researchers re-evaluate their assessment at each sampling step.

Keeping the analyses in mind, I recommend that researchers generally opt for a minimal information strategy, as it makes reasonable assumptions, is efficient, and yields sufficient repetitive codes. Whether saturation has been reached remains a matter of the researcher’s argumentative judgment. These guidelines can aid the researcher in making this judgment and readers in assessing it. Overall, the results and the guidelines offered in this paper can improve the quality and transparency of purposive sampling procedures. I therefore encourage fellow researchers to consider using these ideas and guidelines and to improve upon them where they see fit.

Supporting information

S1 Appendix. Technical details.

Mathematical details of the simulation.

https://doi.org/10.1371/journal.pone.0181689.s001

S1 File. R-code for the simulations.

Code for the simulations in R.

https://doi.org/10.1371/journal.pone.0181689.s002

S2 File. Simulated data.

The simulated data set used for this study.

https://doi.org/10.1371/journal.pone.0181689.s003

Acknowledgments

The author is grateful for feedback on this work from Allard van Mossel, Chris Eveleens, Marijn van Weele, Colette Bos and Maryse Chappin. An earlier version of this paper was presented at the Qualitative and Mixed Methods in Research Evaluation and Policy conference 2015 in London, and at the 2016 Annual Meeting of the Academy of Management in Anaheim.

References

  • 1. Bryman A, Bell E. Business Research Methods. 3rd ed. Oxford: Oxford University Press; 2011.
  • 5. Poole MS, Van de Ven AH, Dooley K. Organizational Change and Innovation Processes. Oxford: Oxford University Press; 2000.
  • 12. Patton MQ. Qualitative Evaluation and Research Methods. SAGE Publications, Inc.; 1990.
  • 13. Bryman A. Social Research Methods. 3rd ed. Oxford: Oxford University Press; 2013.
  • 15. Charmaz K. Constructing Grounded Theory. London, UK: Sage; 2014.
  • 20. Baker SE, Edwards R. How many qualitative interviews is enough? National Centre for Research Methods Review Paper. Southampton: National Centre for Research Methods; 2012.
  • 23. Mason M. Sample size and saturation in PhD studies using qualitative interviews. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research. 2010;11.
  • 27. Blaikie N. Approaches to Social Enquiry. Cambridge: Polity; 2007.
  • 30. Safman RM, Sobal J. Qualitative sample extensiveness in health education research. Health Education & Behavior. 2004;31: 9–21.
  • 33. Tracy SJ. Qualitative quality: Eight “big-tent” criteria for excellent qualitative research. Qualitative Inquiry. 2010;16: 837–851.
  • 36. Lewis-Beck MS, Bryman A, Futing Liao T. The Sage Encyclopedia of Social Science Research Methods. London: Sage; 2004.
  • 38. Yin RK. Case Study Research: Design and Methods. 4th ed. Thousand Oaks, CA: Sage Publications Inc.; 2009.
  • 40. R Development Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2015. www.R-project.org

Factors to determine your sample size for qualitative research


A common question when working with new and existing clients is sample size. It always crops up, along with further enquiries: what is the minimum sample size, what constitutes a large sample, and should I be aiming to recruit more people for quantitative research purposes?

The health of your qualitative research, as always, will depend on what you are aiming to discover. Many elements affect research, and in this blog we will look at some key factors you can apply to your sample size when executing qualitative research.

The factors that determine your sample for a qualitative research project can have a huge impact on your findings

It's a journey

Sample size for qualitative research is an organic process.

From the outset, defining your sample size will be the first hurdle to jump, but in these early stages we need to acknowledge that the people you initially recruit may not be your final selection. Often, when building a sample for qualitative research, a good place to start is with a quantitative method such as a survey or questionnaire. From there you can segment your audience into the population you are targeting or hoping to attract.

For example, if you were an academic institution such as a university seeking insight on how to improve your student services, you would first weigh up what an adequate sample size would be and what you hoped to discover. One approach is to canvass a large number of learners via a quantitative survey sent to their student accounts, then segment the responses into a smaller group of 30-50 students that best represents the audience you want to reach. Within that new group, you can launch a qualitative research campaign that is representative of your education institution.
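If you work in R, the sketch below shows what that funnel might look like. The survey data frame, column names and cut-offs are invented purely for illustration.

```r
# Illustrative sketch: funnel a large survey sample down to a qualitative
# cohort. The data frame, columns, and cut-offs are hypothetical.
set.seed(1)
survey <- data.frame(
  id             = 1:2000,
  year_of_study  = sample(1:4, 2000, replace = TRUE),
  services_score = sample(1:10, 2000, replace = TRUE)  # 1 = very dissatisfied
)

# Segment: students who rated student services poorly.
segment <- subset(survey, services_score <= 4)

# Draw a manageable cohort, stratified by year of study (10 per year = 40).
cohort <- do.call(rbind, lapply(split(segment, segment$year_of_study),
                                function(g) g[sample(nrow(g), 10), ]))
nrow(cohort)  # 40 participants for the qualitative phase
```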

Planning the journey of your qualitative research.

1. Have different research methods for different stages of your research journey.

2. Be open to new methods of collecting data and information.

3. Break up your larger sample into smaller groups depending on how they answer or score in preliminary research activities.

Determining sample sizes in qualitative research does matter

(but bigger isn't always better).

Modern technology and AI are changing the way we manage research methods; these advancements mean you are able to process large sets of data. When determining sample size, we need to step back from our traditional understanding of what we perceive qualitative research to be and ask: what is the question we need answered, and what sample do we have readily available?

Focus groups, in-depth interviews and consultations can produce great sample data and a range of information for you and your team to discuss, but ensuring the people you are sampling are relevant and appropriate is half the battle. Often we find that the more you research and understand what you want to discover, the clearer your sample size becomes.

Experience tells us that sample sizes depend on the campaign you are deploying, so consider one of the following methods.

Start large and reduce. Quantitative research is a great way to recruit a large sample of people which can then be reduced and introduced to qualitative methodologies. This approach enables you to handpick a population of individuals who match your criteria or target audience - aim to lower the initial sample to around 30-40 participants.

Communities. These can accommodate large numbers of participants - often around 200 - and, using platforms designed in a WhatsApp style, can provide qualitative data while you interact with participants using a range of qualitative (and quantitative) tools, guide them onto related points and answer any questions. Campaigns such as these could run anywhere between 3 and 14 days.

Long-term campaigns. This form of qualitative research can involve a large sample - sometimes up to 1,000 participants - and studies can last up to 12 months. Discussion is ongoing, with a range of different activities to keep respondents engaged.

In community projects, participants can be contacted for qualitative discussions or interviews

Managing your sample size in qualitative research is key

Let's be clear: there is no one-size-fits-all approach to determining sample size; we can, however, be as scientific as possible with our methodology. When your research calls for larger samples, you'll need to manage that group of participants in a practical and nurturing way. If you are running a long-term community project with a high number of participants, engagement and analysis can get expensive, but managing relationships and interactions will yield the best results. So how do we achieve this?

Incentivise your project

When launching your project, consider adding an incentive that will motivate your sample to respond regularly and in detail. Look at the range of participants that you are working with and assess what a suitable reward could be.

Communicate in their language on a regular basis

This might seem like an obvious point to make but, depending on your campaign, interaction and engagement with your sample will yield greater results. If respondents feel they are being valued and moderators are taking note of their thoughts and responses, you are going to get less saturation in your findings.

Choose a platform that works for you

With a number of research platforms now on the market, it's essential to choose one that suits the way you work and the budget you have available. When making a choice, check that it can centralise and store in-depth interviews, quantitative research and links to articles, and that it performs a high level of analysis. If it cannot do any of this, especially the analysis, search for a platform that does.

Sample size in qualitative research is about insight

The most important factor when conducting qualitative research is the quality of the insight you obtain. At each stage of planning, you need to consider what data you want to study and, in particular, what you are going to do with it. You do not want your sample to suffer from saturation, so it's vital that you ask the correct questions and deploy the right methodology in your campaign.

Why do I write this? Because qualitative research is all about the level of insight you want to harness. As mentioned previously, work with a platform that can centralise all the information but, importantly, does so in a way that makes sense - in a way that provides value. Insight is all about providing a relevant narrative for your work, whether at an 'in-house' company level or a full national level. Researchers will always tell you that insight is key and that you need to choose the correct means of collecting it in order to answer the questions you established at the start.

Insight can take the form of heat maps, interviews, or discussion of theoretical viewpoints, to name a few, and any platform should deliver live, real-time results. The ability to see a large proportion of your campaign's population delivering information as it happens enables you to make decisions more quickly and efficiently.

Budgeting for research

How much will it cost?

There is an age-old assumption that qualitative research is more expensive than quantitative because you have to spend longer analysing the data - the notion being that the greater the sample size, the more time it takes to understand and interpret the research. Researchers have always taken their time to study a chosen method; historically this has been a solid, practical way to establish a focus or action point. But as mentioned at the start, technology advances at a lightning pace, which means features are constantly updated and prices are driven down. Yes, we would be correct in our understanding that a quick, small study will probably be cheaper - it requires fewer working hours and provides a smaller amount of analysis. It will therefore provide good value and is ideal for projects where budgets and timescales are tight.

Does a larger sample mean a larger bill?

The answer is: it varies. Your sample size will inevitably play a part in the overall cost, but much depends on what you are asking participants to do and the level of support required. The size of your sample will eat a large chunk of your budget if you want a high volume of video content transcribed and analysed for sentiment. That being said, design a narrative into your project so that your sample size can be reduced as the project develops; this way you can save the more data-intensive elements for when you have whittled down your initial cohort.
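To make that trade-off concrete, here's a back-of-envelope costing sketch in R. Every rate in it is a placeholder we've invented, so substitute your own transcription, analysis and support costs.

```r
# Back-of-envelope cost model; every rate below is an assumed placeholder.
cost_per_participant <- function(video_minutes,
                                 transcription_rate = 1.5,  # currency units/min (assumed)
                                 analysis_hours     = 2,    # hours per participant (assumed)
                                 analyst_rate       = 40) { # currency units/hour (assumed)
  video_minutes * transcription_rate + analysis_hours * analyst_rate
}

# Phase 1: broad cohort, light-touch tasks. Phase 2: whittled-down cohort,
# video diaries with transcription and sentiment analysis.
phase1 <- 40 * cost_per_participant(video_minutes = 0, analysis_hours = 0.5)
phase2 <- 12 * cost_per_participant(video_minutes = 30)
phase1 + phase2  # reducing the cohort keeps the data-intensive work affordable
```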

In our experience, most companies are passionate about research and want to work with anyone who has respect for the field. But when establishing your research budget, consider how much support you require, the amount of prep work you can do 'in-house' and the type of project you want to deploy.



Sampling in Qualitative Research

In gerontology, the most recognized and elaborate discourse about sampling is generally thought to be in quantitative research associated with survey research and medical research. But sampling has long been a central concern in social and humanistic inquiry, albeit in a different guise suited to different goals. There is a need for more explicit discussion of qualitative sampling issues. This article outlines the guiding principles, rationales, features, and practices of sampling in qualitative research. It then describes common questions about sampling in qualitative research. In conclusion, it proposes the concept of qualitative clarity as a set of principles (analogous to statistical power) to guide assessments of qualitative sampling in a particular study or proposal.

Questions of what is an appropriate research sample are common across the many disciplines of gerontology, albeit in different guises. The basic questions concern what to observe and how many observations or cases are needed to assure that the findings will contribute useful information. Throughout the history of gerontology, the most recognized and elaborate discourse about sampling has been associated with quantitative research, including survey and medical research. But concerns about sampling have long been central to social and humanistic inquiry (e.g., Mead 1953). The authors argue that such concerns remained less recognized by quantitative researchers because of differing foci, concepts, and language. Recently, an explicit discussion of concepts and procedures for qualitative sampling has emerged. Despite the growing number of textbooks on qualitative research, most offer only a brief discussion of sampling issues, and far less is presented in a critical fashion (Gubrium and Sankar 1994; Werner and Schoepfle 1987; Spradley 1979, 1980; Strauss and Corbin 1990; Trotter 1991; but cf. Denzin and Lincoln 1994; DePoy and Gitlin 1993; Miles and Huberman 1994; Pelto and Pelto 1978).

The goal of this article is to extend and further refine the explicit discussion of sampling issues and techniques for qualitative research in gerontology. Throughout the article, the discussion draws on a variety of examples in aging, disability, and ethnicity, as well as more general anthropology.

The significance of the need to understand qualitative sampling and its uses is increasing for several reasons. First, emerging from the normal march of scientific developments that build on prior research, there is a growing consensus about the necessity of complementing standardized data with insights about contexts and insiders' perspectives on aging and the elderly. These data are best provided by qualitative approaches. In gerontology, the historical focus on aging pathology obscured our view of the role of culture and personal meanings in shaping how individuals at every level of cognitive and physical functioning personally experience and shape their lives. The individual embodying a "case" or "symptoms" continues to make sense of, manage, and represent experiences to him- or herself and to others. A second reason for enhancing our appreciation of qualitative approaches to sampling relates to the societal contexts of the scientific enterprise. Shifts in public culture now endorse the inclusion of the experiences and beliefs of diverse and minority segments of the population. A reflection of these societal changes is the new institutional climate for federally funded research, which mandates the inclusion and analysis of data on minorities. Qualitative approaches are valuable because they are suited to assessing the validity of standardized measures and analytic techniques for use with racial and ethnic subpopulations. They also permit us to explore diversities in cultural and personal beliefs, values, ideals, and experiences.

This article will outline the guiding principles and rationales, features, and practices of sampling in qualitative research. It describes the scientific implications of the cultural embeddedness of sampling issues as a pervasive feature in wider society. It then describes common questions about sampling in qualitative research. It concludes by proposing an analog to statistical power, qualitative clarity , as a set of principles to guide assessments of the sampling techniques in a study report or research proposal. The term clarity was chosen to express the goal of making explicit the details of how the sample was assembled, the theoretical assumptions, and the practical constraints that influenced the sampling process. Qualitative clarity should include at least two components, theoretical grounding and sensitivity to context. The concept focuses on evaluating the strength and flexibility of the analytic tools used to develop knowledge during discovery procedures and interpretation. These can be evaluated even if the factors to be measured cannot be specified.

A wide range of opinions about sampling exists in the qualitative research community. The authors take issue with qualitative researchers who dismiss sampling concerns as irrelevant or even heretical. The authors also disagree with those quantitative practitioners who dismiss concerns about qualitative sampling on the grounds that qualitative research provides no useful knowledge. Such a position is untenable and uninformed.

This article focuses only on qualitative research; issues related to combined qualitative and quantitative methods are not discussed. The focus is on criteria for designing samples; qualitative issues related to suitability of any given person for research are not addressed. The criteria for designing samples constitute what Johnson (1990) labels as “Criteria One issues,” the construction and evaluation of theory and data-driven research designs. Criteria Two issues relate to the individual subjects in terms of cooperativeness, rapport, and suitability for qualitative study methods.

Although this article may appear to overly dichotomize qualitative and quantitative approaches, this was done strictly for the purposes of highlighting key issues in a brief space. The authors write here from the perspective of researchers who work extensively with both orientations, singly and in combination, in the conduct of major in-depth and longitudinal research grants that employ both methods. It is the authors' firm belief that good research requires an openness to multiple approaches to conceptualizing and measuring phenomena.

Contributions, Logic and Issues in Qualitative Sampling

Major contributions

Attention to sampling issues has been at the heart of anthropology and of qualitative research since their inception. Much work has been devoted to evaluating the appropriateness of theory, design strategies, and procedures for sampling. Important contributions have been made by research devoted to identifying and describing the nature of sample universes and the relevant analytic units for sampling. For example, the "universe of kinship" (Goodenough 1956) has been a mainstay of cross-cultural anthropological study. Kinship studies aim to determine the fundamental culturally defined building blocks of social relationships of affiliation and descent (e.g., Bott 1971; Fortes 1969). Ethnographic investigations document the diversity of kinship structures, categories of kith and kin, and terminologies that give each culture across the globe its distinctive worldview, social structure, family organization, and patterns to individual experiences of the world.

Concerns with sampling in qualitative research focus on discovering the scope and the nature of the universe to be sampled. Qualitative researchers ask, “What are the components of the system or universe that must be included to provide a valid representation of it?” In contrast, quantitative designs focus on determining how many of what types of cases or observations are needed to reliably represent the whole system and to minimize both falsely identifying or missing existing relationships between factors. Thus the important contributions of qualitative work derived from concerns with validity and process may be seen as addressing core concerns of sampling, albeit in terms of issues less typically discussed by quantitative studies. Two examples may clarify this; one concerns time allocation studies of Peruvian farmers and the other addresses a census on Truk Island in the South Pacific.

The Andes mountains of Peru are home to communities of peasants who farm and tend small herds to garner a subsistence living. To help guide socioeconomic modernization and to improve living conditions, refined time allocation studies (see Gross 1984) were conducted in the 1970s to assess the rational efficiency of traditional patterns of labor, production, and reproduction. Seemingly irrational results were obtained. A systematic survey of how villagers allocated their time to various activities identified a few healthy adults who sat in the fields much of the day. Given the marginal food supplies, such "inactivity" seemed irrational and suggested a possible avenue for the desired interventions to improve village economic production. Only after interviewing the farmers to learn why the men sat in the fields and then calculating the kilocalories of foods gained by putting these men to productive work elsewhere was an explanation uncovered. It was discovered that crop yields and available calories would decline, not increase, due to foraging birds and animals. Because the farmers sat there, the events of animal foraging never occurred in the data universe. Here, judgments about the rationality of behaviors were guided by too narrow a definition of the behavioral universe, shaped by reliance on analytic factors external to the system (e.g., biases in industrial economies that equate "busyness" with production). An important message here is that discovery and definition of the sample universe and of relevant units of activity must precede sampling and analyses.

On Truk Island in the South Pacific, two anthropologists each conducted an independent census using the same methods. They surveyed every person in the community. Statistical analyses of these total universe samples were conducted to determine the incidence of types of residence arrangements for newlywed couples. The researchers reached opposite conclusions. Goodenough (1956) argued that his colleague's conclusion that there are no norms for where new couples locate their residence clearly erred by classifying households as patrilocal (near the father), matrilocal, or neolocal (not near either parent) at one point in time, as if isolated from other social factors. Goodenough used the same residence typology as did his colleague in his analysis, but identified a strong matrilineal pattern (wife's extended family). Evidence for this pattern becomes clear when the behaviors are viewed in relation to the extended family and over time. The newlyweds settle on whatever space is available but plan to move later to the more socially preferred (e.g., matrilineal) sites. This latter aspect was determined by combining survey-based observations of behavior with interviews to learn "what the devil they think they are doing" (Geertz 1973). Thus different analytic definitions of domestic units led to opposite conclusions, despite the use of a sample of the total universe of people! Social constructions of the lived universe and subjectively important temporal factors have to be understood to identify valid units for analyses and interpretation of the data.

The Peruvian and the Truk Island examples illustrate some of the focal contributions of qualitative approaches to sampling. Altering the quantitatively oriented sampling interval, frequency, or duration would not have produced the necessary insights. The examples also suggest some of the dilemmas challenging sampling in qualitative research. These will be addressed in a later section. Both cases reveal the influence of deeply ingrained implicit cultural biases in the scientific construction of the sampling universe and the units for sampling.

The Cultural Embeddedness of the Concept of Sampling

Sampling issues are not exclusive to science. Widespread familiarity with sampling and related issues is indicated by the pervasive popular appetite for opinion and election polls, surveys of consumer product prices and quality, and brief reports of newsworthy scientific research in the mass media. Sampling issues are at the heart of jury selection, which aims to represent a cross section of the community; frequent debates erupt over how to define the universe of larger American society (e.g., by race and gender) to use for juror selection in a specific community. We can shop for sampler boxes of chocolates to get a tasty representation of the universe of all the candies from a company. Debates about the representativeness, size, and biases in survey results because of the people selected for study or the small size of samples are a part of everyday conversation. Newspapers frequently report on medical or social science research, with accounts of experts' challenging the composition or size of the sample or the wording of the survey questions. Critical skills in sampling are instilled during schooling and on-the-job training.

Such widespread familiarity with basic sampling issues suggests a deep cultural basis for the fascination and thus the need for a more critical understanding. The concept and practices of sampling resonate with fundamental cultural ideals and taboos. It is perhaps the case that sampling is linked, in American culture, to democratic ideals and notions of inclusion and representation.

What does that mean for qualitative researchers designing sampling strategies? We need to be aware that the language of science is laden with cultural and moral categories. Thus gerontological research may be shaped by cultural themes masked as scientific principles. Basic terms for research standards can simultaneously apply to ideals for social life (Luborsky 1994). We construct and are admonished by peers to carefully protect independent and dependent variables; we design studies to provide the greatest statistical power and speak of controlling variables. At the same time, psychosocial interventions are designed to enhance these same factors of individual independence and senses of power and control. We examine constructs and data to see if they are valid or invalid; the latter word is also defined in dictionaries as referring to someone who is not upright but physically deformed or sickly. Qualitative research, likewise, needs to recognize that we share with informants in the search for themes and coherence in life, and normatively judge the performance of others in these terms (Luborsky 1994, 1993b).

The ideals of representativeness and proportionality are not, in practice, unambiguous or simple to achieve, as is evidenced in the complex jury selection process. Indeed, there is often more than one way to achieve representativeness. Implicit cultural values may direct scientists to define some techniques as more desirable than others. Two current examples illustrate how sampling issues are the source of vitriolic debate outside the scientific community: voting procedures, and the construction or apportionment of voting districts to represent minority, ethnic, or racial groups. Representing "the voice of the people" in government is a core tenet of American democracy, embodied in the slogan "one person one vote." Before women's suffrage, the universe was defined as "one man one vote." The presidential nomination of Lani Guinier for U.S. Assistant Attorney General was withdrawn, in part, because she had suggested the possibility of an alternative voting system (giving citizens more than one vote to cast) to achieve proportional representation for minorities. We see in these examples that implementing generalized democratic ideals of equal rights and representation can be problematic in the context of the democratic ideal of majority rule. Another example is the continuing debate in the U.S. Supreme Court over how to reapportion voting districts so as to include sufficient numbers of minority persons to give them a voice in local elections. These examples indicate the popular knowledge of sampling issues, the intensity of feelings about representativeness, and the deep dilemmas about proportional representation and biases arising within a democratic society. The democratic ideals produce multiple conflicts at the ideological level.

It is speculated that the association of sampling issues with such core American cultural dilemmas exacerbates the rancor between qualitative and quantitative gerontology; whereas in disciplines that do not deal with social systems, there is a tradition of interdependence instead of rancor. For example, the field of chemistry includes both qualitative and quantitative methods but is not beset by the tension found in gerontology. Qualitative chemistry is the set of methods specialized in identifying the types and entire range of elements and compounds present in materials or chemical reactions. A variety of discovery-oriented methods are used, including learning which elements are reacting with one another. Quantities of elements present may be described in general ranges as being from a trace to a substantial amount. Quantitative chemistry includes measurement-oriented methods attuned to determining the exact quantity of each constituent element present. Chemists use both methods as necessary to answer research problems. The differences in social contextual factors may contribute to the lower level of tension between quantitative and qualitative traditions within the European social sciences situated as they are within alternative systems for achieving democratic representation in government (e.g., direct plebiscites or multiparty governments rather than the American electoral college approach to a two-party system).

Ideals and Techniques of Qualitative Sampling

The preceding discussion highlighted the need to first identify the ideal or goal for sampling and second to examine the techniques and dilemmas for achieving the ideal. The following section describes several ideals, sampling techniques, and inherent dilemmas. Core ideals include the determination of the scope of the universe for study and the identification of appropriate analytic units when sampling for meaning.

Defining the universe

This is simultaneously one of qualitative research's greatest contributions and greatest stumbling blocks to wider acceptance in the scientific community. As the examples of the Peruvian peasants and Trukese postmarital residence norms illustrated, qualitative approaches that can identify relevant units (e.g., of farming activity or cultural ideals for matrilineal residence) are needed to complement behavioral or quantitative methods if we are to provide an internally valid definition of the scope of the universe to be sampled. Probability-based approaches do not capture these dimensions adequately.

The problem is that the very nature of such discovery-oriented techniques runs counter to customary quantitative design procedures. This needs to be clearly recognized. Because the nature of the units and their character cannot be specified ahead of time, but are to be discovered, the exact number and appropriate techniques for sampling cannot be stated at the design stage but must emerge during the process of conducting the research. One consequence is that research proposals and reports may appear incomplete or inadequate when in fact they are appropriately defined for qualitative purposes. One technique in writing research proposals has been to specify the likely or probable number of subjects to be interviewed.

Evidence that a researcher devoted sufficient attention to these issues can be observed in at least two dimensions. First, one finds a wealth of theoretical development of the concepts and topics. In qualitative research, these serve as the analytic tools for discovery and aid in anticipating new issues that emerge during the analyses of the materials. Second, because standardized measurement or diagnostic tests have not yet been developed for qualitative materials, a strong emphasis is placed on analytic or interpretive perspectives to the data collection and data analyses.

Expository styles, traditional in qualitative studies, present another dilemma for qualitative discussions of sampling. An impediment to wider recognition of what constitutes an adequate design is customary, implicit notions about the "proper" or traditional formats for writing research proposals and journal articles. The traditional format for grant applications places discussions of theory in the section devoted to the general significance of the research application, separate from the methods and measures. However, theoretical issues and conceptual distinctions are the research tools and methods for qualitative researchers, equivalent to the quantitative researchers' standardized scales and measures. As the authors have observed in written reviews of grant applications over many years, reviewers want such "clutter" in qualitative documents placed where they feel it belongs, elsewhere in the proposal, not in the design section (Rubinstein 1994). Qualitative researchers look for the analytic refinement, rigor, and breadth in conceptualization linked to the research procedures section as signs of a strong proposal or publication. Thus basic differences in scientific emphases, complicated by expectations for standardized scientific discourse, need to be more fully acknowledged.

Appropriate analytic units: Sampling for meaning

The logic and premises of qualitative sampling for meaning are incompletely understood in gerontology. Although there has been improved interdisciplinary acceptance and communication within gerontology in the last decade, gerontology is largely driven by a sense of medicalization of social aging and a bias toward survey sampling and quantitative analysis based on "adequate numbers" for model testing and other procedures. At the same time, and partly in reaction to the dominance of the quantitative ethos, qualitative researchers have demurred from legitimating or addressing these issues in their own work.

Understanding the logic behind sampling for meaning in gerontological research requires an appreciation of how it differs from other approaches. By sampling for meaning, the authors indicate the selection of subjects in research that has as its goal the understanding of individuals' naturalistic perceptions of self, society, and the environment. Stated another way, this is research that takes the insider's perspective. Meaning is defined as the process of reference and connotation, undertaken by individuals, to evoke key symbols, values, and ideas that shape, make coherent, and inform experience (D'Andrade 1984; Good & Good 1982; Luborsky and Rubinstein 1987; Mishler 1986; Rubinstein 1990; Williams 1984). Clearly, the qualitative approach to meaning stands in marked contrast to other approaches to assessing meaning by virtue of its focus on naturalistic data and the discovery of the informant's own evaluations and categories. For example, one approach assesses meaning by using standardized lists of predefined adjectives or phrases (e.g., semantic differential scale methods, Osgood, Suci, and Tannenbaum 1957); another approach uses diagnostic markers to assign individuals to predefined general types (e.g., depressed, anxious) as a way to categorize people rather than describe personal meaning (e.g., the psychiatric diagnostic manual, DSM-III-R, APA 1987).

The difference between the me of that night and the me of tonight is the difference between the cadaver and the surgeon doing the cutting. (Flaubert, quoted in Crapanzano 1982, p. 181)

It is important to understand that meanings and contexts (including an individual's sense of identity), the basic building blocks of qualitative research, are not fixed, constant objects with immutable traits. Rather, meanings and identities are fluid and changeable according to the situation and the persons involved. Gustave Flaubert precisely captures the sense of active personal meaning-making and remaking across time. Cohler (1991) describes such meaning-making and remaking as the personal life history self, a self that interprets, experiences, and marshals meanings as a means to manage adversity. A classic illustration of the fluidity of meanings is the case presented by Evans-Pritchard (1940) who explains the difficulty he had determining the names of his informants at the start of his fieldwork in Africa. He was repeatedly given entirely different names by the same people. In the kinship-based society, the name or identity one provides to another person depends on factors relative to each person's respective clan membership, age, and community. Now known as the principle of segmentary opposition, the situated and contextual nature of identities was illustrated once the fieldworker discovered the informants were indexing their names to provide an identity at an equal level of social organization. For example, to explain who we are when we travel outside the United States, we identify ourselves as Americans, not as someone from 1214 Oakdale Road. When we introduce ourselves to a new neighbor at a neighborhood block party, we identify ourselves by our apartment building or house on the block, not by reference to our identity as residents at the state or national level.

Themes and personal meanings are markers of processes, not fixed structures. Life stories whose narration is organized around a strongly held personal theme(s), as opposed to a chronology of events from birth to present day, have been linked with distress and clinical depression (Luborsky 1993b). Williams (1984) suggests that the experience of being ill from a chronic medical disease arises when the disease disrupts the expected trajectory of one's biography. Some researchers argue that a break in the sense of continuity in personal meaning (Becker 1993), rather than any particular meaning (theme), precedes illness and depression (Atchley 1988; Antonovsky 1987).

Another example of fluid meaning is ethnicity. Ethnic identity is a set of meanings that can be fluid and vary according to the social situation, historical time period, and its personal salience over the lifetime (Luborsky and Rubinstein 1987, 1990). Ethnic identity serves as a source of fixed, basic family values during child socialization; more fluidly, as an ascribed family identity to redefine or even reject as part of psychological processes of individuation in early adulthood; sometimes as a source of social stigma in communities or in times of war with foreign countries (e.g., "being Italian" during World War II); and as a source of continuity of meaning and pride in later life that may serve to help adapt to bereavement and losses.

From the qualitative perspective, a number of contrasts emerge between sampling for meaning and more traditional, survey-style sampling, which has different goals. Those who are not familiar with the sampling-for-meaning approach often voice concerns over such aspects as size (Lieberson 1992), adequacy and, most tellingly, purpose of the sampling. Why, for example, are sample sizes often relatively small? What is elicited and why? What is the relationship between meanings and other traditional categories of analyses, such as age, sex, class, social statuses, or particular diseases?

What is perhaps the most important contrast between the sampling-for-meaning approach and more standard survey sampling is found in the model of the person that underlies elicitation strategies. The model of the person in standard research suggests that important domains of life can be tapped by a relatively small number of standardized "one size fits all" questions, organized and presented in a scientific manner, and that most responses are relatively objective, capable of being treated as a decontextualized trait, and are quantifiable (Mishler 1986; Trotter 1991). From this perspective, individuals are viewed as sets of fixed traits and not as carriers and makers of meaning.

Sampling for meaning, in contrast, is based on four very distinct notions. The first is that responses have contexts and carry referential meaning. Thus questions about events, activities, or other categories of experience cannot be understood without some consideration of how these events implicate other similar or contrasting events in a person's life (Scheer and Luborsky 1991). This is particularly important for older people.

Second, individuals often actively interpret experience. That is to say, many people—but not all—actively work to consider their experience, put it in context, and understand it. Experience is not a fixed response. Further, concern with meanings, or with remaking meaning, can be more emergent during some life stages and events than others, as can attention to certain kinds of meanings. Examples of this include bereavement, retirement, ethnic identity, and personal life themes in later life.

Third, certain categories of data do not have a separable existence apart from their occurrences embodied within routines and habits of the day and the body. Although certain categories of elicited data may have a relatively objective status and be relatively "at hand" for a person's stock of knowledge, other topics may never have been considered in a way that enables a person to have ready access to them (Alexander, Rubinstein, Goodman, and Luborsky 1992). Consequently, qualitative research provides a context and facilitates a process of collaboration between researcher and informant.

Fourth, interpretation, whether natural for the informant or facilitated in the research interview, is basically an action of interpretation of experience that makes reference both to sociocultural standards, be they general cultural standards or local community ones, and to the ongoing template or matrix of individual experience. Thus, for example, a person knows cultural ideals about marriage, has some knowledge of other people's marriages, and has intimate knowledge of one's own. In the process of interpretation, all these levels come into play.

These issues occur over a variety of sampling frames and processing frameworks. There are three such sampling contexts. First, sampling for meaning occurs in relation to individuals as representatives of experiential types. Here, the goal is the elucidation of particular types of meaning or experience (personal, setting-based, sociocultural), through inquiry about, discussion of, and conversation concerning experiences and the interpretation of events and social occurrences. The goal of sampling, in this case, is to produce collections of individuals from whom the nature of experience can be elicited through verbal descriptions and narrations.

Second, sampling for meaning can occur in the context of an individual in a defined social process. An example here could include understanding the entry of a person into a medical practice as a patient, for the treatment of a disorder. Qualitatively, we might wish to follow this person as she moves through medical channels, following referrals, tests, and the like. Even beginning this research at a single primary physician, or with a sample of individuals who have a certain disorder, the structure of passage through a processing system may vary widely and complexly. However, given a fixed point of entry (a medical practice or a single disease), sampling for meaning is nested in ongoing social processes. Researchers wish to understand not only the patient's experience of this setting as she moves through it (e.g., Estroff 1982) but also the perspectives of the various social actors involved.

Finally, researchers may wish to consider sampling for meaning in a fixed social setting. In a certain way, sampling for meaning in a fixed social setting is what is meant, in anthropology and other social sciences, by “participant observation.” The social setting is more or less fixed, as is the population of research informants. An example might be a nursing home unit, with a more or less fixed number of residents, some stability but some change, and regular staff of several types representing distinctive organizational strata and interests (administration, medicine, nursing, social work, aides, volunteers, family, or environmental services).

It is important to note that even though qualitative research focuses on the individual, subjectivity or individuality is not the only goal of study. Qualitative research can focus on the macrolevel. One basic goal of qualitative research in aging is to describe the contents of people's experiences of life, health, and disability. It is true that much of the research to date treats the individual as the basic unit of analysis. Yet, the development of insights into the cultural construction of life experiences is an equal priority because cultural beliefs and values instill and shape powerful experiences, ideals, and motivations and shape how individuals make sense of and respond to events.

Studying how macrolevel cultural and community ideologies pattern the microlevel of individual life is part of a tradition stretching from Margaret Mead, Max Weber, Robert Merton, and Talcott Parsons to studies of physical and mental disabilities by Edgerton (1967), Estroff (1982), and Murphy (1987). For example, Stouffer's (1949) pioneering survey methods revealed that American soldiers in World War II responded to the shared adversity of combat differently according to personal expectations based on sociocultural value patterns and lived experiences. These findings further illustrate Merton's theories of relative deprivation and reference groups, which point to the basis of individual well-being in basic processes of social comparison.

The notion of stigma illustrates the micro- and macrolevels of analyses. Stigma theory's long reign in the social and political sciences and in clinical practice illustrates both the micro- and macroqualitative perspectives. Stigma theory posits that individuals are socially marked or stigmatized by negative cultural evaluations because of visible differences or deformities, as defined by the community. Patterns of avoidance and denial of the disabled mark the socially conditioned feelings of revulsion, fear, or contagion. Personal experiences of low self-esteem result when negative messages are internalized by, for example, persons with visible impairments, or the elderly in an ageist setting. Management of social stigma by individuals and family is as much a focus as is management of impairments. Stigma is related significantly to compliance with prescribed adaptive devices (Zola 1982; Luborsky 1993a). A graphic case of this phenomenon is that of polio survivors who were homebound due to dependence on massive bedside artificial ventilators. With the recent advent of portable ventilators, polio survivors gained the opportunity to become mobile and travel outside the home, but they did not adopt the new equipment, because the new independence was far outweighed by the public stigma they experienced (Kaufert and Locker 1990).

A final point is that sampling for meaning can also be examined in terms of sampling within the data collected. For example, the entire corpus of materials and observations with informants needs to be examined in the discovery and interpretive processes aimed at describing relevant units for analyses and dimensions of meaning. This is in contrast to reading the texts to describe and confirm a finding without then systematically rereading the texts for sections that may provide alternative or contradictory interpretations.

Techniques for selecting a sample

As discussed earlier, probability sampling techniques cannot be used for qualitative research by definition, because the members of the universe to be sampled are not known a priori, so it is not possible to draw elements for study in proportion to an as yet unknown distribution in the universe sampled. A review of the few qualitative research publications that treat sampling issues at greater length (e.g., Depoy and Gitlin 1993; Miles and Huberman 1994; Morse 1994; Ragin and Becker 1992) identifies five major types of nonprobability sampling techniques for qualitative research. A consensus among these authors is found in the paramount importance they assign to theory to guide the design and selection of samples (Platt 1992). These are briefly reviewed as follows.

First, convenience (or opportunistic) sampling is a technique that uses an open period of recruitment that continues until a set number of subjects, events, or institutions are enrolled. Here, selection is based on a first-come, first-served basis. This approach is used in studies drawing on predefined populations such as participants in support groups or medical clinics. Second, purposive sampling is a practice where subjects are intentionally selected to represent some explicit predefined traits or conditions. This is analogous to stratified samples in probability-based approaches. The goal here is to provide for relatively equal numbers of different elements or people to enable exploration and description of the conditions and meanings occurring within each of the study conditions. The objective, however, is not to determine prevalence, incidence, or causes. Third, snowballing or word-of-mouth techniques make use of participants as referral sources. Participants recommend others they know who may be eligible. Fourth, quota sampling is a method for selecting numbers of subjects to represent the conditions to be studied rather than to represent the proportion of people in the universe. The goal of quota sampling is to assure inclusion of people who may be underrepresented by convenience or purposeful sampling techniques. Fifth, case study ( Ragin and Becker 1992 ; Patton 1990 ) samples select a single individual, institution, or event as the total universe. A variant is the key-informant approach ( Spradley 1979 ), or intensity sampling ( Patton 1990 ) where a subject who is expert in the topic of study serves to provide expert information on the specialized topic. When qualitative perspectives are sought as part of clinical or survey studies, the purposive, quota, or case study sampling techniques are generally the most useful.
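For illustration, the following R fragment draws a quota sample from a hypothetical recruitment pool so that each study condition is filled regardless of its prevalence; the pool, the traits, and the quota are invented for expository purposes.

```r
# Illustrative quota sample from a hypothetical recruitment pool.
set.seed(7)
pool <- data.frame(
  id       = 1:500,
  sex      = sample(c("F", "M"), 500, replace = TRUE),
  impaired = sample(c(TRUE, FALSE), 500, replace = TRUE, prob = c(0.2, 0.8))
)

# Quotas assure inclusion of groups a convenience sample might miss:
# 10 participants per sex-by-impairment cell, regardless of prevalence.
quota <- 10
cells <- split(pool, list(pool$sex, pool$impaired))
sampled <- do.call(rbind, lapply(cells, function(g) g[sample(nrow(g), quota), ]))
table(sampled$sex, sampled$impaired)  # 10 in each of the four cells
```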

How many subjects is the perennial question. There is seldom a simple answer to the question of sample or cell size in qualitative research. There is no single formula or criterion to use. A "gold standard" that will calculate the number of people to interview is lacking (cf. Morse 1994). The question of sample size cannot be determined by prior knowledge of effect sizes, numbers of variables, or numbers of analyses—these will be reported as findings. Sample sizes in qualitative studies can only be set by reference to the specific aims and the methods of study, not in the abstract. The answer only emerges within a framework of clearly stated aims, methods, and goals and is conditioned by the availability of staff and economic resources.

Rough “rules of thumb” exist, but these derive from three sources: traditions within social science research studies of all kinds, commonsense ideas about how many will be enough, and practical concerns about how many people can be interviewed and analyzed in light of financial and personnel resources. In practice, from 12 to 26 people in each study cell seems just about right to most authors. In general, it should be noted that Americans have a propensity to define bigger as better and smaller as inferior. Quantitative researchers, in common with the general population, question such small sample sizes because they are habituated to opinion polls or epidemiology surveys based on hundreds or thousands of subjects. However, sample sizes of less than 10 are common in many quantitative clinical and medical studies where statistical power analyses are provided based on the existence of very large effect sizes for the experimental versus control conditions.
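That last point is easy to verify with a standard power calculation. Assuming, for illustration, a two-sample t-test and an effect of two standard deviations (a very large effect), base R's power.t.test reports that roughly five subjects per group suffice:

```r
# Standard R power calculation: with a very large assumed effect size
# (Cohen's d = 2), fewer than 10 subjects per group are needed.
power.t.test(delta = 2, sd = 1, sig.level = 0.05, power = 0.80)
# n is approximately 5 per group for a two-sample t-test
```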

Other considerations in evaluating sample sizes are resources, time, and reporting requirements. In anthropological field research, a customary formula is that of one to seven: for every 1 year of fieldwork by one researcher, 7 years are required to conduct the analysis. Thus, in studies that use more than one interviewer, the increased capacity to collect data also increases the burden of analysis.
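The arithmetic of that rule of thumb is simple; the scaling by number of interviewers below restates the point as an invented illustrative function, not an established formula:

```r
# Illustrative only: the customary 1:7 fieldwork-to-analysis ratio,
# scaled by team size on the assumption that analysis burden grows
# with the volume of data collected.
analysis_years <- function(fieldwork_years, interviewers = 1, ratio = 7) {
  fieldwork_years * interviewers * ratio
}
analysis_years(1)                    # 7 years: one researcher, one year of fieldwork
analysis_years(1, interviewers = 3)  # 21 years: three interviewers' worth of data
```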

An outstanding volume exploring the logic, contributions, and dilemmas of case study research ( Ragin and Becker 1992 ) reports that survey researchers resort to case examples to explain ambiguities in their data, whereas qualitative researchers reach for descriptive statistics when they do not have a clear explanation for their observations. Again, the choice of sample size and group design is guided by the qualitative goal of describing the nature and contents of cultural, social, and personal values and experiences within specific conditions or circumstances, rather than of determining incidence and prevalence.

Who and who not?

In the tradition of informant-based and participatory research, it is assumed that all members of a community can provide useful information about the values, beliefs, or practices in question. Experts provide detailed, specialized information, whereas nonexperts provide information about daily life. In some cases, the choice is obvious, dictated by the topic of study, for example, childless elderly, retirees, people with chronic diseases or new disabilities. In other cases, it is less obvious, as in studies of disease that require insights not only from sufferers but also from similar people without the condition, for comparison of experiences and personal meanings. Comparisons can be made either on a group basis or matched more closely on a one-to-one basis for many traits (e.g., age, sex, disease, severity), sometimes referred to as yoked pairs. However, given the labor-intensive nature of qualitative work, sometimes the rationale for including control groups of people who do not have the experiences is not justifiable.
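A small sketch of how yoked pairs might be assembled, assuming a greedy nearest-age match within sex; the data and the matching rule are invented for illustration:

```r
# Illustrative sketch of "yoked pairs": each case is matched to one unused
# control of the same sex with the closest age. Data are simulated.
set.seed(3)
cases    <- data.frame(id = 1:15,
                       sex = sample(c("F", "M"), 15, replace = TRUE),
                       age = sample(60:90, 15, replace = TRUE))
controls <- data.frame(id = 101:160,
                       sex = sample(c("F", "M"), 60, replace = TRUE),
                       age = sample(60:90, 60, replace = TRUE))

pairs <- lapply(seq_len(nrow(cases)), function(i) {
  pool <- controls[controls$sex == cases$sex[i], ]     # same-sex candidates
  best <- pool[which.min(abs(pool$age - cases$age[i])), ]
  controls <<- controls[controls$id != best$id, ]      # each control used once
  data.frame(case = cases$id[i], control = best$id,
             age_gap = abs(best$age - cases$age[i]))
})
do.call(rbind, pairs)   # one matched control per case, with the age gap
```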

Homogeneity or diversity

Currently, when constructing samples for single study groups, qualitative research appears to be about equally split in terms of seeking homogeneity or diversity. There is little debate about or attention to these contrasting approaches. For example, some argue that it is more important to represent a wide range of different types of people and experiences, in order to capture the similarities and diversity in human experience, beliefs, and conditions (e.g., Kaufman 1987, 1989), than it is to include sufficient numbers of people sharing an experience or condition to permit evaluation of within-group similarities. In contrast, others select informants to be relatively homogeneous on several characteristics to strengthen comparability within the sample as an aid to identifying similarities and diversity.

Summary and Reformulation for Practice

To review, the authors suggest that explicit objective criteria to use for evaluating qualitative research designs do exist, but many of these focus on different issues and aspects of the research process, in comparison to issues for quantitative studies. This article has discussed the guiding principles, features, and practices of sampling in qualitative research. The guiding rationale is that of the discovery of the insider's view of cultural and personal meanings and experience. Major features of sampling in qualitative research concern the issues of identifying the scope of the universe for sampling and the discovery of valid units for analyses. The practices of sampling, in comparison to quantitative research, are rooted in the application of multiple conceptual perspectives and interpretive stances to data collection and analyses that allow the development and evaluation of a multitude of meanings and experiences.

This article noted that sampling concerns are widespread in American culture rather than being the esoteric, specialized concern of scientific endeavors (Luborsky and Sankar 1993). Core scientific research principles are also basic cultural ideals (Luborsky 1994). For example, "control" (statistical, personal, mechanical) and dependence and independence (of variables and of individuals) are at once scientific and social terms, and the image of a reliable person with a valid driver's license parallels the reliability and validity concerns applied to assessment scales. Knowledge about the rudimentary principles of research sampling is widespread outside of the research laboratory, particularly with the relatively new popularity of economic, political, and community polls as a staple of news reporting and of the political process in democratic governance. Core questions about the size, sources, and features of participants are applied to construct research populations, courtroom juries, and districts to serve as electoral universes for politicians.

The cultural contexts and popular notions about sampling and sample size have an impact on scientific judgments. It is important to acknowledge the presence and influence of generalized social sensibilities or awareness about sampling issues. Such notions may have less direct impact on research in fields with long-established and formalized criteria and procedures for determining sample size and composition. The generalized social notions may come to exert a greater influence as one moves across the spectrum of knowledge-building strategies to more qualitative and humanistic approaches. Even though such studies also have a long history of clearly articulated traditions of formal critiques (e.g., in philosophy and literary criticism), they have not been amenable to operationalization and quantification.

The authors suggested that some of the rancor between qualitative and quantitative approaches is rooted in deeper cultural tensions. Prototypic questions posed to qualitative research in interdisciplinary settings derive both from the application of frameworks drawn from other disciplines' approaches to sampling and from the reviewers themselves, as persons socialized into the community where the study is conceived and conducted. Such concerns may be irrelevant or even counterproductive.

Qualitative Clarity as an Analog to Statistical Power

The guiding logic of qualitative research, by design, generally prevents it from fulfilling the assumptions underlying statistical power analyses of research designs. The discovery-oriented goals, the use of meanings as units of analysis, and the interpretive methods of qualitative research dictate that the exact factors, dimensions, and distributions of the phenomena identified as important may not be specifiable prior to data analysis. These emerge from the analysis and are among the major contributions of qualitative study. No standardized scales or tests yet exist to identify and describe new arenas of cultural, social, or personal meaning. Meaning does not conform to normative distributions by known factors, and no probability models exist that would enable prediction of the distributions of meanings needed to perform statistical power analyses.

Qualitative studies, however, can and should be judged in terms of how well they meet the explicit goals and purposes relevant to such research.

The authors have suggested that the concept of qualitative clarity be developed to guide evaluations of sampling, as an analog to the concept of statistical power. Qualitative clarity refers to principles that are relevant to the concerns of this type of research. That is, the adequacy of the strength and flexibility of the analytic tools used to develop knowledge during discovery and interpretation can be evaluated even if the factors to be measured cannot be specified in advance. The term clarity conveys the aim of making explicit, for open discussion, the details of how the sample was assembled, the theoretical assumptions, and the pragmatic constraints that influenced the sampling process. Qualitative clarity should include at least two components: theoretical grounding and sensitivity to context. These are briefly described next.

Rich and diverse theoretical grounding

In the absence of standardized measures for assessing meaning, the analogous qualitative research tools are theory and discovery processes. Strong, well-developed theoretical preparation is necessary to provide multiple and alternative interpretations of the data. Traditionally, in qualitative study, it is the richness and sophistication of the analytic perspectives, or “lenses”, focused on the data that lend depth, credibility, and validity to the analyses. The relative degree of theoretical development in a research proposal or manuscript is readily apparent in the text, for example, in extended descriptions of different schools of thought and of multiple, contrasting interpretive explanations for the phenomena at hand. In brief, the authors argue that, given the stated goal of sampling for meaning, qualitative research can be evaluated to assess whether it has adequate conceptual perspectives to identify a variety of meanings and to critique multiple rich interpretations of those meanings.

Sampling within the data is another important design feature. The discovery of meaning should also include sampling within the data collected: the entire set of qualitative materials should be examined, rather than selectively read by identifying certain parts of the text that describe and confirm a finding without reading for sections that may support alternative or contradictory interpretations.

Sensitivity to contexts

As a second component of qualitative clarity, sensitivity to context refers to the contextual dimensions shaping the meanings studied; it also refers to the historical settings of the scientific concepts used to frame the research questions and methods. Researchers need to be continually attentive to examining the meanings and categories discovered for elements imported from their own cultural and personal backgrounds. The first of these contexts is familiar to gerontologists: patterns constructed by the individual's life history; generation; cohort; psychological, developmental, and social structure; and health. Another, more implicit, contextual aspect to examine as part of the qualitative clarity analysis is evidence of a critical view of the methods and theories introduced by the investigators. Because discovery of the insiders' perspective on cultural and personal meanings is a goal of qualitative study, it is important to guard against biases arising from the intrusion of the researcher's own scientific categories. Qualitative research requires a critical stance toward both the kinds of information and meanings discovered and the analytic categories guiding the interpretations. One example is recent work illustrating how traditional gerontological constructs for data collection and analysis do not correspond to the ways individuals themselves interpret their own activities and conditions, or label their identities (e.g., “caregiver,” Abel 1991 ; “disabled,” Murphy 1987 ; “old and alone,” Rubinstein 1986 ; “Alzheimer's disease,” Gubrium 1992 ; “life themes,” Luborsky 1993b ). A second example is the growing awareness of the extent to which past research tended to define problems of disability or depression narrowly, in terms of the individual's ability or failure to adjust, without giving adequate attention to the societal-level sources of the individual's distress ( Cohen and Sokolovsky 1989 ). Thus researchers need to demonstrate an awareness of how the particular questions guiding qualitative research, and the methods and styles of analysis, are influenced by the cultural and historical settings of the research ( Luborsky and Sankar 1993 ), in order to keep clear whose meanings are being reported.

To conclude, our outline for the concept of qualitative clarity, which is intended to serve as the qualitatively appropriate analog to statistical power, is offered to gerontologists as a summary of the main points that need to be considered when evaluating samples for qualitative research. The descriptions of qualitative sampling in this article are meant to extend the discussion and to encourage the continued development of more explicit methods for qualitative research.

Acknowledgments

Support for the first author by the National Institute of Child Health and Human Development (#RO1 HD31526) and the National Institute on Aging (#RO1 AG09065) is gratefully acknowledged. Ongoing support for the second author from the National Institute of Aging is also gratefully acknowledged.

Biographies

Mark R. Luborsky, Ph.D., is a senior research anthropologist and assistant director of research at the Philadelphia Geriatric Center. Federal and foundation grants support his studies of sociocultural values and personal meanings in early and late adulthood, and how these relate to mental and physical health, and to disability and rehabilitation processes. He also consults and teaches on these topics.

Robert L. Rubinstein, Ph.D., is a senior research anthropologist and director of research at the Philadelphia Geriatric Center. He has conducted research in the United States and Vanuatu, South Pacific Islands. His gerontological research interests include social relations of the elderly, childlessness in later life, and the home environments of old people.

  • Abel Emily. Who Cares for the Elderly? Temple University Press; Philadelphia: 1991.
  • Alexander Baine, Rubinstein Robert, Goodman Marcene, Luborsky Mark. A Path Not Taken: A Cultural Analysis of Regrets and Childlessness in the Lives of Older Women. The Gerontologist. 1992;32(5):618–26.
  • American Psychiatric Association (APA). Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R). APA; Washington, DC: 1987.
  • Antonovsky Aaron. Unraveling the Mystery of Health. Jossey-Bass; San Francisco: 1987.
  • Atchley Robert. A Continuity Theory of Aging. The Gerontologist. 1989;29(2):183–90.
  • Becker Gaylene. Continuity After a Stroke: Implications of Life-Course Disruptions in Old Age. The Gerontologist. 1993;33(2):148–58.
  • Bott Elizabeth. Family and Social Network. Tavistock; London: 1971.
  • Cohen Carl, Sokolovsky Jay. Old Men of the Bowery. Guilford; New York: 1989.
  • Cohler Bertram. The Life Story and the Study of Resilience and Response to Adversity. Journal of Narrative and Life History. 1991;1(2–3):169–200.
  • Crapanzano Vincent. The Self, the Third, and Desire. In: Lee B, editor. Psychosocial Theories of the Self. Plenum; New York: 1982.
  • D'Andrade Roy. Cultural Meaning Systems. In: Shweder R, LeVine R, editors. Culture Theory: Essays on Mind, Self, and Emotion. Cambridge University Press; New York: 1984.
  • Denzin Norman, Lincoln Yvonna. Handbook of Qualitative Research. Sage; Thousand Oaks, CA: 1994.
  • DePoy Elizabeth, Gitlin Laura. Introduction to Research: Multiple Strategies for Health and Human Services. Mosby; St. Louis, MO: 1993.
  • Edgerton Robert. The Cloak of Competence. University of California Press; Berkeley: 1967.
  • Estroff Sue. Making It Crazy: An Ethnography of Psychiatric Patients in an American Community. University of California Press; Berkeley: 1982.
  • Evans-Pritchard Edward E. The Nuer: A Description of the Modes of Livelihood and Political Institutions of a Nilotic People. Clarendon Press; Oxford: 1940.
  • Fortes Meyer. Kinship and the Social Order. Aldine; Chicago: 1969.
  • Geertz Clifford. The Interpretation of Cultures. Basic Books; New York: 1973.
  • Good Byron, Good Mary-Jo DelVecchio. Toward a Meaning-Centered Analysis of Popular Illness Categories. In: Marsella A, White G, editors. Cultural Conceptions of Mental Health and Therapy. Reidel; Dordrecht, Holland: 1982.
  • Goodenough Ward. Residence Rules. Southwestern Journal of Anthropology. 1956;12(1):22–37.
  • Gross Daniel. Time Allocation: A Tool for the Study of Cultural Behavior. Annual Review of Anthropology. 1984;13:519–58.
  • Gubrium Jaber. The Mosaic of Care. Springer; New York: 1992.
  • Gubrium Jaber, Sankar Andrea. Qualitative Methods in Aging Research. Sage; Thousand Oaks, CA: 1994.
  • Johnson Jeffrey. Selecting Ethnographic Informants. Sage; Thousand Oaks, CA: 1990.
  • Kaufert Joseph, Locker David. Rehabilitation Ideology and Respiratory Support Technology. Social Science and Medicine. 1990;30(8):867–77.
  • Kaufman Sharon. The Ageless Self: Sources of Meaning in Late Life. University of Wisconsin Press; Madison: 1987.
  • Kaufman Sharon. Long-Term Impact of Injury on Individuals, Families, and Society: Personal Narratives and Policy Implications. In: Rice D, MacKenzie E, and Associates, editors. Cost of Injury in the United States: A Report to Congress. Institute for Health and Aging, University of California, San Francisco; Injury Prevention Center, Johns Hopkins University: 1989.
  • Lieberson Stanley. Small N's and Big Conclusions. In: Ragin C, Becker H, editors. What Is a Case? Cambridge University Press; Cambridge, England: 1992.
  • Luborsky Mark. Sociocultural Factors Shaping Technology Usage: Fulfilling the Promise. Technology and Disability. 1993a;2(1):71–8.
  • Luborsky Mark. The Romance With Personal Meaning in Gerontology: Cultural Aspects of Life Themes. The Gerontologist. 1993b;33(4):445–52.
  • Luborsky Mark. The Identification and Analysis of Themes and Patterns. In: Gubrium J, Sankar A, editors. Qualitative Methods in Aging Research. Sage; Thousand Oaks, CA: 1994.
  • Luborsky Mark, Rubinstein Robert. Ethnicity and Lifetimes: Self Concepts and Situational Contexts of Ethnic Identity in Late Life. In: Gelfand D, Barresi C, editors. Ethnic Dimensions of Aging. Springer; New York: 1987.
  • Luborsky Mark, Rubinstein Robert. Ethnic Identity and Bereavement in Later Life: The Case of Older Widowers. In: Sokolovsky J, editor. The Cultural Context of Aging: Worldwide Perspectives. Bergin and Garvey; New York: 1990.
  • Luborsky Mark, Sankar Andrea. Extending the Critical Gerontology Perspective: Cultural Dimensions. The Gerontologist. 1993;33(4):440–4.
  • Mead Margaret. National Character. In: Kroeber A, editor. Anthropology Today. University of Chicago Press; Chicago: 1953.
  • Miles Matthew, Huberman A. Michael. Qualitative Data Analysis. Sage; Thousand Oaks, CA: 1994.
  • Mishler Elliot. Research Interviewing. Harvard University Press; Cambridge, MA: 1986.
  • Morse Janice. Designing Funded Qualitative Research. In: Denzin N, Lincoln Y, editors. Handbook of Qualitative Research. Sage; Thousand Oaks, CA: 1994.
  • Murphy Robert. The Body Silent. Columbia University Press; New York: 1987.
  • Osgood Charles, Suci George, Tannenbaum Percy. The Measurement of Meaning. University of Illinois Press; Urbana: 1957.
  • Patton Michael. Qualitative Evaluation and Research Methods. Sage; Thousand Oaks, CA: 1990.
  • Pelto Pertti, Pelto Gretel. Anthropological Research: The Structure of Inquiry. 2nd ed. Cambridge University Press; Cambridge, England: 1978.
  • Platt Jennifer. Cases of Cases. In: Ragin C, Becker H, editors. What Is a Case? Cambridge University Press; Cambridge, England: 1992.
  • Ragin Charles, Becker Howard. What Is a Case? Exploring the Foundations of Social Inquiry. Cambridge University Press; Cambridge, England: 1992.
  • Rubinstein Robert. Singular Paths: Old Men Living Alone. Columbia University Press; New York: 1986.
  • Rubinstein Robert. The Environmental Representation of Personal Themes by Older People. Journal of Aging Studies. 1990;4(2):131–8.
  • Rubinstein Robert. Proposal Writing. In: Gubrium J, Sankar A, editors. Qualitative Methods in Aging Research. Sage; Thousand Oaks, CA: 1994.
  • Scheer Jessica, Luborsky Mark. The Cultural Context of Polio Biographies. Orthopedics. 1991;14(11):1173–81.
  • Spradley James. The Ethnographic Interview. Holt, Rinehart & Winston; New York: 1979.
  • Spradley James. Participant Observation. Holt, Rinehart & Winston; New York: 1980.
  • Stouffer Samuel. The American Soldier. Vols. 1 & 2. Wiley; New York: 1949 (reissued 1965).
  • Strauss Anselm, Corbin Juliet. Basics of Qualitative Research: Grounded Theory Procedures and Techniques. Sage; Thousand Oaks, CA: 1990.
  • Trotter Robert. Ethnographic Research Methods for Applied Medical Anthropology. In: Hill C, editor. Training Manual in Applied Medical Anthropology. American Anthropological Association; Washington, DC: 1991.
  • Werner Oswald, Schoepfle George. Systematic Fieldwork. Vols. 1 & 2. Sage; Thousand Oaks, CA: 1987.
  • Williams Gareth. The Genesis of Chronic Illness: Narrative Reconstruction. Sociology of Health and Illness. 1984;6(2):175–200.
  • Zola Irving. Missing Pieces: A Chronicle of Living With a Disability. Temple University Press; Philadelphia: 1982.
Research article | Open access | Published: 21 November 2018

Characterising and justifying sample size sufficiency in interview-based studies: systematic analysis of qualitative health research over a 15-year period

Konstantina Vasileiou (ORCID: orcid.org/0000-0001-5047-3920), Julie Barnett, Susan Thorpe & Terry Young

BMC Medical Research Methodology, Vol. 18, Article 148 (2018)

740k Accesses · 1189 Citations · 172 Altmetric

Abstract

Background

Choosing a suitable sample size in qualitative research is an area of conceptual debate and practical uncertainty. That sample size principles, guidelines and tools have been developed to enable researchers to set, and justify the acceptability of, their sample size is an indication that the issue constitutes an important marker of the quality of qualitative research. Nevertheless, research shows that sample size sufficiency reporting is often poor, if not absent, across a range of disciplinary fields.

Methods

A systematic analysis of single-interview-per-participant designs within three health-related journals from the disciplines of psychology, sociology and medicine, over a 15-year period, was conducted to examine whether and how sample sizes were justified and how sample size was characterised and discussed by authors. Data pertinent to sample size were extracted and analysed using qualitative and quantitative analytic techniques.

Results

Our findings demonstrate that provision of sample size justifications in qualitative health research is limited; is not contingent on the number of interviews; and relates to the journal of publication. Sample size was most frequently defended, across all three journals, with reference to the principle of saturation and to pragmatic considerations. Qualitative sample sizes were predominantly – and often without justification – characterised as insufficient (i.e., ‘small’) and discussed in the context of study limitations. Sample size insufficiency was seen to threaten the validity and generalizability of studies’ results, with the latter being frequently conceived in nomothetic terms.

Conclusions

We recommend, firstly, that qualitative health researchers be more transparent about evaluations of their sample size sufficiency, situating these within broader and more encompassing assessments of data adequacy. Secondly, we invite researchers to consider critically how saturation parameters found in prior methodological studies, and sample size community norms, might best inform and apply to their own project, and we encourage appraisal of data adequacy with reference to features intrinsic to the study at hand. Finally, those reviewing papers have a vital role in supporting and encouraging transparent, study-specific reporting.


Background

Sample adequacy in qualitative inquiry pertains to the appropriateness of the sample composition and size. It is an important consideration in evaluations of the quality and trustworthiness of much qualitative research [ 1 ] and is implicated – particularly for research that is situated within a post-positivist tradition and retains a degree of commitment to realist ontological premises – in appraisals of validity and generalizability [ 2 , 3 , 4 , 5 ].

Samples in qualitative research tend to be small in order to support the depth of case-oriented analysis that is fundamental to this mode of inquiry [ 5 ]. Additionally, qualitative samples are purposive, that is, selected by virtue of their capacity to provide richly-textured information, relevant to the phenomenon under investigation. As a result, purposive sampling [ 6 , 7 ] – as opposed to probability sampling employed in quantitative research – selects ‘information-rich’ cases [ 8 ]. Indeed, recent research demonstrates the greater efficiency of purposive sampling compared to random sampling in qualitative studies [ 9 ], supporting related assertions long put forward by qualitative methodologists.

Sample size in qualitative research has been the subject of enduring discussion [ 4 , 10 , 11 ]. Whilst the quantitative research community has established relatively straightforward, statistics-based rules for setting sample sizes precisely, the intricacies of qualitative sample size determination and assessment arise from the methodological, theoretical, epistemological, and ideological pluralism that characterises qualitative inquiry (for a discussion focused on the discipline of psychology see [ 12 ]). This mitigates against clear-cut, invariably applied guidelines. Despite these challenges, various conceptual developments have sought to address the issue with guidance and principles [ 4 , 10 , 11 , 13 , 14 , 15 , 16 , 17 , 18 , 19 , 20 ] and, more recently, with an evidence-based approach that seeks to ground the discussion of sample size determination empirically [ 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 ].

Focusing on single-interview-per-participant qualitative designs, the present study aims to further contribute to the dialogue of sample size in qualitative research by offering empirical evidence around justification practices associated with sample size. We next review the existing conceptual and empirical literature on sample size determination.

Sample size in qualitative research: Conceptual developments and empirical investigations

Qualitative research experts argue that there is no straightforward answer to the question of ‘how many’ and that sample size is contingent on a number of factors relating to epistemological, methodological and practical issues [ 36 ]. Sandelowski [ 4 ] recommends that qualitative sample sizes are large enough to allow the unfolding of a ‘new and richly textured understanding’ of the phenomenon under study, but small enough so that the ‘deep, case-oriented analysis’ (p. 183) of qualitative data is not precluded. Morse [ 11 ] posits that the more useable data are collected from each person, the fewer participants are needed. She invites researchers to take into account parameters such as the scope of the study, the nature of the topic (i.e. complexity, accessibility), the quality of the data, and the study design. Indeed, the level of structure of questions in qualitative interviewing has been found to influence the richness of the data generated [ 37 ] and so requires attention; empirical research shows that open questions asked later in the interview tend to produce richer data [ 37 ].

Beyond such guidance, specific numerical recommendations have also been proffered, often based on experts’ experience of qualitative research. For example, Green and Thorogood [ 38 ] maintain that the experience of most qualitative researchers conducting an interview-based study with a fairly specific research question is that little new information is generated after interviewing 20 people or so belonging to one analytically relevant participant ‘category’ (pp. 102–104). Ritchie et al. [ 39 ] suggest that studies employing individual interviews conduct no more than 50 interviews so that researchers are able to manage the complexity of the analytic task. Similarly, Britten [ 40 ] notes that large interview studies will often comprise of 50 to 60 people. Experts have also offered numerical guidelines tailored to different theoretical and methodological traditions and specific research approaches, e.g. grounded theory, phenomenology [ 11 , 41 ]. More recently, a quantitative tool was proposed [ 42 ] to support a priori sample size determination based on estimates of the prevalence of themes in the population. Nevertheless, this more formulaic approach raised criticisms relating to assumptions about the conceptual [ 43 ] and ontological status of ‘themes’ [ 44 ] and the linearity ascribed to the processes of sampling, data collection and data analysis [ 45 ].
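To make the logic of such a priori tools concrete, the sketch below works through the simplest binomial version of the calculation; the function name is ours, and the independence and fixed-prevalence assumptions baked into it are precisely those the critics cited above contest.

```python
import math

def interviews_needed(prevalence: float, power: float = 0.80) -> int:
    """Smallest n such that a theme expressed by a given proportion of the
    population appears at least once among n interviewees, with the desired
    probability. Assumes independent interviewees and a fixed theme
    prevalence -- exactly the assumptions questioned by critics."""
    # Solve 1 - (1 - p)^n >= power for n.
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - prevalence))

# A theme held by 1 in 4 people: ~6 interviews give an 80% chance of seeing it once.
print(interviews_needed(0.25))        # -> 6
print(interviews_needed(0.10, 0.95))  # rarer theme, higher certainty -> 29
```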

In terms of principles, Lincoln and Guba [ 17 ] proposed that sample size determination be guided by the criterion of informational redundancy, that is, sampling can be terminated when no new information is elicited by sampling more units. Following the logic of informational comprehensiveness, Malterud et al. [ 18 ] introduced the concept of information power as a pragmatic guiding principle, suggesting that the more information power the sample provides, the smaller the sample size needs to be, and vice versa.

Undoubtedly, the most widely used principle for determining sample size and evaluating its sufficiency is that of saturation. The notion of saturation originates in grounded theory [ 15 ] – a qualitative methodological approach explicitly concerned with empirically-derived theory development – and is inextricably linked to theoretical sampling. Theoretical sampling describes an iterative process of data collection, data analysis and theory development whereby data collection is governed by emerging theory rather than predefined characteristics of the population. Grounded theory saturation (often called theoretical saturation) concerns the theoretical categories – as opposed to data – that are being developed and becomes evident when ‘gathering fresh data no longer sparks new theoretical insights, nor reveals new properties of your core theoretical categories’ [ 46 , p. 113]. Saturation in grounded theory, therefore, does not equate to the more common focus on data repetition and moves beyond a singular focus on sample size as the justification of sampling adequacy [ 46 , 47 ]. Sample size in grounded theory cannot be determined a priori as it is contingent on the evolving theoretical categories.

Saturation – often under the terms of ‘data’ or ‘thematic’ saturation – has diffused into several qualitative communities beyond its origins in grounded theory. Alongside the expansion of its meaning, being variously equated with ‘no new data’, ‘no new themes’, and ‘no new codes’, saturation has emerged as the ‘gold standard’ in qualitative inquiry [ 2 , 26 ]. Nevertheless, and as Morse [ 48 ] asserts, whilst saturation is the most frequently invoked ‘guarantee of qualitative rigor’, ‘it is the one we know least about’ (p. 587). Certainly researchers caution that saturation is less applicable to, or appropriate for, particular types of qualitative research (e.g. conversation analysis, [ 49 ]; phenomenological research, [ 50 ]) whilst others reject the concept altogether [ 19 , 51 ].

Methodological studies in this area aim to provide guidance about saturation and to develop practical applications of processes that ‘operationalise’ and evidence saturation. Guest, Bunce, and Johnson [ 26 ] analysed 60 interviews and found that saturation of themes was reached by the twelfth interview. They noted that their sample was relatively homogeneous and their research aims focused, so studies with more heterogeneous samples and a broader scope would likely need a larger sample to achieve saturation. Extending the enquiry to multi-site, cross-cultural research, Hagaman and Wutich [ 28 ] showed that sample sizes of 20 to 40 interviews were required to achieve data saturation of meta-themes that cut across research sites.

In a theory-driven content analysis, Francis et al. [ 25 ] reached data saturation at the 17th interview for all their pre-determined theoretical constructs. The authors further proposed two main principles upon which specification of saturation be based: (a) researchers should a priori specify an initial analysis sample (e.g. 10 interviews) to be used for the first round of analysis, and (b) a stopping criterion, that is, a number of further interviews (e.g. 3) whose analysis yields no new themes or ideas. For greater transparency, Francis et al. [ 25 ] recommend that researchers present cumulative frequency graphs supporting their judgment that saturation was achieved. A comparative method for themes saturation (CoMeTS) has also been suggested [ 23 ], whereby the findings of each new interview are compared with those that have already emerged; if no new theme arises, the ‘saturated terrain’ is assumed to have been established. Because the order in which interviews are analysed can influence saturation thresholds, depending on the richness of the data, Constantinou et al. [ 23 ] recommend reordering and re-analysing interviews to confirm saturation.

Hennink, Kaiser and Marconi’s [ 29 ] methodological study sheds further light on the problem of specifying and demonstrating saturation. Their analysis of interview data showed that code saturation (i.e. the point at which no additional issues are identified) was achieved at 9 interviews, but meaning saturation (i.e. the point at which no further dimensions, nuances, or insights of issues are identified) required 16–24 interviews. Although breadth can be achieved relatively soon, especially for high-prevalence and concrete codes, depth requires additional data, especially for codes of a more conceptual nature.
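As an illustration of how Francis et al.’s two principles might be operationalised, the following sketch implements an initial analysis sample plus a stopping criterion over an ordered list of per-interview theme sets; this is our hypothetical rendering, not the authors’ own tooling.

```python
def saturation_point(themes_per_interview, initial_sample=10, stopping_criterion=3):
    """Hypothetical sketch of Francis et al.'s rule: analyse an initial sample
    first, then stop once `stopping_criterion` consecutive further interviews
    contribute no new themes. `themes_per_interview` is a list of sets of
    theme labels, in the order the interviews were analysed."""
    seen, run_without_new = set(), 0
    for i, themes in enumerate(themes_per_interview, start=1):
        new_themes = themes - seen
        seen |= themes
        if i <= initial_sample:
            continue                  # first round of analysis: no stopping yet
        run_without_new = 0 if new_themes else run_without_new + 1
        if run_without_new >= stopping_criterion:
            return i                  # interview at which saturation is declared
    return None                       # criterion never met with these data

# Toy run: no new themes appear after interview 10, so the rule fires at 13.
interviews = [{"a"}, {"a", "b"}, {"c"}, {"b"}, {"d"}, {"a"}, {"e"}, {"c"},
              {"f"}, {"g"}, {"a"}, {"b", "c"}, {"d"}]
print(saturation_point(interviews))   # -> 13
```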

Critiquing the concept of saturation, Nelson [ 19 ] proposes five conceptual depth criteria in grounded theory projects to assess the robustness of the developing theory: (a) theoretical concepts should be supported by a wide range of evidence drawn from the data; (b) be demonstrably part of a network of inter-connected concepts; (c) demonstrate subtlety; (d) resonate with existing literature; and (e) can be successfully submitted to tests of external validity.

Other work has sought to examine practices of sample size reporting and sufficiency assessment across a range of disciplinary fields and research domains, from nutrition [ 34 ] and health education [ 32 ], to education and the health sciences [ 22 , 27 ], information systems [ 30 ], organisation and workplace studies [ 33 ], human computer interaction [ 21 ], and accounting studies [ 24 ]. Others investigated PhD qualitative studies [ 31 ] and grounded theory studies [ 35 ]. Incomplete and imprecise sample size reporting is commonly pinpointed by these investigations whilst assessment and justifications of sample size sufficiency are even more sporadic.

Sobal [ 34 ] examined the sample size of qualitative studies published in the Journal of Nutrition Education over a period of 30 years. Studies that employed individual interviews ( n  = 30) had an average sample size of 45 individuals and none of these explicitly reported whether their sample size sought and/or attained saturation. A minority of articles discussed how sample-related limitations (with the latter most often concerning the type of sample, rather than the size) limited generalizability. A further systematic analysis [ 32 ] of health education research over 20 years demonstrated that interview-based studies averaged 104 participants (range 2 to 720 interviewees). However, 40% did not report the number of participants. An examination of 83 qualitative interview studies in leading information systems journals [ 30 ] indicated little defence of sample sizes on the basis of recommendations by qualitative methodologists, prior relevant work, or the criterion of saturation. Rather, sample size seemed to correlate with factors such as the journal of publication or the region of study (US vs Europe vs Asia). These results led the authors to call for more rigor in determining and reporting sample size in qualitative information systems research and to recommend optimal sample size ranges for grounded theory (i.e. 20–30 interviews) and single case (i.e. 15–30 interviews) projects.

Similarly, fewer than 10% of articles in organisation and workplace studies provided a sample size justification relating to existing recommendations by methodologists, prior relevant work, or saturation [ 33 ], whilst only 17% of focus group studies in health-related journals provided an explanation of sample size (i.e. number of focus groups), with saturation being the most frequently invoked argument, followed by published sample size recommendations and practical reasons [ 22 ]. The notion of saturation was also invoked by 11 of the 51 most highly cited studies that Guetterman [ 27 ] reviewed in the fields of education and health sciences, of which six were grounded theory studies, four phenomenological, and one a narrative inquiry. Finally, analysing 641 interview-based articles in accounting, Dai et al. [ 24 ] called for more rigor, since a significant minority of studies did not report a precise sample size.

Despite increasing attention to rigor in qualitative research (e.g. [ 52 ]) and more extensive methodological and analytical disclosures that seek to validate qualitative work [ 24 ], sample size reporting and sufficiency assessment remain inconsistent and partial, if not absent, across a range of research domains.

Objectives of the present study

The present study sought to enrich existing systematic analyses of the customs and practices of sample size reporting and justification by focusing on qualitative research relating to health. Additionally, it attempted to expand previous empirical investigations by examining how qualitative sample sizes are characterised and discussed in academic narratives. Qualitative health research is an inter-disciplinary field that, owing to its affiliation with the medical sciences, often faces views and positions reflective of a quantitative ethos. Qualitative health research thus constitutes an emblematic case that may help to reveal underlying philosophical and methodological differences across the scientific community that are crystallised in considerations of sample size. The present research therefore incorporates a comparative element on the basis of three different disciplines engaging with qualitative health research: medicine, psychology, and sociology. We chose to focus our analysis on single-interview-per-participant designs, not only because this is a popular and widespread methodological choice in qualitative health research, but also because it is the method for which consideration of sample size – defined as the number of interviewees – is particularly salient.

Methods

Study design

A structured search for articles reporting cross-sectional, interview-based qualitative studies was carried out and eligible reports were systematically reviewed and analysed employing both quantitative and qualitative analytic techniques.

We selected journals which (a) follow a peer review process, (b) are considered high quality and influential in their field as reflected in journal metrics, and (c) are receptive to, and publish, qualitative research (Additional File  1 presents the journals’ editorial positions in relation to qualitative research and sample considerations where available). Three health-related journals were chosen, each representing a different disciplinary field; the British Medical Journal (BMJ) representing medicine, the British Journal of Health Psychology (BJHP) representing psychology, and the Sociology of Health & Illness (SHI) representing sociology.

Search strategy to identify studies

Employing the search function of each individual journal, we used the terms ‘interview*’ AND ‘qualitative’ and limited the results to articles published between 1 January 2003 and 22 September 2017 (i.e. a 15-year review period).

Eligibility criteria

To be eligible for inclusion in the review, the article had to report a cross-sectional study design. Longitudinal studies were thus excluded whilst studies conducted within a broader research programme (e.g. interview studies nested in a trial, as part of a broader ethnography, as part of a longitudinal research) were included if they reported only single-time qualitative interviews. The method of data collection had to be individual, synchronous qualitative interviews (i.e. group interviews, structured interviews and e-mail interviews over a period of time were excluded), and the data had to be analysed qualitatively (i.e. studies that quantified their qualitative data were excluded). Mixed method studies and articles reporting more than one qualitative method of data collection (e.g. individual interviews and focus groups) were excluded. Figure  1 , a PRISMA flow diagram [ 53 ], shows the number of: articles obtained from the searches and screened; papers assessed for eligibility; and articles included in the review (Additional File  2 provides the full list of articles included in the review and their unique identifying code – e.g. BMJ01, BJHP02, SHI03). One review author (KV) assessed the eligibility of all papers identified from the searches. When in doubt, discussions about retaining or excluding articles were held between KV and JB in regular meetings, and decisions were jointly made.

Figure 1. PRISMA flow diagram.
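The eligibility rules above can also be restated as a compact decision rule. The sketch below does so with illustrative field names that are not part of the review’s actual screening form.

```python
from dataclasses import dataclass

@dataclass
class Study:
    """Illustrative screening fields; not the review's actual coding form."""
    design: str            # e.g. "cross-sectional", "longitudinal"
    data_collection: str   # e.g. "individual synchronous interview", "focus group"
    analysis: str          # e.g. "qualitative", "quantified"
    mixed_methods: bool
    n_qualitative_methods: int

def eligible(s: Study) -> bool:
    """Inclusion rules as described in the Eligibility criteria section."""
    return (s.design == "cross-sectional"
            and s.data_collection == "individual synchronous interview"
            and s.analysis == "qualitative"
            and not s.mixed_methods
            and s.n_qualitative_methods == 1)

print(eligible(Study("cross-sectional", "individual synchronous interview",
                     "qualitative", False, 1)))   # -> True
```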

Data extraction and analysis

A data extraction form was developed (see Additional File  3 ) recording three areas of information: (a) information about the article (e.g. authors, title, journal, year of publication); (b) information about the aims of the study, the sample size and any justification for this, the participant characteristics, the sampling technique and any sample-related observations or comments made by the authors; and (c) information about the method or technique(s) of data analysis, the number of researchers involved in the analysis, the potential use of software, and any discussion around epistemological considerations. The Abstract, Methods and Discussion (and/or Conclusion) sections of each article were examined by one author (KV), who extracted all the relevant information. This was directly copied from the articles and, when appropriate, comments, notes and initial thoughts were written down.

To examine the kinds of sample size justifications provided by articles, an inductive content analysis [ 54 ] was initially conducted. On the basis of this analysis, the categories that expressed qualitatively different sample size justifications were developed.

We also extracted or coded quantitative data regarding the following aspects:

  • Journal and year of publication
  • Number of interviews
  • Number of participants
  • Presence of sample size justification(s) (Yes/No)
  • Presence of a particular sample size justification category (Yes/No)
  • Number of sample size justifications provided

Descriptive and inferential statistical analyses were used to explore these data.
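A hypothetical rendering of one extraction record, and a simple tabulation over a set of such records, might look as follows; the field names are illustrative and not taken from the authors’ extraction form.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtractionRecord:
    """One coded article; field names are illustrative, not the authors' form."""
    article_id: str                      # e.g. "BJHP02"
    journal: str                         # "BMJ", "BJHP", or "SHI"
    year: int
    n_interviews: int
    n_participants: int
    justified: bool                      # any sample size justification present?
    justification_types: List[str] = field(default_factory=list)

def justification_rate(records: List[ExtractionRecord], journal: str) -> float:
    """Proportion of a journal's articles providing any sample size justification."""
    subset = [r for r in records if r.journal == journal]
    return sum(r.justified for r in subset) / len(subset) if subset else float("nan")
```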

A thematic analysis [ 55 ] was then performed on all scientific narratives that discussed or commented on the sample size of the study. These narratives were evident both in papers that justified their sample size and those that did not. To identify these narratives, in addition to the methods sections, the discussion sections of the reviewed articles were also examined and relevant data were extracted and analysed.

Results

In total, 214 articles – 21 in the BMJ, 53 in the BJHP and 140 in the SHI – were eligible for inclusion in the review. Table  1 provides basic information about the sample sizes – measured in number of interviews – of the studies reviewed across the three journals. Figure  2 depicts the number of eligible articles published each year per journal.

Figure 2. Number of eligible articles published each year per journal. The publication of qualitative studies in the BMJ was significantly reduced from 2012 onwards; this appears to coincide with the launch of BMJ Open, to which qualitative studies were possibly directed.

Pairwise comparisons following a significant Kruskal-Wallis test indicated that the studies published in the BJHP had significantly ( p  < .001) smaller sample sizes than those published either in the BMJ or the SHI. Sample sizes of BMJ and SHI articles did not differ significantly from each other.
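For readers unfamiliar with this procedure, the sketch below shows one standard way to run an omnibus Kruskal-Wallis test followed by pairwise Mann-Whitney comparisons; the per-journal interview counts are placeholder values, not the review’s data, and the authors’ exact follow-up procedure may have differed.

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Placeholder per-study interview counts -- illustrative only, not the review's data.
samples = {
    "BMJ":  [20, 25, 30, 35, 40],
    "BJHP": [8, 10, 12, 15, 18],
    "SHI":  [20, 28, 33, 45, 60],
}

h_stat, p_omnibus = kruskal(*samples.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_omnibus:.4f}")

if p_omnibus < 0.05:
    # Follow up with pairwise Mann-Whitney tests; a Bonferroni-corrected
    # threshold (0.05 / 3) guards against inflation over three comparisons.
    for a, b in combinations(samples, 2):
        u_stat, p_pair = mannwhitneyu(samples[a], samples[b])
        print(f"{a} vs {b}: U = {u_stat}, p = {p_pair:.4f}")
```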

Sample size justifications: Results from the quantitative and qualitative content analysis

Ten (47.6%) of the 21 BMJ studies, 26 (49.1%) of the 53 BJHP papers and 24 (17.1%) of the 140 SHI articles provided some sort of sample size justification. As shown in Table  2 , the majority of articles which justified their sample size provided one justification (70% of articles); fourteen studies (25%) provided two distinct justifications; one study (1.7%) gave three justifications and two studies (3.3%) expressed four distinct justifications.

There was no association between the number of interviews (i.e. sample size) conducted and the provision of a justification (rpb = .054, p  = .433). Within journals, Mann-Whitney tests indicated that sample sizes of ‘justifying’ and ‘non-justifying’ articles in the BMJ and SHI did not differ significantly from each other. In the BJHP, ‘justifying’ articles ( Mean rank  = 31.3) had significantly larger sample sizes than ‘non-justifying’ studies ( Mean rank  = 22.7; U = 237.000, p  < .05).

There was a significant association between the journal a paper was published in and the provision of a justification (χ 2 (2) = 23.83, p  < .001). BJHP studies provided a sample size justification significantly more often than would be expected ( z  = 2.9); SHI studies significantly less often ( z  = − 2.4). If an article was published in the BJHP, the odds of providing a justification were 4.8 times higher than if published in the SHI. Similarly if published in the BMJ, the odds of a study justifying its sample size were 4.5 times higher than in the SHI.
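The reported odds ratios can be approximately reconstructed from the counts given earlier (10 of 21 BMJ, 26 of 53 BJHP, and 24 of 140 SHI articles providing a justification); the short calculation below shows the arithmetic, with the small discrepancies from the published figures presumably reflecting the exact model the authors fitted.

```python
def odds(justified: int, total: int) -> float:
    """Odds of an article providing a justification: justified : not justified."""
    return justified / (total - justified)

counts = {"BMJ": (10, 21), "BJHP": (26, 53), "SHI": (24, 140)}  # from the text above

print(odds(*counts["BJHP"]) / odds(*counts["SHI"]))  # ~4.7 (reported: 4.8)
print(odds(*counts["BMJ"])  / odds(*counts["SHI"]))  # ~4.4 (reported: 4.5)
```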

The qualitative content analysis of the scientific narratives identified eleven different sample size justifications. These are described below and illustrated with excerpts from relevant articles. By way of a summary, the frequency with which these were deployed across the three journals is indicated in Table  3 .

Saturation

Saturation was the most commonly invoked principle (55.4% of all justifications), deployed by studies across all three journals to justify the sufficiency of their sample size. In the BMJ, two studies claimed that they achieved data saturation (BMJ17; BMJ18) and one article referred descriptively to achieving saturation without explicitly using the term (BMJ13). Interestingly, BMJ13 included data in the analysis beyond the point of saturation in search of ‘unusual/deviant observations’ and with a view to establishing findings consistency.

Thirty three women were approached to take part in the interview study. Twenty seven agreed and 21 (aged 21–64, median 40) were interviewed before data saturation was reached (one tape failure meant that 20 interviews were available for analysis). (BMJ17).

No new topics were identified following analysis of approximately two thirds of the interviews; however, all interviews were coded in order to develop a better understanding of how characteristic the views and reported behaviours were, and also to collect further examples of unusual/deviant observations. (BMJ13).

Two articles reported pre-determining their sample size with a view to achieving data saturation (BMJ08 – see extract in section In line with existing research ; BMJ15 – see extract in section Pragmatic considerations ) without further specifying whether this was achieved. One paper claimed theoretical saturation (BMJ06), conceived as the point at which there were “no further recurring themes emerging from the analysis”, whilst another study argued that although the analytic categories were highly saturated, it was not possible to determine whether theoretical saturation had been achieved (BMJ04). One article (BMJ18) cited a reference to support its position on saturation.

In the BJHP, six articles claimed that they achieved data saturation (BJHP21; BJHP32; BJHP39; BJHP48; BJHP49; BJHP52) and one article stated that, given their sample size and the guidelines for achieving data saturation, it anticipated that saturation would be attained (BJHP50).

Recruitment continued until data saturation was reached, defined as the point at which no new themes emerged. (BJHP48).

It has previously been recommended that qualitative studies require a minimum sample size of at least 12 to reach data saturation (Clarke & Braun, 2013; Fugard & Potts, 2014; Guest, Bunce, & Johnson, 2006). Therefore, a sample of 13 was deemed sufficient for the qualitative analysis and scale of this study. (BJHP50).

Two studies argued that they achieved thematic saturation (BJHP28 – see extract in section Sample size guidelines ; BJHP31) and one (BJHP30) article, explicitly concerned with theory development and deploying theoretical sampling, claimed both theoretical and data saturation.

The final sample size was determined by thematic saturation, the point at which new data appears to no longer contribute to the findings due to repetition of themes and comments by participants (Morse, 1995). At this point, data generation was terminated. (BJHP31).

Five studies argued that they achieved (BJHP05; BJHP33; BJHP40; BJHP13 – see extract in section Pragmatic considerations ) or anticipated (BJHP46) saturation, without any further specification of the term. BJHP17 referred descriptively to a state of achieved saturation without specifically using the term. Saturation of coding , but not saturation of themes, was claimed to have been reached by one article (BJHP18). Two articles explicitly stated that they did not achieve saturation, instead offering, as arguments for the sufficiency of their sample size, a level of theme completeness (BJHP27) or the replication of themes (BJHP53).

Furthermore, data collection ceased on pragmatic grounds rather than at the point when saturation point was reached. Despite this, although nuances within sub-themes were still emerging towards the end of data analysis, the themes themselves were being replicated indicating a level of completeness. (BJHP27).

Finally, one article criticised and explicitly renounced the notion of data saturation claiming that, on the contrary, the criterion of theoretical sufficiency determined its sample size (BJHP16).

According to the original Grounded Theory texts, data collection should continue until there are no new discoveries ( i.e. , ‘data saturation’; Glaser & Strauss, 1967). However, recent revisions of this process have discussed how it is rare that data collection is an exhaustive process and researchers should rely on how well their data are able to create a sufficient theoretical account or ‘theoretical sufficiency’ (Dey, 1999). For this study, it was decided that theoretical sufficiency would guide recruitment, rather than looking for data saturation. (BJHP16).

Ten out of the 20 BJHP articles that employed the argument of saturation used one or more citations relating to this principle.

In the SHI, one article (SHI01) claimed that it achieved category saturation based on authors’ judgment.

This number was not fixed in advance, but was guided by the sampling strategy and the judgement, based on the analysis of the data, of the point at which ‘category saturation’ was achieved. (SHI01).

Three articles described a state of achieved saturation without using the term or specifying what sort of saturation they had achieved (i.e. data, theoretical, thematic saturation) (SHI04; SHI13; SHI30) whilst another four articles explicitly stated that they achieved saturation (SHI100; SHI125; SHI136; SHI137). Two papers stated that they achieved data saturation (SHI73 – see extract in section Sample size guidelines ; SHI113), two claimed theoretical saturation (SHI78; SHI115) and two referred to achieving thematic saturation (SHI87; SHI139) or to saturated themes (SHI29; SHI50).

Recruitment and analysis ceased once theoretical saturation was reached in the categories described below (Lincoln and Guba 1985). (SHI115).

The respondents’ quotes drawn on below were chosen as representative, and illustrate saturated themes. (SHI50).

One article stated that thematic saturation was anticipated with its sample size (SHI94). Briefly referring to the difficulty in pinpointing achievement of theoretical saturation, SHI32 (see extract in section Richness and volume of data ) defended the sufficiency of its sample size on the basis of “the high degree of consensus [that] had begun to emerge among those interviewed”, suggesting that information from interviews was being replicated. Finally, SHI112 (see extract in section Further sampling to check findings consistency ) argued that it achieved saturation of discursive patterns. Seven of the 19 SHI articles cited references to support their position on saturation (see Additional File  4 for the full list of citations used by articles to support their position on saturation across the three journals).

Overall, it is clear that the concept of saturation encompassed a wide range of variants expressed in terms such as saturation, data saturation, thematic saturation, theoretical saturation, category saturation, saturation of coding, saturation of discursive themes, theme completeness. It is noteworthy, however, that although these various claims were sometimes supported with reference to the literature, they were not evidenced in relation to the study at hand.

Pragmatic considerations

The determination of sample size on the basis of pragmatic considerations was the second most frequently invoked argument (9.6% of all justifications) appearing in all three journals. In the BMJ, one article (BMJ15) appealed to pragmatic reasons, relating to time constraints and the difficulty to access certain study populations, to justify the determination of its sample size.

On the basis of the researchers’ previous experience and the literature, [30, 31] we estimated that recruitment of 15–20 patients at each site would achieve data saturation when data from each site were analysed separately. We set a target of seven to 10 caregivers per site because of time constraints and the anticipated difficulty of accessing caregivers at some home based care services. This gave a target sample of 75–100 patients and 35–50 caregivers overall. (BMJ15).

In the BJHP, four articles mentioned pragmatic considerations relating to time or financial constraints (BJHP27 – see extract in section Saturation ; BJHP53), the participant response rate (BJHP13), and the fixed (and thus limited) size of the participant pool from which interviewees were sampled (BJHP18).

We had aimed to continue interviewing until we had reached saturation, a point whereby further data collection would yield no further themes. In practice, the number of individuals volunteering to participate dictated when recruitment into the study ceased (15 young people, 15 parents). Nonetheless, by the last few interviews, significant repetition of concepts was occurring, suggesting ample sampling. (BJHP13).

Finally, three SHI articles explained their sample size with reference to practical aspects: time constraints and project manageability (SHI56), limited availability of respondents and project resources (SHI131), and time constraints (SHI113).

The size of the sample was largely determined by the availability of respondents and resources to complete the study. Its composition reflected, as far as practicable, our interest in how contextual factors (for example, gender relations and ethnicity) mediated the illness experience. (SHI131).

Qualities of the analysis

This sample size justification (8.4% of all justifications) was mainly employed by BJHP articles and referred to an intensive, idiographic and/or latently focused analysis, i.e. one that moved beyond description. More specifically, six articles defended their sample size on the basis of an intensive analysis of transcripts and/or the idiographic focus of the study or analysis. Four of these papers (BJHP02; BJHP19; BJHP24; BJHP47) adopted an Interpretative Phenomenological Analysis (IPA) approach.

The current study employed a sample of 10 in keeping with the aim of exploring each participant’s account (Smith et al. , 1999). (BJHP19).

BJHP47 explicitly renounced the notion of saturation within an IPA approach. The other two BJHP articles conducted thematic analysis (BJHP34; BJHP38). The level of analysis – i.e. latent, as opposed to a more superficial descriptive analysis – was also invoked as a justification by BJHP38, alongside the argument of an intensive analysis of individual transcripts.

The resulting sample size was at the lower end of the range of sample sizes employed in thematic analysis (Braun & Clarke, 2013). This was in order to enable significant reflection, dialogue, and time on each transcript and was in line with the more latent level of analysis employed, to identify underlying ideas, rather than a more superficial descriptive analysis (Braun & Clarke, 2006). (BJHP38).

Finally, one BMJ paper (BMJ21) defended its sample size with reference to the complexity of the analytic task.

We stopped recruitment when we reached 30–35 interviews, owing to the depth and duration of interviews, richness of data, and complexity of the analytical task. (BMJ21).

Meet sampling requirements

Meeting sampling requirements (7.2% of all justifications) was another argument employed by two BMJ and four SHI articles to explain their sample size. Achieving maximum variation sampling in terms of specific interviewee characteristics determined and explained the sample size of two BMJ studies (BMJ02; BMJ16 – see extract in section Meet research design requirements ).

Recruitment continued until sampling frame requirements were met for diversity in age, sex, ethnicity, frequency of attendance, and health status. (BMJ02).

Regarding the SHI articles, two papers explained their numbers on the basis of their sampling strategy (SHI01 – see extract in section Saturation ; SHI23), whilst sampling requirements that would help attain sample heterogeneity in terms of a particular characteristic of interest were cited by one paper (SHI127).

The combination of matching the recruitment sites for the quantitative research and the additional purposive criteria led to 104 phase 2 interviews (Internet (OLC): 21; Internet (FTF): 20; Gyms (FTF): 23; HIV testing (FTF): 20; HIV treatment (FTF): 20). (SHI23).

Of the fifty interviews conducted, thirty were translated from Spanish into English. These thirty, from which we draw our findings, were chosen for translation based on heterogeneity in depressive symptomology and educational attainment. (SHI127).

Finally, the pre-determination of sample size on the basis of sampling requirements was stated by one article, though this was not used to justify the number of interviews (SHI10).

Sample size guidelines

Five BJHP articles (BJHP28; BJHP38 – see extract in section Qualities of the analysis ; BJHP46; BJHP47; BJHP50 – see extract in section Saturation ) and one SHI paper (SHI73) relied on citing existing sample size guidelines or norms within research traditions to determine and subsequently defend their sample size (7.2% of all justifications).

Sample size guidelines suggested a range between 20 and 30 interviews to be adequate (Creswell, 1998). Interviewer and note taker agreed that thematic saturation, the point at which no new concepts emerge from subsequent interviews (Patton, 2002), was achieved following completion of 20 interviews. (BJHP28).

Interviewing continued until we deemed data saturation to have been reached (the point at which no new themes were emerging). Researchers have proposed 30 as an approximate or working number of interviews at which one could expect to be reaching theoretical saturation when using a semi-structured interview approach (Morse 2000), although this can vary depending on the heterogeneity of respondents interviewed and complexity of the issues explored. (SHI73).

In line with existing research

Sample sizes in the published literature on the subject matter under investigation (3.5% of all justifications) were used by two BMJ articles as guidance, and a precedent, for determining and defending their own sample size (BMJ08; BMJ15 – see extract in section Pragmatic considerations ).

We drew participants from a list of prisoners who were scheduled for release each week, sampling them until we reached the target of 35 cases, with a view to achieving data saturation within the scope of the study and sufficient follow-up interviews and in line with recent studies [8–10]. (BMJ08).

Similarly, BJHP38 (see extract in section Qualities of the analysis ) claimed that its sample size was within the range of sample sizes of published studies that use its analytic approach.

Richness and volume of data

BMJ21 (see extract in section Qualities of the analysis ) and SHI32 referred to the richness, detailed nature, and volume of data collected (2.3% of all justifications) to justify the sufficiency of their sample size.

Although there were more potential interviewees from those contacted by postcode selection, it was decided to stop recruitment after the 10th interview and focus on analysis of this sample. The material collected was considerable and, given the focused nature of the study, extremely detailed. Moreover, a high degree of consensus had begun to emerge among those interviewed, and while it is always difficult to judge at what point ‘theoretical saturation’ has been reached, or how many interviews would be required to uncover exception(s), it was felt the number was sufficient to satisfy the aims of this small in-depth investigation (Strauss and Corbin 1990). (SHI32).

Meet research design requirements

Two BMJ papers (BMJ16; BMJ08 – see extract in section In line with existing research) offered another justification: determining the sample size so that it is in line with, and serves the requirements of, the research design that the study adopted (2.3% of all justifications).

We aimed for diverse, maximum variation samples [20] totalling 80 respondents from different social backgrounds and ethnic groups and those bereaved due to different types of suicide and traumatic death. We could have interviewed a smaller sample at different points in time (a qualitative longitudinal study) but chose instead to seek a broad range of experiences by interviewing those bereaved many years ago and others bereaved more recently; those bereaved in different circumstances and with different relations to the deceased; and people who lived in different parts of the UK; with different support systems and coroners’ procedures (see Tables 1 and 2 for more details). (BMJ16).

Researchers’ previous experience

The researchers’ previous experience (possibly referring to experience with qualitative research) was invoked by BMJ15 (see extract in section Pragmatic considerations ) as a justification for the determination of sample size.

Nature of study

One BJHP paper argued that the sample size was appropriate for the exploratory nature of the study (BJHP38).

A sample of eight participants was deemed appropriate because of the exploratory nature of this research and the focus on identifying underlying ideas about the topic. (BJHP38).

Further sampling to check findings consistency

Finally, SHI112 argued that once saturation of discursive patterns had been achieved, further sampling was undertaken to check the consistency of the findings.

Within each of the age-stratified groups, interviews were randomly sampled until saturation of discursive patterns was achieved. This resulted in a sample of 67 interviews. Once this sample had been analysed, one further interview from each age-stratified group was randomly chosen to check for consistency of the findings. Using this approach it was possible to more carefully explore children’s discourse about the ‘I’, agency, relationality and power in the thematic areas, revealing the subtle discursive variations described in this article. (SHI112).

Thematic analysis of passages discussing sample size

This analysis resulted in two overarching thematic areas: the first concerned the variation in the characterisation of sample size sufficiency, and the second related to the perceived threats deriving from sample size insufficiency.

Characterisations of sample size sufficiency

The analysis showed that there were three main characterisations of the sample size in the articles that provided relevant comments and discussion: (a) the vast majority of these qualitative studies (n = 42) considered their sample size as ‘small’ and this was seen and discussed as a limitation (only two articles viewed their small sample size as desirable and appropriate); (b) a minority of articles (n = 4) proclaimed that their achieved sample size was ‘sufficient’; and (c) finally, a small group of studies (n = 5) characterised their sample size as ‘large’. Whilst achieving a ‘large’ sample size was sometimes viewed positively because it led to richer results, there were also occasions when a large sample size was problematic rather than desirable.

‘Small’ but why and for whom?

A number of articles which characterised their sample size as ‘small’ did so against an implicit or explicit quantitative framework of reference. Interestingly, three studies that claimed to have achieved data saturation or ‘theoretical sufficiency’ with their sample size nonetheless noted their ‘small’ sample size as a limitation in their discussion, raising the question of why, or for whom, the sample size was considered small given that the qualitative criterion of saturation had been satisfied.

The current study has a number of limitations. The sample size was small (n = 11) and, however, large enough for no new themes to emerge. (BJHP39). The study has two principal limitations. The first of these relates to the small number of respondents who took part in the study. (SHI73).

Other articles appeared to accept and acknowledge that their sample was flawed because of its small size (as well as other compositional ‘deficits’ e.g. non-representativeness, biases, self-selection) or anticipated that they might be criticized for their small sample size. It seemed that the imagined audience – perhaps reviewer or reader – was one inclined to hold the tenets of quantitative research, and certainly one to whom it was important to indicate the recognition that small samples were likely to be problematic. That one’s sample might be thought small was often construed as a limitation couched in a discourse of regret or apology.

Very occasionally, the articulation of the small size as a limitation was explicitly aligned against an espoused positivist framework and quantitative research.

This study has some limitations. Firstly, the 100 incidents sample represents a small number of the total number of serious incidents that occurs every year. 26 We sent out a nationwide invitation and do not know why more people did not volunteer for the study. Our lack of epidemiological knowledge about healthcare incidents, however, means that determining an appropriate sample size continues to be difficult. (BMJ20).

Indicative of an apparent oscillation of qualitative researchers between the different requirements and protocols demarcating the quantitative and qualitative worlds, there were a few instances of articles which briefly recognised their ‘small’ sample size as a limitation, but then defended their study on more qualitative grounds, such as their ability and success at capturing the complexity of experience and delving into the idiographic, and at generating particularly rich data.

This research, while limited in size, has sought to capture some of the complexity attached to men’s attitudes and experiences concerning incomes and material circumstances. (SHI35). Our numbers are small because negotiating access to social networks was slow and labour intensive, but our methods generated exceptionally rich data. (BMJ21). This study could be criticised for using a small and unrepresentative sample. Given that older adults have been ignored in the research concerning suntanning, fair-skinned older adults are the most likely to experience skin cancer, and women privilege appearance over health when it comes to sunbathing practices, our study offers depth and richness of data in a demographic group much in need of research attention. (SHI57).

‘Good enough’ sample sizes

Only four articles expressed some degree of confidence that their achieved sample size was sufficient. For example, SHI139, in line with the justification of thematic saturation that it offered, expressed trust in the sufficiency of its sample size despite the poor response rate. Similarly, BJHP04, which did not provide a sample size justification, argued that it targeted a larger sample size in order to eventually recruit a sufficient number of interviewees, due to an anticipated low response rate.

Twenty-three people with type I diabetes from the target population of 133 ( i.e. 17.3%) consented to participate but four did not then respond to further contacts (total N = 19). The relatively low response rate was anticipated, due to the busy life-styles of young people in the age range, the geographical constraints, and the time required to participate in a semi-structured interview, so a larger target sample allowed a sufficient number of participants to be recruited. (BJHP04).

Two other articles (BJHP35; SHI32) linked the claimed sufficiency to the scope (i.e. ‘small, in-depth investigation’), aims and nature (i.e. ‘exploratory’) of their studies, thus anchoring their numbers to the particular context of their research. Nevertheless, claims of sample size sufficiency were sometimes undermined when they were juxtaposed with an acknowledgement that a larger sample size would be more scientifically productive.

Although our sample size was sufficient for this exploratory study, a more diverse sample including participants with lower socioeconomic status and more ethnic variation would be informative. A larger sample could also ensure inclusion of a more representative range of apps operating on a wider range of platforms. (BJHP35).

‘Large’ sample sizes - Promise or peril?

Three articles (BMJ13; BJHP05; BJHP48), which all provided the justification of saturation, characterised their sample size as ‘large’ and narrated this oversufficiency in positive terms, as it allowed richer data and findings and enhanced the potential for generalisation. The type of generalisation aspired to (BJHP48) was not, however, further specified.

This study used rich data provided by a relatively large sample of expert informants on an important but under-researched topic. (BMJ13). Qualitative research provides a unique opportunity to understand a clinical problem from the patient’s perspective. This study had a large diverse sample, recruited through a range of locations and used in-depth interviews which enhance the richness and generalizability of the results. (BJHP48).

And whilst a ‘large’ sample size was endorsed and valued by some qualitative researchers, within the psychological tradition of IPA, a ‘large’ sample size was counter-normative and therefore needed to be justified. Four BJHP studies, all adopting IPA, expressed the appropriateness or desirability of ‘small’ sample sizes (BJHP41; BJHP45) or hastened to explain why they included a larger than typical sample size (BJHP32; BJHP47). For example, BJHP32 below provides a rationale for how an IPA study can accommodate a large sample size and how this was indeed suitable for the purposes of the particular research. To strengthen the explanation for choosing a non-normative sample size, previous IPA research citing a similar sample size approach is used as a precedent.

Small scale IPA studies allow in-depth analysis which would not be possible with larger samples (Smith et al. , 2009). (BJHP41). Although IPA generally involves intense scrutiny of a small number of transcripts, it was decided to recruit a larger diverse sample as this is the first qualitative study of this population in the United Kingdom (as far as we know) and we wanted to gain an overview. Indeed, Smith, Flowers, and Larkin (2009) agree that IPA is suitable for larger groups. However, the emphasis changes from an in-depth individualistic analysis to one in which common themes from shared experiences of a group of people can be elicited and used to understand the network of relationships between themes that emerge from the interviews. This large-scale format of IPA has been used by other researchers in the field of false-positive research. Baillie, Smith, Hewison, and Mason (2000) conducted an IPA study, with 24 participants, of ultrasound screening for chromosomal abnormality; they found that this larger number of participants enabled them to produce a more refined and cohesive account. (BJHP32).

The IPA articles found in the BJHP were the only instances where a ‘small’ sample size was advocated and a ‘large’ sample size problematized and defended. These IPA studies illustrate that the characterisation of sample size sufficiency can be a function of researchers’ theoretical and epistemological commitments rather than the result of an ‘objective’ sample size assessment.

Threats from sample size insufficiency

As shown above, the majority of articles that commented on their sample size simultaneously characterised it as small and problematic. On those occasions when authors did not simply cite their ‘small’ sample size as a study limitation but went on to provide an account of how and why a small sample size was problematic, two important scientific qualities of the research seemed to be threatened: the generalizability and validity of results.

Generalizability

Those who characterised their sample as ‘small’ connected this to the limited potential for generalisation of the results. Other features related to the sample – often some kind of compositional particularity – were also linked to limited potential for generalisation. Though the form of generalisation being referred to was not always explicitly articulated (see BJHP09), generalisation was mostly conceived in nomothetic terms, that is, it concerned the potential to draw inferences from the sample to the broader study population (‘representational generalisation’ – see BJHP31) and, less often, to other populations or cultures.

It must be noted that samples are small and whilst in both groups the majority of those women eligible participated, generalizability cannot be assumed. (BJHP09). The study’s limitations should be acknowledged: Data are presented from interviews with a relatively small group of participants, and thus, the views are not necessarily generalizable to all patients and clinicians. In particular, patients were only recruited from secondary care services where COFP diagnoses are typically confirmed. The sample therefore is unlikely to represent the full spectrum of patients, particularly those who are not referred to, or who have been discharged from dental services. (BJHP31).

Without explicitly using the term generalisation, two SHI articles noted how their ‘small’ sample size imposed limits on ‘the extent that we can extrapolate from these participants’ accounts’ (SHI114) or to the possibility ‘to draw far-reaching conclusions from the results’ (SHI124).

Interestingly, only a minority of articles alluded to, or invoked, a type of generalisation that is aligned with qualitative research, that is, idiographic generalisation (i.e. generalisation that can be made from and about cases [5]). These articles, all published in the discipline of sociology, defended their findings in terms of the possibility of drawing logical and conceptual inferences to other contexts and of generating understanding that has the potential to advance knowledge, despite their ‘small’ size. One article (SHI139) clearly contrasted nomothetic (statistical) generalisation to idiographic generalisation, arguing that the lack of statistical generalizability does not nullify the ability of qualitative research to still be relevant beyond the sample studied.

Further, these data do not need to be statistically generalisable for us to draw inferences that may advance medicalisation analyses (Charmaz 2014). These data may be seen as an opportunity to generate further hypotheses and are a unique application of the medicalisation framework. (SHI139). Although a small-scale qualitative study related to school counselling, this analysis can be usefully regarded as a case study of the successful utilisation of mental health-related resources by adolescents. As many of the issues explored are of relevance to mental health stigma more generally, it may also provide insights into adult engagement in services. It shows how a sociological analysis, which uses positioning theory to examine how people negotiate, partially accept and simultaneously resist stigmatisation in relation to mental health concerns, can contribute to an elucidation of the social processes and narrative constructions which may maintain as well as bridge the mental health service gap. (SHI103).

Only one article (SHI30) used the term transferability to argue for the potential wider relevance of the results, which was thought to be more the product of the composition of the sample (i.e. a diverse sample) than of the sample size.

Internal validity

The second major concern that arose from a ‘small’ sample size pertained to the internal validity of findings (i.e. here the term is used to denote the ‘truth’ or credibility of research findings). Authors expressed uncertainty about the degree of confidence in particular aspects or patterns of their results, primarily those that concerned some form of differentiation on the basis of relevant participant characteristics.

The information source preferred seemed to vary according to parents’ education; however, the sample size is too small to draw conclusions about such patterns. (SHI80). Although our numbers were too small to demonstrate gender differences with any certainty, it does seem that the biomedical and erotic scripts may be more common in the accounts of men and the relational script more common in the accounts of women. (SHI81).

In other instances, articles expressed uncertainty about whether their results accounted for the full spectrum and variation of the phenomenon under investigation. In other words, a ‘small’ sample size (alongside compositional ‘deficits’ such as a not statistically representative sample) was seen to threaten the ‘content validity’ of the results, which in turn led to constructions of the study conclusions as tentative.

Data collection ceased on pragmatic grounds rather than when no new information appeared to be obtained ( i.e. , saturation point). As such, care should be taken not to overstate the findings. Whilst the themes from the initial interviews seemed to be replicated in the later interviews, further interviews may have identified additional themes or provided more nuanced explanations. (BJHP53). …it should be acknowledged that this study was based on a small sample of self-selected couples in enduring marriages who were not broadly representative of the population. Thus, participants may not be representative of couples that experience postnatal PTSD. It is therefore unlikely that all the key themes have been identified and explored. For example, couples who were excluded from the study because the male partner declined to participate may have been experiencing greater interpersonal difficulties. (BJHP03).

In other instances, articles attempted to preserve a degree of credibility of their results, despite the recognition that the sample size was ‘small’. Clarity and sharpness of emerging themes and alignment with previous relevant work were the arguments employed to warrant the validity of the results.

This study focused on British Chinese carers of patients with affective disorders, using a qualitative methodology to synthesise the sociocultural representations of illness within this community. Despite the small sample size, clear themes emerged from the narratives that were sufficient for this exploratory investigation. (SHI98).

Discussion

The present study sought to examine how qualitative sample sizes in health-related research are characterised and justified. In line with previous studies [22, 30, 33, 34] the findings demonstrate that reporting of sample size sufficiency is limited; just over 50% of articles in the BMJ and BJHP and 82% in the SHI did not provide any sample size justification. Providing a sample size justification was not related to the number of interviews conducted, but it was associated with the journal that the article was published in, indicating the influence of disciplinary or publishing norms, also reported in prior research [30]. This lack of transparency about sample size sufficiency is problematic given that most qualitative researchers would agree that it is an important marker of quality [56, 57]. Moreover, with the rise of qualitative research in the social sciences, efforts to synthesise existing evidence and assess its quality are obstructed by poor reporting [58, 59].

When authors justified their sample size, our findings indicate that sufficiency was mostly appraised with reference to features intrinsic to the study, in agreement with general advice on sample size determination [4, 11, 36]. The principle of saturation was the most commonly invoked argument [22], accounting for 55% of all justifications. A wide range of variants of saturation was evident, corroborating the proliferation of meanings of the term [49] and reflecting different underlying conceptualisations or models of saturation [20]. Nevertheless, claims of saturation were never substantiated in relation to procedures conducted in the study itself, endorsing similar observations in the literature [25, 30, 47]. Claims of saturation were sometimes supported with citations of other literature, suggesting a removal of the concept away from the characteristics of the study at hand. Pragmatic considerations, such as resource constraints or participant response rate and availability, constituted the second most frequently used argument, accounting for approximately 10% of justifications, and another 23% of justifications also represented intrinsic-to-the-study characteristics (i.e. qualities of the analysis, meeting sampling or research design requirements, richness and volume of the data obtained, nature of study, further sampling to check findings consistency).

Only 12% of mentions of sample size justification pertained to arguments external to the study at hand, in the form of existing sample size guidelines and prior research that sets precedents. Whilst community norms and prior research can establish useful rules of thumb for estimating sample sizes [60] – and reveal what sizes are more likely to be acceptable within research communities – researchers should avoid adopting these norms uncritically, especially when such guidelines [e.g. 30, 35] might be based on research that does not provide adequate evidence of sample size sufficiency. Similarly, whilst methodological research that seeks to demonstrate the achievement of saturation is invaluable, since it explicates the parameters upon which saturation is contingent and indicates when a research project is likely to require a smaller or a larger sample [e.g. 29], specific numbers at which saturation was achieved within these projects cannot be routinely extrapolated to other projects. We concur with existing views [11, 36] that the consideration of the characteristics of the study at hand – such as the epistemological and theoretical approach, the nature of the phenomenon under investigation, the aims and scope of the study, the quality and richness of data, and the researcher’s experience and skill in conducting qualitative research – should be the primary guide in determining sample size and assessing its sufficiency.

Moreover, although numbers in qualitative research are not unimportant [61], sample size should not be considered alone but should be embedded in the more encompassing examination of data adequacy [56, 57]. Erickson’s [62] dimensions of ‘evidentiary adequacy’ are useful here. He explains the concept in terms of adequate amounts of evidence, adequate variety in kinds of evidence, adequate interpretive status of evidence, adequate disconfirming evidence, and adequate discrepant case analysis. Not all dimensions will be relevant across all qualitative research designs, but this illustrates the thickness of the concept of data adequacy, taking it beyond sample size.

The present research also demonstrated that sample sizes were commonly seen as ‘small’ and insufficient and discussed as a limitation. Often unjustified (and in two cases incongruent with the studies’ own claims of saturation), these findings imply that sample size in qualitative health research is often adversely judged (or expected to be judged) against an implicit, yet omnipresent, quasi-quantitative standpoint. Indeed, there were a few instances in our data where authors appeared, possibly in response to reviewers, to resist some sort of quantification of their results. This implicit reference point became more apparent when authors discussed the threats deriving from an insufficient sample size. Whilst the concerns about internal validity might be legitimate to the extent that qualitative research projects, which are broadly related to realism, are set to examine phenomena in sufficient breadth and depth, the concerns around generalizability revealed a conceptualisation that is not compatible with purposive sampling. The limited potential for generalisation, as a result of a small sample size, was often discussed in nomothetic, statistical terms. Only occasionally was analytic or idiographic generalisation invoked to warrant the value of the study’s findings [5, 17].

Strengths and limitations of the present study

We note, first, the limited number of health-related journals reviewed, meaning that only a ‘snapshot’ of qualitative health research has been captured. Examining additional disciplines (e.g. nursing sciences) as well as inter-disciplinary journals would add to the findings of this analysis. Nevertheless, our study is the first to provide comparative insights on the basis of disciplines that are differently attached to the legacy of positivism, and it analysed literature published over a lengthy period of time (15 years). Guetterman [27] also examined health-related literature, but that analysis was restricted to the 26 most highly cited articles published over a period of five years, whilst Carlsen and Glenton’s [22] study concentrated on focus group health research. Moreover, although it was our intention to examine sample size justification in relation to the epistemological and theoretical positions of articles, this proved challenging, largely due to the absence of relevant information or the difficulty of clearly discerning articles’ positions [63] and classifying them under specific approaches (e.g. studies often combined elements from different theoretical and epistemological traditions). We believe that such an analysis would yield useful insights as it links the methodological issue of sample size to the broader philosophical stance of the research. Despite these limitations, the analysis of the characterisation of sample size and of the threats seen to accrue from insufficient sample size enriches our understanding of sample size (in)sufficiency argumentation by linking it to other features of the research. As the peer-review process becomes increasingly public, future research could usefully examine how reporting around sample size sufficiency and data adequacy might be influenced by the interactions between authors and reviewers.

Conclusions

The past decade has seen a growing appetite in qualitative research for an evidence-based approach to sample size determination and to evaluations of the sufficiency of sample size. Despite the conceptual and methodological developments in the area, the findings of the present study confirm previous studies in concluding that appraisals of sample size sufficiency are either absent or poorly substantiated. To ensure and maintain the high quality of research that will encourage greater appreciation of qualitative work in health-related sciences [64], we argue that qualitative researchers should be more transparent and thorough in their evaluation of sample size as part of their appraisal of data adequacy. We would encourage the practice of appraising sample size sufficiency with close reference to the study at hand, and would thus caution against responding to the growing methodological research in this area with a decontextualised application of numerical sample size guidelines, norms and principles. Although researchers might find that sample size community norms serve as useful rules of thumb, we recommend that methodological knowledge be used to critically consider how saturation and the other parameters that affect sample size sufficiency pertain to the specifics of the particular project. Those reviewing papers have a vital role in encouraging transparent, study-specific reporting. The review process should support authors to exercise nuanced judgements in decisions about sample size determination in the context of the range of factors that influence sample size sufficiency and the specifics of a particular study. In light of the growing methodological evidence in the area, transparent presentation of such evidence-based judgement is crucial and in time should surely obviate the seemingly routine practice of citing the ‘small’ size of qualitative samples among the study limitations.

Notes

A non-parametric test of difference for independent samples was performed since the variable ‘number of interviews’ violated assumptions of normality according to the standardized scores of skewness and kurtosis (BMJ: z skewness = 3.23, z kurtosis = 1.52; BJHP: z skewness = 4.73, z kurtosis = 4.85; SHI: z skewness = 12.04, z kurtosis = 21.72) and the Shapiro-Wilk test of normality (p < .001).
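To make the reporting above concrete, here is a minimal sketch, in Python with SciPy, of how such a normality screen and non-parametric comparison could be run. The interview-count arrays are illustrative placeholders, not the study's data, and the choice of the Kruskal-Wallis H test is an assumption: the note above does not name the specific non-parametric test applied to the three journal groups.

```python
# Minimal sketch (not the authors' code) of the normality screening and
# non-parametric comparison described in the note above.
import numpy as np
from scipy import stats

journals = {
    "BMJ":  np.array([20, 35, 14, 28, 41, 17, 30, 25, 19, 33]),  # placeholders
    "BJHP": np.array([8, 11, 15, 24, 10, 9, 13, 32, 12, 18]),
    "SHI":  np.array([21, 67, 30, 12, 25, 104, 15, 40, 10, 36]),
}

for name, n_interviews in journals.items():
    n = len(n_interviews)
    # Standardised skewness/kurtosis: statistic divided by its approximate
    # standard error (sqrt(6/n) and sqrt(24/n) respectively).
    z_skew = stats.skew(n_interviews) / np.sqrt(6.0 / n)
    z_kurt = stats.kurtosis(n_interviews) / np.sqrt(24.0 / n)
    w, p = stats.shapiro(n_interviews)  # Shapiro-Wilk test of normality
    print(f"{name}: z_skew={z_skew:.2f}, z_kurt={z_kurt:.2f}, Shapiro-Wilk p={p:.3f}")

# With normality violated, compare the three independent groups with the
# Kruskal-Wallis H test (one plausible non-parametric choice).
h, p = stats.kruskal(*journals.values())
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.3f}")
```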

Abbreviations

BJHP: British Journal of Health Psychology

BMJ: British Medical Journal

IPA: Interpretative Phenomenological Analysis

SHI: Sociology of Health & Illness

References

Spencer L, Ritchie J, Lewis J, Dillon L. Quality in qualitative evaluation: a framework for assessing research evidence. National Centre for Social Research 2003. https://www.heacademy.ac.uk/system/files/166_policy_hub_a_quality_framework.pdf Accessed 11 May 2018.

Fusch PI, Ness LR. Are we there yet? Data saturation in qualitative research. Qual Rep. 2015;20(9):1408–16.


Robinson OC. Sampling in interview-based qualitative research: a theoretical and practical guide. Qual Res Psychol. 2014;11(1):25–41.


Sandelowski M. Sample size in qualitative research. Res Nurs Health. 1995;18(2):179–83.


Sandelowski M. One is the liveliest number: the case orientation of qualitative research. Res Nurs Health. 1996;19(6):525–9.

Luborsky MR, Rubinstein RL. Sampling in qualitative research: rationale, issues, and methods. Res Aging. 1995;17(1):89–113.

Marshall MN. Sampling for qualitative research. Fam Pract. 1996;13(6):522–6.

Patton MQ. Qualitative evaluation and research methods. 2nd ed. Newbury Park, CA: Sage; 1990.

van Rijnsoever FJ. (I Can’t get no) saturation: a simulation and guidelines for sample sizes in qualitative research. PLoS One. 2017;12(7):e0181689.

Morse JM. The significance of saturation. Qual Health Res. 1995;5(2):147–9.

Morse JM. Determining sample size. Qual Health Res. 2000;10(1):3–5.

Gergen KJ, Josselson R, Freeman M. The promises of qualitative inquiry. Am Psychol. 2015;70(1):1–9.

Borsci S, Macredie RD, Barnett J, Martin J, Kuljis J, Young T. Reviewing and extending the five-user assumption: a grounded procedure for interaction evaluation. ACM Trans Comput Hum Interact. 2013;20(5):29.

Borsci S, Macredie RD, Martin JL, Young T. How many testers are needed to assure the usability of medical devices? Expert Rev Med Devices. 2014;11(5):513–25.

Glaser BG, Strauss AL. The discovery of grounded theory: strategies for qualitative research. Chicago, IL: Aldine; 1967.

Kerr C, Nixon A, Wild D. Assessing and demonstrating data saturation in qualitative inquiry supporting patient-reported outcomes research. Expert Rev Pharmacoecon Outcomes Res. 2010;10(3):269–81.

Lincoln YS, Guba EG. Naturalistic inquiry. London: Sage; 1985.


Malterud K, Siersma VD, Guassora AD. Sample size in qualitative interview studies: guided by information power. Qual Health Res. 2015;26:1753–60.

Nelson J. Using conceptual depth criteria: addressing the challenge of reaching saturation in qualitative research. Qual Res. 2017;17(5):554–70.

Saunders B, Sim J, Kingstone T, Baker S, Waterfield J, Bartlam B, et al. Saturation in qualitative research: exploring its conceptualization and operationalization. Qual Quant. 2017. https://doi.org/10.1007/s11135-017-0574-8 .

Caine K. Local standards for sample size at CHI. In Proceedings of the 2016 CHI conference on human factors in computing systems. 2016;981–992. ACM.

Carlsen B, Glenton C. What about N? A methodological study of sample-size reporting in focus group studies. BMC Med Res Methodol. 2011;11(1):26.

Constantinou CS, Georgiou M, Perdikogianni M. A comparative method for themes saturation (CoMeTS) in qualitative interviews. Qual Res. 2017;17(5):571–88.

Dai NT, Free C, Gendron Y. Interview-based research in accounting 2000–2014: a review. November 2016. https://ssrn.com/abstract=2711022 or https://doi.org/10.2139/ssrn.2711022 . Accessed 17 May 2018.

Francis JJ, Johnston M, Robertson C, Glidewell L, Entwistle V, Eccles MP, et al. What is an adequate sample size? Operationalising data saturation for theory-based interview studies. Psychol Health. 2010;25(10):1229–45.

Guest G, Bunce A, Johnson L. How many interviews are enough? An experiment with data saturation and variability. Field Methods. 2006;18(1):59–82.

Guetterman TC. Descriptions of sampling practices within five approaches to qualitative research in education and the health sciences. Forum Qual Soc Res. 2015;16(2):25. http://nbn-resolving.de/urn:nbn:de:0114-fqs1502256 . Accessed 17 May 2018.

Hagaman AK, Wutich A. How many interviews are enough to identify metathemes in multisited and cross-cultural research? Another perspective on guest, bunce, and Johnson’s (2006) landmark study. Field Methods. 2017;29(1):23–41.

Hennink MM, Kaiser BN, Marconi VC. Code saturation versus meaning saturation: how many interviews are enough? Qual Health Res. 2017;27(4):591–608.

Marshall B, Cardon P, Poddar A, Fontenot R. Does sample size matter in qualitative research?: a review of qualitative interviews in IS research. J Comput Inform Syst. 2013;54(1):11–22.

Mason M. Sample size and saturation in PhD studies using qualitative interviews. Forum Qual Soc Res 2010;11(3):8. http://nbn-resolving.de/urn:nbn:de:0114-fqs100387 . Accessed 17 May 2018.

Safman RM, Sobal J. Qualitative sample extensiveness in health education research. Health Educ Behav. 2004;31(1):9–21.

Saunders MN, Townsend K. Reporting and justifying the number of interview participants in organization and workplace research. Br J Manag. 2016;27(4):836–52.

Sobal J. Sample extensiveness in qualitative nutrition education research. J Nutr Educ. 2001;33(4):184–92.

Thomson SB. Sample size and grounded theory. JOAAG. 2010;5(1). http://www.joaag.com/uploads/5_1__Research_Note_1_Thomson.pdf . Accessed 17 May 2018.

Baker SE, Edwards R. How many qualitative interviews is enough?: expert voices and early career reflections on sampling and cases in qualitative research. National Centre for Research Methods Review Paper. 2012; http://eprints.ncrm.ac.uk/2273/4/how_many_interviews.pdf . Accessed 17 May 2018.

Ogden J, Cornwell D. The role of topic, interviewee, and question in predicting rich interview data in the field of health research. Sociol Health Illn. 2010;32(7):1059–71.

Green J, Thorogood N. Qualitative methods for health research. London: Sage; 2004.

Ritchie J, Lewis J, Elam G. Designing and selecting samples. In: Ritchie J, Lewis J, editors. Qualitative research practice: a guide for social science students and researchers. London: Sage; 2003. p. 77–108.

Britten N. Qualitative research: qualitative interviews in medical research. BMJ. 1995;311(6999):251–3.

Creswell JW. Qualitative inquiry and research design: choosing among five approaches. 2nd ed. London: Sage; 2007.

Fugard AJ, Potts HW. Supporting thinking on sample sizes for thematic analyses: a quantitative tool. Int J Soc Res Methodol. 2015;18(6):669–84.

Emmel N. Themes, variables, and the limits to calculating sample size in qualitative research: a response to Fugard and Potts. Int J Soc Res Methodol. 2015;18(6):685–6.

Braun V, Clarke V. (Mis) conceptualising themes, thematic analysis, and other problems with Fugard and Potts’ (2015) sample-size tool for thematic analysis. Int J Soc Res Methodol. 2016;19(6):739–43.

Hammersley M. Sampling and thematic analysis: a response to Fugard and Potts. Int J Soc Res Methodol. 2015;18(6):687–8.

Charmaz K. Constructing grounded theory: a practical guide through qualitative analysis. London: Sage; 2006.

Bowen GA. Naturalistic inquiry and the saturation concept: a research note. Qual Res. 2008;8(1):137–52.

Morse JM. Data were saturated. Qual Health Res. 2015;25(5):587–8.

O’Reilly M, Parker N. ‘Unsatisfactory saturation’: a critical exploration of the notion of saturated sample sizes in qualitative research. Qual Res. 2013;13(2):190–7.

van Manen M, Higgins I, van der Riet P. A conversation with Max van Manen on phenomenology in its original sense. Nurs Health Sci. 2016;18(1):4–7.

Dey I. Grounding grounded theory. San Francisco, CA: Academic Press; 1999.

Hays DG, Wood C, Dahl H, Kirk-Jenkins A. Methodological rigor in journal of counseling & development qualitative research articles: a 15-year review. J Couns Dev. 2016;94(2):172–83.

Moher D, Liberati A, Tetzlaff J, Altman DG, Prisma Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009; 6(7): e1000097.

Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15(9):1277–88.

Boyatzis RE. Transforming qualitative information: thematic analysis and code development. Thousand Oaks, CA: Sage; 1998.

Levitt HM, Motulsky SL, Wertz FJ, Morrow SL, Ponterotto JG. Recommendations for designing and reviewing qualitative research in psychology: promoting methodological integrity. Qual Psychol. 2017;4(1):2–22.

Morrow SL. Quality and trustworthiness in qualitative research in counseling psychology. J Couns Psychol. 2005;52(2):250–60.

Barroso J, Sandelowski M. Sample reporting in qualitative studies of women with HIV infection. Field Methods. 2003;15(4):386–404.

Glenton C, Carlsen B, Lewin S, Munthe-Kaas H, Colvin CJ, Tunçalp Ö, et al. Applying GRADE-CERQual to qualitative evidence synthesis findings—paper 5: how to assess adequacy of data. Implement Sci. 2018;13(Suppl 1):14.

Onwuegbuzie AJ, Leech NL. A call for qualitative power analyses. Qual Quant. 2007;41(1):105–21.

Sandelowski M. Real qualitative researchers do not count: the use of numbers in qualitative research. Res Nurs Health. 2001;24(3):230–40.

Erickson F. Qualitative methods in research on teaching. In: Wittrock M, editor. Handbook of research on teaching. 3rd ed. New York: Macmillan; 1986. p. 119–61.

Bradbury-Jones C, Taylor J, Herber O. How theory is used and articulated in qualitative research: development of a new typology. Soc Sci Med. 2014;120:135–41.

Greenhalgh T, Annandale E, Ashcroft R, Barlow J, Black N, Bleakley A, et al. An open letter to the BMJ editors on qualitative research. BMJ. 2016;352:i563.


Acknowledgments

We would like to thank Dr. Paula Smith and Katharine Lee for their comments on a previous draft of this paper as well as Natalie Ann Mitchell and Meron Teferra for assisting us with data extraction.

Funding

This research was initially conceived of and partly conducted with financial support from the Multidisciplinary Assessment of Technology Centre for Healthcare (MATCH) programme (EP/F063822/1 and EP/G012393/1). The research continued and was completed independent of any support. The funding body did not have any role in the study design; the collection, analysis and interpretation of the data; the writing of the paper; or the decision to submit the manuscript for publication. The views expressed are those of the authors alone.

Availability of data and materials

Supporting data can be accessed in the original publications. Additional File 2 lists all eligible studies that were included in the present analysis.

Author information

Authors and Affiliations

Department of Psychology, University of Bath, Building 10 West, Claverton Down, Bath, BA2 7AY, UK

Konstantina Vasileiou & Julie Barnett

School of Psychology, Newcastle University, Ridley Building 1, Queen Victoria Road, Newcastle upon Tyne, NE1 7RU, UK

Susan Thorpe

Department of Computer Science, Brunel University London, Wilfred Brown Building 108, Uxbridge, UB8 3PH, UK

Terry Young


Contributions

JB and TY conceived the study; KV, JB, and TY designed the study; KV identified the articles and extracted the data; KV and JB assessed eligibility of articles; KV, JB, ST, and TY contributed to the analysis of the data, discussed the findings and early drafts of the paper; KV developed the final manuscript; KV, JB, ST, and TY read and approved the manuscript.

Corresponding author

Correspondence to Konstantina Vasileiou.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

Terry Young is an academic who undertakes research and occasional consultancy in the areas of health technology assessment, information systems, and service design. He is unaware of any direct conflict of interest with respect to this paper. All other authors have no competing interests to declare.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional Files

Additional File 1:

Editorial positions on qualitative research and sample considerations (where available). (DOCX 12 kb)

Additional File 2:

List of eligible articles included in the review ( N  = 214). (DOCX 38 kb)

Additional File 3:

Data Extraction Form. (DOCX 15 kb)

Additional File 4:

Citations used by articles to support their position on saturation. (DOCX 14 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Vasileiou, K., Barnett, J., Thorpe, S. et al. Characterising and justifying sample size sufficiency in interview-based studies: systematic analysis of qualitative health research over a 15-year period. BMC Med Res Methodol 18 , 148 (2018). https://doi.org/10.1186/s12874-018-0594-7

Download citation

Received : 22 May 2018

Accepted : 29 October 2018

Published : 21 November 2018

DOI : https://doi.org/10.1186/s12874-018-0594-7


Keywords

  • Sample size
  • Sample size justification
  • Sample size characterisation
  • Data adequacy
  • Qualitative health research
  • Qualitative interviews
  • Systematic analysis



How to Justify Sample Size in Qualitative Research


  • March 21, 2023

Article Summary: Sample sizes in qualitative research can be much lower than sample sizes in quantitative research. The key is having the right participant segmentation and study design. Data saturation is also a key principle to understand.

Qualitative research is a bit of a puzzle for new practitioners: since it is done by interviewing participants, observation, or studying people’s patterns and movements (in the case of user experience design), one obviously can’t have a huge, statistically significant sample size. Interviewing 200+ people is not only incredibly time-consuming, it’s also quite expensive.

And, moreover, the goal of qualitative research is not to understand how much or how many. The goal is to collect themes and see patterns. It’s to uncover the “why” versus the amount.

So in this post, we’re going to explore the question every qualitative researcher asks, at one point or another: How do you justify the sample size in qualitative research?

Here are some guidelines.

Qualitative sample size guideline #1: Segmentation of participants

In qualitative research, because the goal is to understand themes and patterns of a particular subset (versus a broad population), the first step is segmentation. You may also know of this as “persona” development, but regardless of what you call it, the idea is to first bucket your various buyer/customer types into like categories. For example, if you’re selling sales software, your target isn’t every single company that sells products. It’s likely much more specific: say, VP-level sales execs at mid-market companies who have a technology product and use a cloud-based CRM. If that’s your main buyer, that’s the segment you would focus on in qualitative research.

Generally, most companies have multiple targets, so the trick is to think about all the various buyers/consumers and identify which underlying traits they have in common, as well as which traits differentiate them from other targets. Typically, this is where quantitative data comes into play, either through internal data analysis or surveys. Whatever your process, this is step 1: figure out the segments you will be bucketing participants into so you can move into the qualitative phase, where you’ll ask in-depth questions, via interviews, of each segment category. At this stage, it’s time to bring in your recruiting company to find your participants.

Qualitative sample size guideline #2: Figure out the appropriate study design

After you’ve tackled your segmentation exercise and know how to divide up your participants, you’ll need to think through the qualitative methodology that is most appropriate for answering your research questions. At InterQ Research, we always design studies through the lens of contextual research. This means that you want to set up your studies to be as close to real life as possible. Is your product purchased after a group discussion or an individual decision? Often, when teams decide on software or technology stacks, they’ll want to test it and talk amongst themselves. If this is the case, you would need to interview the team, or a team of like-minded professionals, to see how they come to a decision. Here, focus groups would be a great methodology.

Conversely, if your product is considered on an individual basis (like, perhaps, a person navigating a website when purchasing a plane ticket), then you’d want to interview the individual alone. In this case, you’d want to choose a hybrid approach: a user experience/journey-mapping exercise combined with an in-depth interview.

In qualitative research, there are numerous methodologies, and frequently mixed methodologies work best, in order to see the context of how people behave as well as to understand how they think.

But I digress. Let’s get back to sample sizes in qualitative research.

Qualitative sample size guideline #3: Your sample size is completed when you reach saturation

So far we’ve covered how to first segment your audiences, and then we’ve talked about the methodology to choose, based on context. The third principle in qualitative research is to understand the theory of data saturation.

Saturation in qualitative research means that, when interviewing a distinct segment of participants, you are able to surface all of the themes that the sample set has in common. In other words, after doing, let’s say, 15 interviews about a specific topic, you start to hear the participants all say similar things. Since you have a fairly homogenous sample, these themes will start to come out after 10-20 interviews if you’ve done your recruiting well (and sometimes after as few as 6 interviews). Once you hear the same themes, with no new information emerging, you have reached data saturation.
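As a rough illustration of that stopping rule, here is a minimal sketch in Python of tracking saturation while coding interviews. The theme labels, the three-interview window, and the helper name `reached_saturation` are illustrative assumptions, not a method prescribed by this post.

```python
# Minimal sketch: after each coded interview, check whether the last few
# interviews contributed any themes we had not already seen.
from typing import Iterable, Optional

def reached_saturation(coded_interviews: Iterable[set], window: int = 3) -> Optional[int]:
    """Return the 1-based index of the interview at which saturation is
    reached, i.e. the point where the last `window` interviews introduced
    no new themes; return None if new themes were still emerging."""
    seen: set = set()
    new_per_interview = []
    for i, themes in enumerate(coded_interviews, start=1):
        new_per_interview.append(len(themes - seen))  # themes not seen before
        seen |= themes
        if i > window and sum(new_per_interview[-window:]) == 0:
            return i
    return None

interviews = [
    {"price", "trust"}, {"price", "onboarding"}, {"trust", "support"},
    {"support"}, {"price"}, {"trust"}, {"onboarding"},
]
print(reached_saturation(interviews))  # -> 6: interviews 4-6 added nothing new
```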

The beauty of qualitative research is that if you:

  • Segment your audiences carefully, into distinct groups, and,
  • Choose the right methodology

You’ll start to hit saturation, and you will get diminishing returns with more interviews. In this manner, qualitative research can have smaller sample sizes than quantitative, since it’s thematic, versus statistical.

Let’s wrap it up: So what is the ideal sample size in qualitative research?

To bring this one home, let’s answer the question we set out to investigate: what is the ideal sample size in qualitative research?

Typically, sample sizes will range from 6-20 per segment. (So if you have 5 segments and plan 6 interviews each, you would have a total sample size of 30.) For very specific tasks, such as in user experience research, moderators will see the same themes after as few as 5-6 interviews. In most studies, though, researchers will reach saturation after 10-20 interviews. Where a study lands in that range depends on how homogenous the sample is, as well as the type of questions being asked. Some researchers aim for a baker’s dozen (13) and check whether they’ve reached saturation after those 13 interviews. If not, the study can be expanded to find more participants so that all the themes can be explored. But 13 is a good place to start.
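The arithmetic above is simple enough to sketch in a few lines of Python. The segment names and the expansion increment are made up for illustration; the "expand if not saturated" step just echoes the baker's-dozen heuristic described above.

```python
# Toy planner: start each segment at a baseline number of interviews and
# expand any segment that has not yet reached saturation.
BASELINE = 13  # the baker's dozen starting point suggested above

segments = ["enterprise buyers", "smb buyers", "end users"]
plan = {segment: BASELINE for segment in segments}

# Suppose coding shows one segment was still producing new themes:
not_saturated = {"end users"}
for segment in not_saturated:
    plan[segment] += 5  # recruit a few more participants and re-check

print(plan)                # {'enterprise buyers': 13, 'smb buyers': 13, 'end users': 18}
print(sum(plan.values()))  # 44 interviews across the three segments
```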

Interested in running a qualitative research study? Request a proposal > 

Author Bio: Joanna Jones is the founder and CEO of InterQ Research. At InterQ, she oversees study design, manages clients, and moderates studies.



Sample Size in Qualitative Interview Studies: Guided by Information Power

Malterud K, Siersma VD, Guassora AD

Affiliations

  • 1 University of Copenhagen, Copenhagen, Denmark.
  • 2 Uni Research Health, Bergen, Norway.
  • 3 University of Bergen, Bergen, Norway.
  • PMID: 26613970
  • DOI: 10.1177/1049732315617444

Sample sizes must be ascertained in qualitative studies like in quantitative studies but not by the same means. The prevailing concept for sample size in qualitative studies is "saturation." Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept "information power" to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds, relevant for the actual study, the lower amount of participants is needed. We suggest that the size of a sample with sufficient information power depends on (a) the aim of the study, (b) sample specificity, (c) use of established theory, (d) quality of dialogue, and (e) analysis strategy. We present a model where these elements of information and their relevant dimensions are related to information power. Application of this model in the planning and during data collection of a qualitative study is discussed.

Keywords: information power; methodology; participants; qualitative; sample size; saturation.
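The five elements named in the abstract lend themselves to a simple screening aid. The sketch below is a toy encoding, not Malterud and colleagues' model: the dimension names come from the abstract, but the boolean scoring and the threshold are invented purely for illustration.

```python
# Toy screening aid based on the five elements listed in the abstract.
# Each flag is True when that dimension points toward high information
# power (and therefore toward a smaller sample being defensible).
def leans_toward_smaller_sample(aim_narrow: bool,
                                sample_highly_specific: bool,
                                uses_established_theory: bool,
                                dialogue_strong: bool,
                                in_depth_case_analysis: bool) -> bool:
    """Return True if a majority of dimensions suggest high information power."""
    signals = [aim_narrow, sample_highly_specific, uses_established_theory,
               dialogue_strong, in_depth_case_analysis]
    return sum(signals) >= 3  # invented threshold, for illustration only

# Example: narrow aim, highly specific sample, strong dialogue, case-level
# analysis, but no established theory applied.
print(leans_toward_smaller_sample(True, True, False, True, True))  # True
```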



The complete guide to systematic random sampling.


If you want a highly effective (and accurate) method for selecting a random sample from a research population, systematic random sampling might be just what you’re looking for.

In this article, we’ll break down what systematic random sampling is and how you can use it to get a clear understanding of a target population.

What is systematic random sampling?


Systematic random sampling is a probability sampling method. This means it uses chance and randomisation to select sample data that represents a population.

After  determining the right sample size , researchers assign a regular interval number they will use to select which members of the target population will be included in the sample.

The sample interval (k) is decided by dividing the population size (N) by the sample size (n).

If you had a list of 1,000 customers ( your target population ) and you wanted to survey 200 of them, your interval would be 5. This means that you would sample every  5th  person in your list of 1,000 customers.

1,000 / 200 = 5

To ensure a random sample, researchers use a random starting point within the first interval, i.e. between 1 and k. So if k = 5 you might randomly start with the 2nd name in the list and then sample every 5th person (e.g. 2, 7, 12, 17 and so on).
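To make the procedure concrete, here is a minimal Python sketch of the interval-plus-random-start logic described above (the function name and the toy customer list are illustrative, not from any particular library):

```python
import random

def systematic_sample(population, n):
    """Systematic random sampling: interval k = N / n, random start
    within the first interval, then every k-th member of the list."""
    N = len(population)
    k = N // n                        # sampling interval
    start = random.randint(0, k - 1)  # 0-indexed start; position start + 1 in 1-indexed terms
    return population[start::k]

# 200 customers out of 1,000: interval k = 5
customers = [f"customer_{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(customers, 200)
print(len(sample))  # 200
```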

Types of systematic sampling

Although systematic sampling is a relatively simple concept, it can be performed in a few different ways.

Systematic random sampling

This is the ‘classic’ form of systematic sampling where items or individuals are selected at a predetermined interval.

Circular systematic sampling

With circular systematic sampling, the sampling process continues past the end of the population list, wrapping back around to the start, until the desired sample size has been reached.

Linear systematic sampling

Linear systematic sampling, unlike circular systematic sampling, does not wrap around: members are selected at the fixed interval from the starting point until the end of the list is reached.
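The difference between the two variants is easiest to see in code. A minimal sketch, assuming a population list whose length is an exact multiple of the interval (the function names are illustrative):

```python
def linear_systematic_sample(population, k, start):
    """Linear variant: walk the list once from the starting point,
    taking every k-th member, and stop at the end (no wrap-around)."""
    return population[start::k]

def circular_systematic_sample(population, n, k, start):
    """Circular variant: the start can fall anywhere in the list, and
    selection wraps past the end (modulo N) until n members are drawn."""
    N = len(population)
    return [population[(start + i * k) % N] for i in range(n)]
```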

When to use systematic sampling

Systematic sampling can be used whenever you want the benefits of randomly sampling the population you’re studying. It can be especially useful in situations where you don’t have details of the entire population before you begin your study. This is because systematic sampling is rule-based, so you can just apply the interval you’ve chosen to the data.

Systematic random sampling is valuable when you’re on a tight budget or a short timescale, but it may not be right for your project if there is any risk of data manipulation.

What is the risk of bias in systematic sampling?

Systematic sampling can remove some of the unpredictability from a sample, meaning that a researcher could potentially manipulate the results by choosing a starting point that favours their preferred outcome.

However, this risk is very slight: the starting point is the only random component of the sample, and once it is chosen the process moves in a set pattern until the sample is complete. For example, in a population of 21, an interval of 3 with a starting point of 3 would result in the numbers 3, 6, 9, 12, 15, 18, and 21 being selected.

It’s also important to ensure that your list is not ordered in any way that could introduce bias, e.g. if your list was ordered male, female, male, female, and you picked every  other  person, you would  introduce a great degree of bias .

Benefits of systematic sampling

There are several benefits to using a systematic sample for research.

It’s simple to use

Because of the way systematic sampling is structured, surveys based on it are easy to create and the data is easy to analyse.

This type of sampling is particularly beneficial where the budget is limited as the sample selection process is straightforward.

It provides some control over the process

While systematic sampling retains an element of randomness, it also introduces some structure and control into the selection process.

It’s low risk

Systematic sampling is low risk because there is little chance of the data being contaminated: members are drawn evenly from across the whole list.

It’s resistant to bias

After the initial starting point, researchers have little control over who gets selected for systematic sampling. Because selection then follows a fixed interval, the method creates a buffer against favouritism when it comes to data collection.

Limitations of systematic sampling

Although there are significant advantages to systematic sampling, it can carry some risks that you need to be aware of.

An assumption that the population size is known

Like all  probability sampling  techniques, systematic sampling requires that every sample member has a known, non-zero likelihood of being selected.

If the true population size is not known and cannot be determined, you can’t use the formula k = N/n to determine the sampling interval or the likelihood of being selected.

Another challenge is that it can be difficult to  achieve the desired sample size  if the true population size is not known in advance. If you need a sample of 200 people, you may not know whether to sample every 5th person or every 10th person.

Tip: you can still use systematic sampling if you don’t know the population size

There’s a way to use systematic sampling when the population size is not known in advance — if you are prepared to guess the interval. Imagine you want to conduct an exit poll after an important election. Although you do not know in advance how many people are going to show up at the election site, you can choose to sample every 5th person who leaves the voting location between when it opens and closes.

The true population size will then be approximately 5 times the size of your sample.
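A sketch of this “guessed interval” approach, treating the population as a stream of unknown length (the generator and the voters_leaving() source are hypothetical):

```python
import random

def sample_stream(stream, k):
    """Select every k-th arrival from a stream of unknown length,
    starting at a random offset within the first interval."""
    offset = random.randrange(k)  # random position in 0..k-1
    for i, person in enumerate(stream):
        if i % k == offset:
            yield person

# Exit-poll example: interview every 5th voter leaving the polling station.
# polled = list(sample_stream(voters_leaving(), k=5))
```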

You need randomness in the population

If the population you’re assessing has a standard pattern (e.g., the list is ordered male, female, male, female), it can remove the randomness of the sampling interval and inadvertently lead to an unrepresentative sample such as all males. This isn’t common, but it’s worth being aware of before you plan your research project.

Let’s say you were surveying a high school class and wanted to select half of the  students for an assessment . If the student list was in a boy, girl, boy, girl pattern and your sampling interval was an even number, you could end up with all boys or all girls in your sample.

To avoid this, make sure your data is ordered in a way that won’t introduce any kind of repeating pattern — for example, you could ask the children to line up in order of height. That removes any repeating pattern (e.g., boy, girl, boy, girl) that might interfere with the systematic sample. If you had an electronic list of students in the class, you could sort the list by a key unrelated to the pattern, such as alphabetically by student’s first name.
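If the list is electronic, one simple safeguard is to reorder it before applying the interval; a shuffle destroys any periodic pattern entirely. A sketch, with made-up student labels:

```python
import random

# An alternating boy/girl list: picking every 2nd member would be biased.
students = [f"{'boy' if i % 2 == 0 else 'girl'}_{i}" for i in range(30)]

random.shuffle(students)                    # break the repeating pattern first
sample = students[random.randrange(2)::2]   # every 2nd student is now safe to take
```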

Higher potential for data manipulation

While systematic sampling is reliable and offers a natural degree of unpredictability and randomness, it’s also open to manipulation by individual researchers, albeit in a limited way.

Compared with simple random selection, the researcher’s choice of method and starting point gives slightly more scope for steering the sample toward a predetermined outcome.

How to perform systematic random sampling


Let’s run through the steps for systematic random sampling:

1. Confirm the population total

It’s easier to achieve the desired sample size when you know the total population size in advance. Once you have this number, the rest of the sampling process is simple.

2. Determine your sample size

When you know the size of the sample you want to draw from the population, you can establish the interval needed to achieve the desired sample size from the population.

There are several factors to consider, including population size, the margin of error (confidence interval), and budget.

For example, you may want a sample size that yields a  margin of error of +/- 5 points, or maybe it’s determined by your budget, e.g. you’re giving out $10 gift cards but can only afford 200.

Find out more about calculating your sample size .

3. Determine your sampling interval

For this, you simply divide the total population by the sample size. If the total population size is not known in advance (for example, if you’re conducting an exit poll during an election), then estimate the population size based on historical data (e.g. the previous election) to determine your sampling interval.

4. Select your random starting point

Starting your selection at a random number helps to retain the randomness of your selection and removes the risk of clustering or manipulation. The random start number will be between 1 and the sampling interval.

E.g. to get a sample of 100 out of 1,000 you would select every 10th person, starting from a random position between 1 and 10.

5. Add your sampling interval until you have the desired sample

Continue choosing your sample members at regular intervals until you have the sample size you need to complete your study.

Systematic random sampling use cases and examples

  • Systematic random samples can be used in social research — for example, to survey a large number of households by running a list of addresses through the interval selection process.
  • It could be used for collecting customer experience feedback, for example by choosing at intervals from a list of people who attended a large conference.
  • On a ‘passing traffic’ basis, systematic sampling could be used to survey visitors to a store or shopping mall, drivers using a toll booth, people waiting in line, or calls in a contact center.

How to use systematic sampling

Let’s say a sample size of 20 out of a population of 100 is required. For this study, each member of the population is assigned a number from 001 to 100.

We first calculate the interval by dividing the total number of people in the population (100) by the number we want in the sample (20). This gives us a sample interval of 5.

We then select a number between 1 and the sampling interval from a random number generator. Let’s say we get 4 — this is where we start. We count down the list (using linear systematic sampling or circular systematic sampling) from our starting point (person 4) and select each 5th person.

For example, if we start at 4, the next would be 9, 14, 19, 24, 29, and so on. When you reach the end of the list, you should have your desired number (20) for the sample size.
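The same walkthrough takes only a few lines of Python (numbers taken directly from the example above):

```python
k = 100 // 20                          # interval: population 100 / sample 20 = 5
start = 4                              # drawn at random from 1..k
selected = list(range(start, 101, k))  # persons 4, 9, 14, ..., 99
print(len(selected))                   # 20: the desired sample size
```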

What are the alternatives to systematic sampling?


Simple random sampling

A  simple random sample  is one where every member of the population of interest has an equal probability of being selected, and the selection is done in a completely arbitrary way. You can select randomly from a population using a tool such as a number table or a computer, or by using a draw or lottery system.
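In code, the contrast with systematic sampling is that no interval is involved; a one-line sketch using Python’s standard library:

```python
import random

population = [f"member_{i}" for i in range(1, 1001)]
simple = random.sample(population, 200)  # every member equally likely, no fixed interval
```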

Stratified random sampling

Stratified random sampling involves dividing your population into groups (called strata) that are linked by a particular characteristic, such as income bracket or nationality. A random sample is taken from each group, and then these are pooled together to create a random sample of the entire population. The purpose is to make sure each group is represented in your sample.
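A minimal sketch of stratified random sampling, assuming each member carries a known characteristic to group on (the field name and helper below are made up for illustration):

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum):
    """Split the population into strata, draw a simple random sample
    from each stratum, then pool the draws into one sample."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    pooled = []
    for members in strata.values():
        pooled.extend(random.sample(members, min(n_per_stratum, len(members))))
    return pooled

# e.g. people = [{"name": "A", "income": "low"}, ...]
# sample = stratified_sample(people, lambda p: p["income"], n_per_stratum=50)
```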


Should you use systematic sampling or simple random sampling?

What’s the difference between a systematic sample and a simple random sample?

In each sampling method, every person has an equal chance of being chosen. It’s also true that both methods could create some bias, e.g. a larger percentage of females in the sample than in the population.

The primary reasons to use a systematic sample in place of a simple random one are:

(1) when the population size is unknown in advance (e.g. an exit poll during an election)

(2) if you don’t have an electronic list or you need to sample on the go. For example, you can use systematic sampling to select 1 out of every 5 people that enter a waiting room at a hospital, whereas with a random sample you would need the complete list of patients for the day, which could present problems if things change or last-minute emergencies turn up.

Want to learn more about sampling?

Choosing the right method for sampling your target audience can make all the difference when it comes to the reliability of your data and analysis.

In our guide to  sampling methods  and best practices, we explore other  probability  and  non-probability sampling methods , including  stratified sampling  and cluster sampling,  convenience sampling , quota sampling, and more — to give you the information you need to conduct comprehensive research.

If you want to learn more about sampling methods, you can read our  ultimate guide to sampling methods and best practices .

How Qualtrics can help

We specialise in providing solutions that empower everyone to gather research insights and take action. With Qualtrics CoreXM, you can make sophisticated research simple.

Whether you’re in marketing, product development, or HR — with CoreXM, you can rapidly turn insights from surveys into actionable strategies.

Featuring advanced and incredibly flexible survey tools, built-in intelligence (driven by iQ™), a fully automated platform, and  research services , CoreXM enables you to run the surveys you need to get answers to your most important  marketing ,  branding ,  customer , and  product  questions.


Open access | Published: 11 June 2024

Perception of enhanced learning in medicine through the integration of virtual patients: an exploratory study on knowledge acquisition and transfer

  • Zhien Li 1 ,
  • Maryam Asoodar 1 ,
  • Nynke de Jong 2 ,
  • Tom Keulers 3 ,
  • Xian Liu 1 &
  • Diana Dolmans 1  

BMC Medical Education, volume 24, Article number: 647 (2024)


Abstract

Introduction

Virtual Patients (VPs) have been shown to improve various aspects of medical learning; however, research has scarcely delved into the specific factors that facilitate knowledge gain and the transfer of knowledge from the classroom to real-world applications. This exploratory study aims to understand the impact of integrating VPs into classroom learning on students’ perceptions of knowledge acquisition and transfer.

Methods

The study was integrated into an elective course on “Personalized Medicine in Cancer Treatment and Care,” employing a qualitative and quantitative approach. Twenty-two second-year medical undergraduates engaged in a VP session, which included role modeling, practice with various authentic cases, group discussion on feedback, and a plenary session. Student perceptions of their learning were measured through surveys and focus group interviews and analyzed using descriptive statistics and thematic analysis.

Results

Quantitative data show that students highly valued the role modeling introduction, scoring it 4.42 out of 5, and acknowledged the value of practicing with VPs in enhancing their subject-matter understanding, with an average score of 4.0 out of 5. However, students’ reflections on the peer dialogue around feedback received mixed reviews, averaging a score of 3.24 out of 5. Qualitative analysis of the focus-group interviews unearthed four themes: ‘Which steps to take in clinical reasoning’, ‘Challenging their reasoning to enhance deeper understanding’, ‘Transfer of knowledge’, and ‘Enhance reasoning through reflections’. The quantitative and qualitative data cohered.

Conclusion

The study provides evidence that integrating VPs with classroom learning activities improves learning. This integration enhances students’ perceptions of knowledge acquisition and transfer, thereby potentially elevating students’ preparedness for real-world clinical settings. Key facets such as expert role modeling and exposure to various authentic cases were valued for fostering deeper understanding and active engagement, though responses to peer feedback discussions were mixed. While the preliminary findings are encouraging, further research is needed to refine feedback mechanisms and to explore a broader spectrum of medical disciplines with larger sample sizes. This exploration lays the groundwork for future endeavors aimed at optimizing VP-based learning experiences in medical education.


Introduction

In medical education, a persistent challenge lies in bridging the gap between acquiring theoretical knowledge and applying it in real-world clinical scenarios. Many medical students struggle to translate their classroom learning into authentic patient interactions. This gap is particularly concerning because it affects the quality of patient care: medical students must not only acquire knowledge but also be able to apply it in complex healthcare settings.

One approach to address this challenge is the use of Virtual Patients (VPs), a computer-based simulation of real-life clinical scenarios for students to train clinical skills [ 1 ]. Research has shown that using VPs in the classroom can effectively improve various aspects of learning, from core knowledge and clinical reasoning to decision-making skills and knowledge transfer [ 2 , 3 , 4 , 5 ]. The VPs provide students with the opportunity to practice skills in a safe and controlled simulation environment.

Recent studies have focused on optimizing the design and arrangement of VPs as part of learning activities to facilitate both knowledge acquisition and retention [ 6 , 7 , 8 ]. For instance, Verkuyl, Hughes [ 8 ] demonstrated that using VPs as gamification tools can improve students’ confidence, engagement, and satisfaction.

However, studies focusing on the specific factors that contribute to these improvements when integrating VPs into the classroom are limited, particularly in understanding how to use VPs in the classroom to facilitate the transfer of the knowledge students gain in class to the subsequent stages of their education and eventual practice.

Acquisition and transfer of knowledge are critical factors in medical education, as medical students must be able to apply their knowledge and skills to real-world clinical scenarios [ 9 ]. Research suggests that for the effective transfer of knowledge, students should be immersed in authentic environments, enabling the transition of learned competencies to advanced stages [ 10 , 11 , 12 , 13 ].

Despite the consensus on the efficacy of VPs as a tool, there is a gap in understanding how to integrate VPs in the classroom to optimize students’ learning, especially in facilitating learning transfer. The effectiveness of VPs is not just in their use but also in how they are used by students to enhance their understanding on how to reason and make decisions about medical treatments when dealing with clinical cases. Without a clear and deep understanding, we risk underutilizing their potential and losing opportunities for medical students to become well prepared for real-world clinical scenarios.

Certain elements, such as role modeling instruction [ 14 , 15 , 16 ], using various authentic cases [ 17 , 18 , 19 ], and engaging in peer discussions on feedback [ 20 , 21 , 22 ], emerge as potential key components that could be integrated to maximize the knowledge acquisition via VPs. For instance, Stalmeijer, Dolmans [ 23 ] show how an expert, serving as a role model, provides guidance that facilitates student learning by demonstrating clinical skills and reasoning out loud. While there is ample evidence supporting the advantages of inclusion of VPs in education, there is not enough research focusing on the detailed aspects of effective instructional design techniques. This paper delves into these components, seeking to understand how the VP integration influences students’ learning and knowledge transfer. Figure  1 shows the theoretical framework of how integrating VPs in class affects students’ learning and might impact the transfer of learning in a simulated VP environment to practice.

Figure 1. Relationship of implementing, impact factor, and transfer of training

This exploratory study aims to investigate how instructional design elements such as role modeling, various authentic cases, and peer dialogues on feedback within VP sessions affect students’ learning, as perceived by the learners. The core research question focuses on how the implementation of role modeling, various authentic cases, and peer dialogue on feedback in VPs influences learners’ perception of knowledge gain and transfer in personalized medicine.

Methods

The study was conducted at Maastricht University in the elective course “Personalized Medicine in Cancer Treatment and Care”. This course is open to second-year undergraduate medical students of Maastricht University.

Participants

Initially, 24 students enrolled in this course for the academic year 2022–2023, and 22 students participated in the Virtual Patient session. In total, 19 students voluntarily completed the survey designed to evaluate their experiences and perceptions of the Virtual Patient session. Thereafter, 9 of the 19 survey respondents voluntarily agreed to participate in three focus group interviews, with two to five students in each focus group. Students were informed that participation in this research study had no impact on their academic performance or their continuation in their studies.

Intervention

The instructional approach for the VP cases was structured in a specific format for the students. Figure 2 shows the instructional design for VP integration and gives an overview of the intervention steps. The first stage was a role-modeling phase, where an expert demonstrated the clinical reasoning process using VP Case A. This was followed by a practice session where students worked in pairs on two different VP cases (Case B and C). After that, students formed two larger groups of five or six pairs each and discussed the system feedback provided by the VP platform. Finally, the expert summarized the session and addressed students’ questions. The whole intervention lasted 120 min.

Figure 2. The flow of the integrated virtual patient session

1. Role modeling (30 min): The intervention started with an expert, a clinician with teaching experience, demonstrating a clinical case (Case A) and showing the clinical reasoning process by thinking aloud. The expert served as a role model in showcasing the approach toward clinical problem-solving, provided supportive information, and demonstrated how to proceed through the case. The aim of the role modeling session was to empower students to apply the insights and methodology gained from the expert in Case A to solve the subsequent cases (Case B and Case C). Although these cases shared similarities in underlying principles, they diverged in patient characteristics, such as age, complications, and smoking history, that can influence patient treatment outcomes.

2 and 3. Two VP pair tasks (20 min each): In this segment, the 22 participating students were paired, resulting in 11 pairs. These pairs were then divided into two groups. Group 1 (6 pairs) and group 2 (5 pairs) alternated in going through Case B and Case C to account for the practice effect. These cases were variations of the clinical cases introduced during the role-modeling demonstration, differing in patient characteristics such as age, complications, and smoking history to challenge the students’ reasoning. Students were encouraged to work collaboratively.

4. Feedback discussion (30 min): Upon completion of the VP cases, automated feedback on the reasoning analysis was provided immediately. Participants were instructed to save this feedback for later discussion. Students were then organized into groups of six, based on the sequence in which they engaged with the cases: those who first practiced with Case B and then proceeded to Case C formed Group 1, while students who started with Case C and then moved on to Case B were assembled into Group 2. To foster meaningful dialogue, students engaged in discussions focused on the feedback generated by the Virtual Patient system, guided by a printed discussion guide distributed to each group (see Appendix 2). The discussion aimed to deepen students’ understanding and enrich their conversations about the cases they had just completed.

5. Plenary (15 min): Hosted by the expert to summarize the session and address questions or doubts raised by students.

During the practice and discussion sessions, the expert circulated among the groups to offer additional guidance and support.

The virtual patient cases

Three Virtual Patient (VP) cases (Case A, B, and C) were created to enhance students’ comprehension of specific concepts, knowledge, and skills in clinical reasoning. The VP practice was developed on the P-Scribe ( www.pscribe.nl ) learning platform, a web-based e-learning system based in the Netherlands. The platform facilitates the design and implementation of text-based VP sessions (Appendix 4 ).

While these cases shared a foundation on authentic head and neck cancer treatment, they were characterized by varying patient characteristics in terms of age, gender, and medical history (anamnesis).

Figure 3. VP case flow chart

Within each VP case, students were presented with a scenario related to head and neck cancer. Figure 3 shows the flow chart of a VP case. Each case started with an overview of the patient and their medical history, which students had to use to make an initial assessment. After this, students encountered a mix of multiple-choice and open-ended practice questions. These questions guided students in planning diagnostics, formulating a diagnosis, and devising a treatment plan tailored to the patient’s specific needs. Immediate feedback was provided after students submitted each response, and comprehensive summative feedback was given at the conclusion of each case to foster understanding and learning from any potential misjudgments or oversights (see Appendix 4).

Measurement instruments

Learning-perception survey: The survey (Appendix 1) consisted of 20 items, structured into five primary sections: general experience, intended learning outcome, role modeling, practicing with various authentic cases, and reflection on peer dialogue around feedback. The first item asked about students’ general experience of the whole session. The second item focused on their perception of the intended learning outcomes. Six items then focused on students’ perceptions of learning through role modeling, followed by five items addressing perceptions of their learning from practicing with authentic cases. The final seven items explored students’ perceptions of learning from dialogue around feedback. Participants indicated their level of agreement with each statement using a 5-point Likert scale: 1 denoting “Strongly Disagree”, 2 for “Disagree”, 3 for “Neutral”, 4 for “Agree”, and 5 for “Strongly Agree”. For interpretation, average scores below 3 were considered “in need of improvement”, those of 4 or higher “good”, and those between 3 and 4 “neutral”.

Focus group interviews: Three focus group interviews (Appendix 3) were conducted to dive deeper into students’ perceptions of their learning experience, knowledge gain, and knowledge transfer in real-world settings. The focus groups took place after the survey, and the survey data did not affect the development of the focus group questions. Two students participated in focus group 1, two in focus group 2, and five in focus group 3. The interviews were structured around a series of questions that explored students’ perceptions of their learning across specifically designed sections. These sections included Role Modeling, Practice with Various Authentic Cases, and Dialogue around Feedback. The structure aimed to understand students’ perspectives on each key component of the learning sections.

The analysis of the survey data was conducted by calculating the mean, standard deviation, and the Alpha Coefficient for the responses pertaining to each of the five key dimensions of the survey. The mean score provided an indicator of the average student perception, while the standard deviation offered insights into the variability of the responses. The Alpha Coefficient, a measure of internal consistency, was computed to assess the reliability of the survey dimensions. Through these statistical measures, an overall understanding of the students’ perceptions regarding the various aspects of the Virtual Patients was attained, facilitating a robust analysis aligned with the research objectives.
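For readers unfamiliar with the alpha coefficient (Cronbach’s alpha), it can be computed from the item-level responses as follows; this is an illustrative sketch, not the authors’ actual analysis script:

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = respondents, columns = Likert items.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]                              # number of items in the dimension
    item_variances = X.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = X.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```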

The focus-group interview data were analyzed following the thematic analysis procedure set out by Braun and Clarke [ 24 ]: (1) familiarize yourself with your data, (2) generate initial codes, (3) search for themes, (4) review themes, (5) define and name themes, and (6) produce the report. The interview was guided by pre-existing frameworks or theories in medical education. This ensured the capture of major aspects of the VP learning experience as underscored in the existing literature: role modeling, using various authentic cases, and peer dialogue around feedback [ 16 , 17 , 18 , 20 , 21 ]. The focus group interview was recorded, transcribed, and coded by three team members and ordered in initial themes (Z.L, M.A, and X.L). These themes were discussed with the larger team. We used a process of inductive and deductive analysis and used the three design principles of role modeling, practice with various authentic cases, and group discussion on feedback as sensitizing concepts to study the data [ 24 ]. Thereafter, quantitative and qualitative analyses were collectively appraised, compared, and checked for inconsistencies. In this triangulation, the themes identified in focus-group interviews were explanatory to the descriptive statistics of the survey.

Trustworthiness

Several measures were taken to enhance the study’s trustworthiness. First, triangulation was achieved by employing multiple data collection methods, including surveys and focus group interviews. The interview data collection continued until saturation was reached, ensuring a comprehensive understanding of the student’s experiences and perceptions. Secondly, the coding process followed an iterative approach. Team members initially coded transcripts independently, and then met to reach a consensus before moving on to code subsequent transcripts. Three researchers conducted the coding independently to minimize bias and enhance the validity of the findings. Finally, a member check among a sample of the focus group interviewees was conducted. In response to the question asking whether they agreed with summaries of preliminary results and would provide comments, confirmatory responses were received as well as some minor additional comments and clarifications. The latter were taken into account in the analysis and interpretation of the data.

Ethical approval

The Maastricht University Ethical Committee reviewed and approved this study. The approval number is FHML-REC/2023/021.

Results

The findings from both the survey data and focus group interviews were presented to explore students’ perceptions of the effectiveness of the Virtual Patient (VP) Session in enhancing their clinical reasoning skills.

Survey data

The survey explored students’ perceptions across five key dimensions: General Experience, Intended Learning Outcome, Role Modeling, Practicing with Various Authentic Cases, and students’ reflection on Peer Dialogue around Feedback. The students scored the VP sessions on 20 items (Table 1). The scores ranged from M = 2.95 to M = 4.58 on a scale of 1–5.

For the General Experience of Virtual Patient Session (Items Q1-Q2) the average score was M = 4.13 (SD = 0.70). Specifically, the overall experience was positively rated at M = 4.11. The component that assessed the improvement of clinical reasoning skills received an average score of M = 4.16.

Regarding the Students’ Perception of Learning from Role Modeling (Items Q3-Q8), the average score was M = 4.38 (SD = 0.61). Students agreed that the expert demonstration at the start of the session helped them understand the intended learning outcomes and was useful in guiding them through the Virtual Patient cases, with scores ranging from M = 4.26 to M = 4.58.

Students’ perception of learning from practicing with various authentic cases (Items Q9-Q13), received an average score of M = 4.00 (SD = 0.86). The scores measured the students’ perception of how well the provided Virtual Patient cases matched their current level of understanding, enhanced their comprehension of the subject matter, and helped them grasp the complexities inherent in real-world clinical scenarios.

For their perception of learning from Peer Dialogue around Feedback (Questions 14–20), the average score was M = 3.24 (SD = 1.05). These scores measure the students’ perception of the effectiveness of peer dialogue in enhancing understanding, generating strategies to address feedback, and prioritizing areas of improvement.

Focus group interview data

The interviews revealed four themes: ‘Which steps to take in clinical reasoning’, ‘Challenging their reasoning to enhance deeper understanding’, ‘Transfer of knowledge’, and ‘Enhance reasoning through reflections’.

Which steps to take in clinical reasoning

Students acknowledged the expert’s initial demonstration helped them to develop structured knowledge and gain understanding of the clinical reasoning process.

I think it (Role modeling) helps to find a pattern in clinical reasoning as well. At first, it (the expert) explained to us. For example, are there possible lymph nodes? Yes or no. Then you need to do this and this…Then you can make kind of…pattern that differs for the diagnosis and the prognosis. So you can make kind of a diagram in your head. Which you can use later on. And your knowledge becomes more structured. (Focus Group 2, Student B)

Students also perceived that the integrated practice with Virtual Patients helped them to anticipate the subsequent steps in clinical reasoning. They indicated that the patterns learned through practicing with Virtual Patients helped them understand the procedures they needed to follow to evaluate the patient.

I think now I know the steps which they (the procedural) followed to evaluate the patient, so first we can do this and then that. First, you determine the TNM (Tumour, Node, Metastasis) staging and do the endoscopy, then the TNM staging, and then you make the treatment plan. Now it’s more clear how they do those steps. (Focus Group 1, Student A)

Moreover, students thought the pair work and dialogue helped them think and clarify with each other what steps they needed to do in clinical reasoning when they had different opinions.

Yeah, that (pair working) was really nice because you can discuss, like I think do this and the other one says, you know, I think do that step, and then you’re already discussing the answers which is really nice to have. (The discussion) really make you think about the steps. (Focus Group 1, Student B)

Challenging their reasoning to enhance deeper understanding

Students reported how the course design differed from other blocks. According to the students, the VP practice was particularly beneficial in helping them integrate knowledge, and make the knowledge their own.

It (the VP practice) helps you to integrate knowledge because other blocks are really only lectures, they are all listening and listening. So the virtual patient was really nice to make this stuff our own. (Focus Group 2, Student A)

Students indicated the examples given by the expert helped them get a better understanding of the more detailed TNM (Tumor, Node, Metastasis) tables that are used in clinical reasoning.

Yeah, she (the expert) gave examples and guided the reading of the tables for TNM (Tumor, Node, Metastasis) staging, and those were also in the Virtual Patient cases, but because she already used them once and explained how we have to use them, it became more clear to us, what these tables are for and how they are used (Focus Group 1, Student B) .

The students noted that in VP practice sessions, compared with passive learning in traditional lectures, they were challenged to engage directly with the material by making clinical decisions, such as selecting appropriate tests to reach a diagnosis.

In lectures, we passively learn the trajectory from symptoms to diagnosis. During Virtual Patient practice, we actively process it. So you have to make decisions and select the test etc. (Focus Group 2, Student B)

Students indicated that practicing with the VP cases challenged them to look up information and reason by themselves. They gave an example of the imaging practice in which they were tasked with examining specific body parts in medical images on their own; they felt challenged to reason about what they saw instead of getting the information directly.

Yeah, also the (medical) imaging in the assignments where you need to look at a specific part of the body, normally you just see a picture and someone says, yeah, this is the stomach or this is the heart, whatever, and now you need to look it up yourself and think about it yourself, what you see, so that really helps. (Focus Group 1, Student B)

Furthermore, they emphasized the questions asked by experts challenged them to think, put the knowledge in their own words and apply the knowledge with their own reasoning.

The questions she (the expert) asked really make you think about the things she’s learning(teaching). So if she asks questions, you’re really thinking, and yeah, you’re challenged to put it in your own words. (Focus Group 1, Student B) For instance, she (the expert) asked questions that not from official guidelines, instead, it came from where widely doctor worked and her personal experiences. I applied what she said with my own reasoning behind it. (Focus Group 2, Student B)

Transfer of knowledge

Students perceived that practicing with VP cases in different situations offered them hands-on experience, where they actively engaged with various situations, which prepared them for future patient interactions.

Having cases that are closer to the real world, like the comorbidity we discussed, would make it more realistic. (For instance, ) What if he also has obesity or diabetes? Those are the patients that we are going to see in the future. So it helps out a lot to have those different conditions as well. (Focus Group 2, Student B)

Students also indicated their preference for the structured approach of the VP session, in which an initial demonstration by an expert sharing their clinical experience, followed by hands-on practice with VP cases, was perceived to enhance transfer to practice. This method, as described by the students, bridged the gap between theoretical knowledge and practical application. They thought this structure made the knowledge clear and helped them to transfer their knowledge from theory to practice.

You (the Virtual Patient session that integrated with role modeling, authentic VP practice, and peer discussion around feedback) made it (the clinical reasoning) clear for me because of the first case we discussed with the teacher. Well, he discussed it and showed us how to think, and how to get things from certain perspectives with risk factors, age, et cetera. And then we do it ourselves. We had to find out what was wrong and go on. So I quite liked it. It gave me a deeper understanding. (Focus Group 3, Student A)

Students indicated the sense of practical immersion was amplified by the “side information that you don’t really need” (Focus Group 3, Student E) in the cases. They highlighted that the side information represented the interaction with real patients and made them think of clinical situations in real-world settings.

(Side) information would be more realistic, also side information that you don’t really need because a patient also tells you a lot of things, and some of those things aren’t as important, but you still need to decide if they are important or not. What do you see, why do you see it, what’s different than normal. (Focus Group 3, Student E)

Moreover, several students indicated that the hypothetical “what-if” discussions during the role modeling session helped them with reasoning, prompting them to consider complications that might arise in real-life medical situations.

So for example, about age, it’s more difficult to do a treatment above 70. (What if that patient) has things like smoking history and that kind of stuff. I think it’s really valuable because you have already had an example about it (Demonstrating Case A). (Focus Group 1, Student A)

Students indicated that the diagnosis practice in the VP led them to realize the difference from real-world scenarios. They said that while it might seem easy to choose multiple diagnostic options in the simulated environment, in the real world medical professionals must make more selective decisions due to limitations. They thought this experience taught them to think about prioritizing and decision-making in a realistic medical setting.

Yeah, maybe also there (in VP cases) were also a question about which imaging techniques you would use and then it was Echo or CT, MRI, there was also an option where you could listen to the lungs and some of the people also checked that one, but it isn’t really necessary, so you think it only takes one minute, so why not, but in the real world there isn’t always time to do everything, so it’s also good to think what is really necessary and what’s not. (Focus Group 1, Student A)

Enhance reasoning through reflections

During the VP session, students received feedback and conducted conversations around the feedback provided by the Virtual Patient system. Students thought the peer dialogues around feedback provided opportunities for collective reflection and insights, allowing them to pinpoint areas of improvement.

I thought that (the peer dialogue) was really useful, because sometimes one person, for example, when the teacher explains everything, you don’t pick up everything he says. She (your peer) might pick up a different thing, and I pick up a different thing, and we can ask each other, do you know how this works? So I thought that was really useful. (Focus Group 3, Student B)

The students emphasized the importance of expressing and discussing different opinions. They noted that such interactions could provide new insights and perspectives that they would not have considered independently, thereby enriching their understanding.

When you do have different opinions, I think they (your peers) can give you insight that you maybe didn’t have for yourself. So you can add to each other’s knowledge. If somebody has another view, then we can discuss it. It (the discussion) brightens my tunnel view. Also having to say it (the knowledge) out loud and explaining your thoughts to someone else can also help, I think. (Focus Group 2, Student A)

When talking about the peer dialogues around feedback during the VP session, some students highlighted the benefits of immediate feedback, which provided them with clarity and instant validation. However, others saw value in delayed feedback, as it fostered discussion and multiple interpretations.

I liked that the Virtual Patient program, that it gave you immediate feedback. That was really handy. And I also liked the discussion afterward so we could speak about it a bit more (Focus Group 3, Student B) . There was immediate feedback on most questions, so you knew if you had been correct or wrong. But for the learning process it might be handy to have that after the group discussion, because now we all have the same answer. (Focus Group 2, Student B)

Discussion

The study examined students’ perceptions of learning and knowledge transfer when VP cases were integrated with role modeling introductions, practice with various authentic cases, and peer dialogue around feedback, specifically in the context of personalized medicine in cancer treatment and care. The survey reflected a positive learning experience: students reported that, through this specific course design with integrated VP cases, they gained a better understanding of the clinical reasoning process as well as of which steps to take when dealing with a clinical case. Qualitative data showed that the integration of VPs into the educational setting clearly shifted the students from being passive observers in a traditional lecture-based format to active participants in a simulated clinical environment. This shift is in line with previous research findings, which suggest that the use of VPs in clinical training actively engages learners and encourages the application of their knowledge [4].

The quantitative data revealed that students highly valued the role modeling session, as indicated by the high average scores. Qualitative data explained that the role modeling session enabled students not only to observe the clinical process being demonstrated but also to engage in active thinking by interacting with the expert. As discussed by Cruess, Cruess [15], role modeling not only consciously imparts knowledge but also unconsciously influences students’ attitudes and behaviors, making the learning experience more relatable to the clinical environment. In this study, by sharing clinical reasoning and personal anecdotes during the class, the expert made the learning experience more relatable to the clinical environment that students will face in the future. This mirrors the role modeling research by Morgenroth, Ryan [25], which emphasizes the importance of role models in shaping the self-concept and motivation of individuals. Moreover, the qualitative data showed that the demonstration by the expert served as fundamental prior knowledge, covering the knowledge gap and preparing students for the subsequent practice. This finding aligns with van Merrienboer’s scaffolding concept, which emphasizes the importance of initial expert guidance in learning processes [16].

Following the role modeling demonstration, students practiced two VP cases in pairs and perceived that the practice enhanced their clinical reasoning skills and helped them understand the real-world clinical setting. The results showed that the variety and real-life complexity of cases in the VP sessions were perceived to be essential for students’ knowledge gain and transfer. The positive perception of various authentic cases aligns with previous research highlighting the importance of exposure to diverse and authentic scenarios in medical training [17, 18]. Moreover, the hypothetical “what-if” scenarios further enhanced students’ analytical abilities, preparing them for the multifaceted challenges they would encounter in real-world medical situations. Survey responses (Q10, mean = 4.37; Q13, mean = 4.05 in Table 1) indicated a consensus among students that this practice improved their understanding and application of knowledge. Our findings corroborate Jonassen and Hernandez-Serrano’s [26] emphasis on the importance of authentic learning environments for effective knowledge transfer.

After the practice, students discussed the feedback provided by the VP system. Despite its mixed quantitative reception, the peer dialogue on feedback was qualitatively found to be a vital component for promoting critical thinking, discussion, and reflection. The feedback from the VPs, both immediate and delayed, along with peer dialogue, emerged as crucial elements in students’ learning process. In this study, students showed different preferences for receiving feedback: some preferred immediate feedback, while others preferred delayed feedback. How feedback was provided notably influenced peer interactions. Given that immediate feedback was dispensed upon submission of answers, the peer dialogues automatically started when students noticed disparities or encountered obstacles. Such dialogues not only served to resolve ambiguities but also fostered collective reflection, enhancing comprehension of the subject. By vocalizing their thoughts and engaging in active discussions, students were able to solidify their understanding and uncover nuances they might otherwise have missed. This aligns with the importance of engaging in peer discussions on feedback as outlined in the theoretical background [20, 21, 22].

When looking at the integration of VP cases with this particular course design, students perceived that the expert demonstration, followed by VP practice and peer dialogue around feedback, fostered a comprehensive understanding and allowed them to integrate diverse clinical knowledge. The “watch-think-do-reflect” structure not only ensured better knowledge retention but also enhanced students’ enthusiasm for the subject. Observing model demonstrations enabled students to assimilate clinical nuances and contemplate real-world applications. Subsequent hands-on practice with VP cases fortified their cognitive structures, honing their clinical reasoning. Ultimately, students perceived that reflective peer discussions on feedback solidified their learning, enhancing knowledge retention.

Limitations

This study employed a survey and focus group interviews that provided a comprehensive understanding of students’ perceptions of learning. However, there are several limitations. The study had a small sample size and was conducted in the context of an elective course, which may limit the generalizability of the findings. Furthermore, the study was exploratory in nature and did not measure actual learning outcomes or long-term retention, which are critical aspects of educational impact.

Implications for future research

Future research should investigate whether integrating Virtual Patients (VPs) into classroom activities enhances student learning outcomes, incorporating learning assessments and involving larger and more diverse participant groups to validate our findings. Additionally, a deeper analysis of students’ reasoning processes and interactions could provide insights into how and why knowledge gain and transfer are fostered or hindered. Furthermore, it is also important to understand the most beneficial moment for integrating VPs into educational settings to enhance transfer from a simulated to a real practice setting. This understanding could inform the development of more effective educational strategies and interventions.

Conclusion

The integration of Virtual Patients into classroom learning appears to offer a promising approach to enriching medical education. Key elements such as role modeling, various authentic cases, and peer dialogue on feedback contribute positively to students’ perception of learning. However, the approach to peer dialogue on feedback may need to be refined for more consistent benefits. Furthermore, studies with larger sample sizes and broader participant groups are essential to provide robust support for the efficacy of this educational approach and its components.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Cook DA, Triola MM. Virtual patients: a critical literature review and proposed next steps. Med Educ. 2009;43(4):303–11.
2. Garrett BM, Callear D. The value of intelligent multimedia simulation for teaching clinical decision-making skills. Nurse Educ Today. 2001;21(5):382–90.
3. Peddle M, Bearman M, Nestel D. Virtual patients and nontechnical skills in undergraduate health professional education: an integrative review. Clin Simul Nurs. 2016;12(9):400–10.
4. Sanders CL, Kleinert HL, Free T, Slusher I, Clevenger K, Johnson S, et al. Caring for children with intellectual and developmental disabilities: virtual patient instruction improves students’ knowledge and comfort level. J Pediatr Nurs. 2007;22(6):457–66.
5. Sijstermans R, Jaspers MWM, Bloemendaal PM, Schoonderwaldt EM. Training inter-physician communication using the dynamic patient Simulator®. Int J Med Informatics. 2007;76(5):336–43.
6. Buttussi F, Chittaro L. Effects of different types of virtual reality display on presence and learning in a safety training scenario. IEEE Trans Vis Comput Graph. 2017;24(2):1063–76.
7. Makransky G, Bonde MT, Wulff JS, Wandall J, Hood M, Creed PA, et al. Simulation based virtual learning environment in medical genetics counseling: an example of bridging the gap between theory and practice in medical education. BMC Med Educ. 2016;16(1):1–9.
8. Verkuyl M, Hughes M, Tsui J, Betts L, St-Amant O, Lapum JL. Virtual gaming simulation in nursing education: a focus group study. J Nurs Educ. 2017;56(5):274–80.
9. Burke LA, Hutchins HM. Training transfer: an integrative literature review. Hum Resour Dev Rev. 2007;6(3):263–96.
10. Durning SJ, Artino AR Jr, Schuwirth L, Van Der Vleuten C. Clarifying assumptions to enhance our understanding and assessment of clinical reasoning. Acad Med. 2013;88(4):442–8.
11. Marei HF, Donkers J, Al-Eraky MM, van Merrienboer JJ. The effectiveness of sequencing virtual patients with lectures in a deductive or inductive learning approach. Med Teach. 2017;39(12):1268–74.
12. Marei HF, Donkers J, Al-Eraky MM, Van Merrienboer JJ. Collaborative use of virtual patients after a lecture enhances learning with minimal investment of cognitive load. Med Teach. 2019;41(3):332–9.
13. Tolsgaard MG, Jepsen RM, Rasmussen MB, Kayser L, Fors U, Laursen LC, et al. The effect of constructing versus solving virtual patient cases on transfer of learning: a randomized trial. Perspect Med Educ. 2016;5:33–8.
14. Burgess A, Oates K, Goulston K. Role modelling in medical education: the importance of teaching skills. Clin Teach. 2016;13(2):134–7.
15. Cruess SR, Cruess RL, Steinert Y. Role modelling—making the most of a powerful teaching strategy. BMJ. 2008;336(7646):718–21.
16. Van Merriënboer JJ, Kirschner PA. Ten steps to complex learning: a systematic approach to four-component instructional design. Routledge; 2017.
17. Berman NB, Durning SJ, Fischer MR, Huwendiek S, Triola MM. The role for virtual patients in the future of medical education. Acad Med. 2016;91(9):1217–22.
18. Lowell VL, Yang M. Authentic learning experiences to improve online instructor’s performance and self-efficacy: the design of an online mentoring program. TechTrends. 2023;67(1):112–23.
19. Sevy-Biloon J, Chroman T. Authentic use of technology to improve EFL communication and motivation through international language exchange video chat. Teach Engl Technol. 2019;19(2):44–58.
20. Dmoshinskaia N, Gijlers H, de Jong T. Giving feedback on peers’ concept maps as a learning experience: does quality of reviewed concept maps matter? Learn Environ Res. 2022;25(3):823–40.
21. Foster A, Chaudhary N, Kim T, Waller JL, Wong J, Borish M, et al. Using virtual patients to teach empathy: a randomized controlled study to enhance medical students’ empathic communication. Simul Healthc. 2016;11(3):181–9.
22. Schillings M, Roebertsen H, Savelberg H, Whittingham J, Dolmans D. Peer-to-peer dialogue about teachers’ written feedback enhances students’ understanding on how to improve writing skills. Educational Stud. 2020;46(6):693–707.
23. Stalmeijer RE, Dolmans DH, Snellen-Balendong HA, van Santen-Hoeufft M, Wolfhagen IH, Scherpbier AJ. Clinical teaching based on principles of cognitive apprenticeship: views of experienced clinical teachers. Acad Med. 2013;88(6):861–5.
24. Braun V, Clarke V. Using thematic analysis in psychology. Qualitative Res Psychol. 2006;3(2):77–101.
25. Morgenroth T, Ryan MK, Peters K. The motivational theory of role modeling: how role models influence role aspirants’ goals. Rev Gen Psychol. 2015;19(4):465–83.
26. Jonassen DH, Hernandez-Serrano J. Case-based reasoning and instructional design: using stories to support problem solving. Education Tech Research Dev. 2002;50(2):65–77.

Download references

Acknowledgements

Thanks to all the participants and education workers who contributed to the study. Thanks also to my family for their support, and to Ang Li for joining our family.

Funding

ZL was supported by a scholarship granted by the China Scholarship Council (CSC, 202208440100).

Author information

Authors and affiliations

Department of Educational Development & Research, School of Health Professions Education, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands

Zhien Li, Maryam Asoodar, Xian Liu & Diana Dolmans

School of Health Professions Education, Department of Health Services Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands

Nynke de Jong

Department of Radiation Oncology (Maastro), GROW School for Oncology and Reproduction, Maastricht University Medical Centre, Maastricht, The Netherlands

Tom Keulers


Contributions

ZL, MA, DD, and NJ conceived of the presented idea. MA and DD verified the analytical methods. TK and ZL contributed to the creation of the learning materials. ZL analyzed the data and drafted the manuscript under the supervision of MA and DD. All authors contributed to the article and approved the submitted version.

Corresponding author

Correspondence to Zhien Li.

Ethics declarations

Ethics approval and consent to participate

The Maastricht University Ethical Committee reviewed and approved this study. The approval number is FHML-REC/2023/021. All participants were informed about the aims, methods, their right to withdraw, and anticipated benefits of the study. Written informed consent was obtained from all participants prior to their inclusion in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Supplementary Material 3

Supplementary Material 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Li, Z., Asoodar, M., de Jong, N. et al. Perception of enhanced learning in medicine through integrating of virtual patients: an exploratory study on knowledge acquisition and transfer. BMC Med Educ 24, 647 (2024). https://doi.org/10.1186/s12909-024-05624-7


Received: 18 March 2024

Accepted: 03 June 2024

Published: 11 June 2024

DOI: https://doi.org/10.1186/s12909-024-05624-7


Keywords

  • Virtual patient
  • Medical education
  • Role modeling
  • Authentic cases

BMC Medical Education

ISSN: 1472-6920


Journal of Applied Crystallography


Open Access

Quantitative selection of sample structures in small-angle scattering using Bayesian methods

a Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8561, Japan, b Japan Synchrotron Radiation Research Institute, Sayo, Hyogo 679-5198, Japan, c National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan, and d Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto 860-8555, Japan. * Correspondence e-mail: [email protected]

Small-angle scattering (SAS) is a key experimental technique for analyzing nanoscale structures in various materials. In SAS data analysis, selecting an appropriate mathematical model for the scattering intensity is critical, as it generates a hypothesis of the structure of the experimental sample. Traditional model selection methods either rely on qualitative approaches or are prone to overfitting. This paper introduces an analytical method that applies Bayesian model selection to SAS measurement data, enabling a quantitative evaluation of the validity of mathematical models. The performance of the method is assessed through numerical experiments using artificial data for multicomponent spherical materials, demonstrating that this proposed analysis approach yields highly accurate and interpretable results. The ability of the method to analyze a range of mixing ratios and particle size ratios for mixed components is also discussed, along with its precision in model evaluation by the degree of fitting. The proposed method effectively facilitates quantitative analysis of nanoscale sample structures in SAS, which has traditionally been challenging, and is expected to contribute significantly to advancements in a wide range of fields.

Keywords: small-angle X-ray scattering; small-angle neutron scattering; nanostructure analysis; model selection; Bayesian inference.

1. Introduction

SAS measurement data are expressed in terms of scattering intensity that corresponds to a scattering vector, a physical quantity representing the scattering angle. Data analysis requires selection and parameter estimation of a mathematical model of the scattering intensity that contains information about the structure of the specimen. This selection process is critical as it involves assumptions about the structure of the specimen.

We conducted numerical experiments to assess the effectiveness of our proposed method. These experiments are based on synthetic data used to estimate the number of distinct components in a specimen, which was modeled as a mixture of monodisperse spheres of varying radii, scattering length densities and volume fractions. The results demonstrate the high accuracy, interpretability and stability of our method, even in the presence of measurement noise. To discuss the utility of the proposed method, we compare our approach with traditional model selection methods based on the reduced χ-squared error.

2. Formulation of the proposed framework

In this section, we present a detailed formulation of our algorithm for selecting mathematical models for SAS specimens using Bayesian model selection. The pseudocode for this algorithm is provided in Algorithm 1.

2.1. Bayesian model selection

The likelihood is thus expressed as

Let φ(K) be the prior distribution of the parameter K that characterizes the model, and φ(Ξ | K) be the prior distribution of the model parameters Ξ. Then, from Bayes' theorem, the posterior distribution of the parameters given the measurement data D can be written as

p(Ξ, K | D) = p(D | Ξ, K) φ(Ξ | K) φ(K) / p(D),

where p(D) is a normalizing constant.

2.2. Calculation of marginal likelihood

Sampling from the joint probability distribution at each inverse temperature gives the marginal likelihood of a model K as a product of ratios,

p(D | K) = ∏_{r=0}^{R−1} ⟨ p(D | Ξ, K)^{β_{r+1} − β_r} ⟩_{β_r},

where 0 = β_0 < β_1 < … < β_R = 1 are the inverse temperatures and ⟨ · ⟩_{β_r} denotes the expectation over the samples drawn at inverse temperature β_r.
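This ratio identity lends itself to a direct numerical estimate. The following minimal sketch, assuming per-replica log-likelihood samples from the REMC run are available as arrays, estimates the log marginal likelihood for each candidate K and converts the results into posterior model probabilities; the function names and data layout are illustrative, not taken from the paper.

import numpy as np

def log_marginal_likelihood(betas, loglik_samples):
    """Estimate log p(D | K) from tempered posterior samples: each factor
    Z(b_{r+1}) / Z(b_r) is the mean of exp((b_{r+1} - b_r) * loglik)
    over the samples drawn at inverse temperature b_r.

    betas: increasing inverse temperatures, betas[0] = 0, betas[-1] = 1.
    loglik_samples: loglik_samples[r] holds the log-likelihood values of
    the samples drawn at inverse temperature betas[r].
    """
    log_z = 0.0
    for r in range(len(betas) - 1):
        dbeta = betas[r + 1] - betas[r]
        scaled = dbeta * np.asarray(loglik_samples[r])
        m = scaled.max()  # log-mean-exp for numerical stability
        log_z += m + np.log(np.mean(np.exp(scaled - m)))
    return log_z

def model_posterior(log_evidences, log_priors):
    """Posterior probabilities p(K | D) from per-model log evidences."""
    log_post = np.asarray(log_evidences) + np.asarray(log_priors)
    log_post -= log_post.max()
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical usage with a uniform prior over K = 1..4:
# log_zs = [log_marginal_likelihood(betas, samples_by_model[K]) for K in (1, 2, 3, 4)]
# print(model_posterior(log_zs, np.log([0.25] * 4)))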

2.3. Estimation of model parameters

3. Formulation of a multicomponent monodisperse spheres model

In this paper, we consider isotropic scattering and focus on the scattering vector's magnitude q, defined as q = (4π / λ) sin θ, where 2θ is the scattering angle and λ is the wavelength of the incident beam.

Monodisperse spheres are spherical particles of uniform radius. The scattering intensity I(q, ξ) of a specimen composed of sufficiently dilute monodisperse spheres of a single type is, for scattering vector magnitude q, given by the standard dilute-sphere expression

I(q, ξ) = S [3 (sin(qR) − qR cos(qR)) / (qR)^3]^2 + B,

where R is the sphere radius, S is the scale and B is the constant background.

To formulate the scattering intensity of a specimen composed of K types of monodisperse sphere, we assume a dilute system and denote the particle size of the kth component in the sample as R_k and the scale as S_k. The scattering intensity of a sample composed of K types of monodisperse sphere is then given by

I(q, Ξ) = Σ_{k=1}^{K} S_k [3 (sin(qR_k) − qR_k cos(qR_k)) / (qR_k)^3]^2 + B.
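This multicomponent intensity is straightforward to evaluate numerically. The sketch below implements the standard dilute-sphere expression; the function name and parameter layout are ours, not the paper's, and the example values are taken from the data-generation table (the r_S = 0.4 setting).

import numpy as np

def sphere_intensity(q, radii, scales, background):
    """Scattering intensity of a dilute mixture of monodisperse spheres:
    I(q) = sum_k S_k [3 (sin(q R_k) - q R_k cos(q R_k)) / (q R_k)^3]^2 + B.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.full_like(q, float(background))
    for radius, scale in zip(radii, scales):
        x = q * radius
        form_factor = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
        intensity += scale * form_factor**2
    return intensity

# Two-component example: radii 2 and 10 nm, scales 250 and 100.
q = np.linspace(0.1, 3.0, 400)  # scattering vector magnitudes (nm^-1)
I = sphere_intensity(q, radii=(2.0, 10.0), scales=(250.0, 100.0), background=0.01)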


Figure: An illustration of a mixture of two types of spherical specimen, showing scenarios with two components (K = 2), including mixtures of spherical particles of different sizes or volume fractions, and aggregates from a single particle type approximated as a large sphere.

4. Numerical experiments

The numerical experiments reported in this section were conducted with a burn-in period of 10^5 steps and a sample size of 10^5 for the REMC. We set the number of replicas for the REMC, the values of the inverse temperature and the step size of the Metropolis method taking into consideration the state exchange rate and the acceptance rate.
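For concreteness, a minimal sketch of the replica-exchange step is given below, using the standard swap acceptance rule for neighbouring inverse temperatures; the paper does not reproduce its implementation here, so the names and structure are illustrative.

import numpy as np

def remc_exchange_sweep(states, energies, betas, rng):
    """One replica-exchange sweep over neighbouring inverse temperatures.

    states: one parameter vector per replica, ordered by betas.
    energies: E_r = -log p(D | state_r, K) for each replica.
    A swap of replicas (r, r + 1) is accepted with probability
    min(1, exp((beta_{r+1} - beta_r) * (E_{r+1} - E_r))).
    """
    for r in range(len(betas) - 1):
        log_alpha = (betas[r + 1] - betas[r]) * (energies[r + 1] - energies[r])
        if rng.random() < np.exp(min(0.0, log_alpha)):
            states[r], states[r + 1] = states[r + 1], states[r]
            energies[r], energies[r + 1] = energies[r + 1], energies[r]
    return states, energies

# Between exchange sweeps, each replica r would be updated by ordinary
# Metropolis moves targeting p(D | params)**betas[r] * prior(params).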

4.1. Generation of synthetic data

(i) Set the number of data points to N = 400 and define the scattering vector magnitudes at N equally spaced points within the interval [0.1, 3] to obtain {q_i}_{i=1}^{N=400} (nm^−1).

In this section, we consider cases with pseudo-measurement times of T = 1 and T = 0.1. Generally, smaller values of T indicate greater effects from measurement noise.
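The surviving text does not state the noise model explicitly; a pseudo-measurement time whose decrease amplifies noise is consistent with Poisson counting statistics, so the sketch below generates synthetic data under that assumed model (the noise model is our assumption, not taken from the paper). It reuses sphere_intensity from the earlier sketch.

import numpy as np

rng = np.random.default_rng(42)

def synthetic_sas_data(q, true_intensity, T):
    """Generate noisy data under an assumed Poisson counting model:
    counts ~ Poisson(T * I_true(q)), observed intensity = counts / T,
    so the relative noise grows as the pseudo-measurement time T shrinks.
    """
    counts = rng.poisson(T * np.asarray(true_intensity))
    return counts / T

q = np.linspace(0.1, 3.0, 400)  # N = 400 points on [0.1, 3] nm^-1
I_true = sphere_intensity(q, radii=(2.0, 10.0), scales=(250.0, 100.0), background=0.01)
data_T1 = synthetic_sas_data(q, I_true, T=1.0)    # mild noise
data_T01 = synthetic_sas_data(q, I_true, T=0.1)   # strong noise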

4.2. Setting the prior distributions

In the Bayesian model selection framework, prior knowledge concerning the parameters Ξ and the model-characterizing parameter K is set as their prior distributions.

In this numerical experiment, the prior distributions for the parameters Ξ were set as Gamma distributions based on the pseudo-measurement time T used during data generation, while the prior for K was a discrete uniform distribution over the interval [1, 4].
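A minimal sketch of this prior setup follows, assuming SciPy's gamma distributions; the shape and scale hyperparameters shown are placeholders, since the paper ties them to the pseudo-measurement time T and the exact values did not survive here.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Discrete uniform prior over the number of components, K in {1, 2, 3, 4}.
k_values = np.arange(1, 5)
prior_k = np.full(4, 0.25)

# Gamma priors for the continuous parameters (hyperparameters are placeholders).
prior_radius = stats.gamma(a=2.0, scale=5.0)       # R_k (nm)
prior_scale = stats.gamma(a=2.0, scale=100.0)      # S_k
prior_background = stats.gamma(a=2.0, scale=0.01)  # B (cm^-1)

# Draw one parameter set from the joint prior for illustration.
K = rng.choice(k_values, p=prior_k)
params = {
    "R": prior_radius.rvs(size=K, random_state=rng),
    "S": prior_scale.rvs(size=K, random_state=rng),
    "B": prior_background.rvs(random_state=rng),
}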


Figure: Plots of the prior distributions for the various parameters.

4.3. Results for two-component monodisperse spheres based on scale ratio

The ratio of the scale parameters S_1 and S_2 for spheres 1 and 2 during data generation, denoted r_S, is defined as r_S = S_2 / S_1.


Table: Parameter values used for data generation with varying r_S.

                            Sphere 1    Sphere 2
Radius (nm)                 2           10
Scale                       250         {250, 100, 20, 0.5, 0.1, 0.05}
Background (cm^-1)          0.01 (shared)
Pseudo-measurement time T   {1, 0.1} (shared)

Figure: Fitting results and residual plots for synthetic data generated at various r_S values. The two panel sets correspond to pseudo-measurement times T = 1 and T = 0.1, with the scale ratio r_S displayed in descending order within each set. Black circles represent the generated data and the black dotted lines indicate the true scattering intensity curves. For models K = 1, K = 2, K = 3 and K = 4, the fitting curves and residual plots are represented by blue dashed–dotted lines, red dashed lines, orange solid lines and green dotted lines, respectively. Fitting curves were plotted using 1000 parameter samples randomly selected from the posterior probability distributions for each model; the width of the distribution of these fitting curves reflects the confidence level at each point.

Figure: Results of Bayesian model selection among models K = 1–4 for varying r_S values. One panel set shows the posterior probability of each model for data generated with a pseudo-measurement time of T = 1, the other for T = 0.1, with the scale ratio displayed in descending order in each set. The height of each bar corresponds to the average over ten data sets generated with different random seeds, with maximum and minimum values shown as error bars. Areas highlighted in red indicate cases where, on average, the highest probability was found for the true model K = 2, while blue backgrounds indicate that a model other than K = 2 was associated with the highest probability on average.


Table: The number of times each model (K = 1–4) was associated with the highest posterior probability, over ten data sets generated with different random seeds at each r_S value (1.0, 0.4, 0.08, 0.002, 0.0004 and 0.0002), for pseudo-measurement times (a) T = 1 and (b) T = 0.1.

4.4. Results for two-component monodisperse spheres based on radius ratio

During synthetic data generation, the ratio of the radii R_1 and R_2 of spheres 1 and 2, denoted r_R, was defined as r_R = R_1 / R_2.

In this setup, we generated seven types of data by varying the value of r_R for pseudo-measurement times of T = 1 and T = 0.1.


Table: Parameter values used for data generation when varying r_R.

                            Sphere 1                            Sphere 2
Radius (nm)                 {9.9, 9.7, 9.5, 5, 0.5, 0.4, 0.3}   10
Scale                       250                                 100
Background (cm^-1)          0.01 (shared)
Pseudo-measurement time T   {1, 0.1} (shared)

Figure: Fitting results and residual plots for synthetic data generated at various r_R values. The two panel sets correspond to pseudo-measurement times T = 1 and T = 0.1, with the radius ratio r_R displayed in descending order within each set. Black circles represent the generated data and the black dotted lines indicate the true scattering intensity curves. For models K = 1, K = 2, K = 3 and K = 4, the fitting curves and residual plots are represented by blue dashed–dotted lines, red dashed lines, orange solid lines and green dotted lines, respectively. Fitting curves were plotted using 1000 parameter samples randomly selected from the posterior probability distributions for each model; the width of the distribution of these fitting curves reflects the confidence level at each point.

Figure: Results of Bayesian model selection among models K = 1–4 for varying r_R values. One panel set shows the posterior probability of each model for data generated with a pseudo-measurement time of T = 1, the other for T = 0.1, with the radius ratio displayed in descending order in each set. The height of each bar corresponds to the average over ten data sets generated with different random seeds, with maximum and minimum values shown as error bars. Areas highlighted in red indicate cases where the true model K = 2 was most highly supported, while blue backgrounds indicate that a model other than K = 2 received the highest support.


Table: The number of times each model (K = 1–4) was most highly supported, over ten data sets generated with different random seeds at each r_R value (0.99, 0.97, 0.95, 0.5, 0.05, 0.04 and 0.03), for pseudo-measurement times (a) T = 1 and (b) T = 0.1.

5. Discussion

5.1. Limitations of the proposed method

5.2. Model selection based on the χ-squared error

In SAS data analysis, selecting an appropriate mathematical model for the analysis is a crucial but challenging process. In this subsection, we compare the conventional model selection method based on the χ-squared error with the results of model selection using our proposed method.
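For reference, the reduced χ-squared used in this conventional comparison can be computed as in the sketch below; the noise standard deviation shown assumes the Poisson model from the earlier sketch, and counting 2K + 1 free parameters for a K-sphere model is our bookkeeping, not the paper's.

import numpy as np

def reduced_chi_squared(observed, fitted, sigma, n_params):
    """Reduced chi-squared, chi^2 / (N - p); values near 1 indicate a fit
    consistent with the assumed noise level, the conventional criterion."""
    residuals = (np.asarray(observed) - np.asarray(fitted)) / np.asarray(sigma)
    dof = np.asarray(observed).size - n_params
    return float(np.sum(residuals**2) / dof)

# Hypothetical comparison across candidate models: a K-sphere model has
# 2K + 1 free parameters (R_k and S_k per component, plus background B),
# with sigma = sqrt(I_fit / T) under the assumed Poisson noise model.
# for K, I_fit in fits.items():
#     print(K, reduced_chi_squared(data, I_fit, np.sqrt(I_fit / T), 2 * K + 1))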


Figure: The fitting results and residual plots for the data shown in Fig. 3, derived using the parameters that minimize the χ-squared error from the posterior probability distributions for models ranging from K = 1 to K = 4. For each of these models, the fitting curves and their corresponding residual plots are represented by blue dashed–dotted lines, red dashed lines, orange solid lines and green dotted lines, respectively. The legend indicates the reduced χ-squared values for each model (K = 1 to K = 4).


Table: Model selection results based on reduced χ-squared values: the number of times each model's reduced χ-squared value was closest to 1, over ten data sets generated with different random seeds for each r_S setting at T = 1. Labels (a) to (f) refer to the settings in Figs. 3–4 and Table 2.

6. Conclusions

In this paper, we have introduced a Bayesian model selection framework for SAS data analysis that quantitatively evaluates model validity through posterior probabilities. We have conducted numerical experiments using synthetic data for a two-component system of monodisperse spheres to assess the performance of the proposed method.

We have identified the analytical limits of the proposed method, under the settings of this study, with respect to the scale and radius ratios of two-component spherical particles, and compared its performance with that of traditional model selection methods based on the reduced χ-squared error.

The numerical experiments and subsequent discussion reveal the range of parameters that can be analyzed using the proposed method. Within that range, our method provides stable and highly accurate model selection, even for data with significant noise or in situations in which qualitative model determination is challenging. In comparison with the traditional method of selecting models based on fitting curves and data residuals, it was found that the proposed method offers greater accuracy and stability.

SAS is used to study specimens with a variety of structures other than spheres, including cylinders, core–shell structures, lamellae and more. The proposed method should be applied to these other sample models to determine whether the analysis can be extended beyond the case examined here to broader experimental settings. Future work could also apply the proposed method to real measurement data, which is expected to yield new insights through this more efficient analysis approach.

Funding information

This work was supported by JST CREST (grant Nos. JPMJCR1761 and JPMJCR1861) from the Japan Science and Technology Agency (JST) and by a JSPS KAKENHI Grant-in-Aid for Scientific Research (A) (grant No. 23H00486).

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.


COMMENTS

  1. Big enough? Sampling in qualitative inquiry

O'Reilly M, Parker N (2013) Unsatisfactory saturation: A critical exploration of the notion of saturated sample sizes in qualitative research. Qualitative Research 13(2): 190-197. Patton MQ (2002) Two decades of developments in qualitative inquiry: A personal, experiential perspective.

  2. Sample sizes for saturation in qualitative research: A systematic

    Sample sizes in qualitative research are guided by data adequacy, so an effective sample size is less about numbers (n's) and more about the ability of data to provide a rich and nuanced account of the phenomenon studied. Ultimately, determining and justifying sample sizes for qualitative research cannot be detached from the study ...

  3. Series: Practical guidance to qualitative research. Part 3: Sampling

    The usually small sample size in qualitative research depends on the information richness of the data, the variety of participants (or other units), the broadness of the research question and the phenomenon, the data collection method (e.g., individual or group interviews) and the type of sampling strategy.

  4. Determining the Sample Size in Qualitative Research

finds a variation of the sample size from 1 to 95 (averages being of 31 in the first case and 28 in the second). The research region - one of the cultural factors - plays a significant role in ...

  5. Sample Size and its Importance in Research

    The sample size for a study needs to be estimated at the time the study is proposed; too large a sample is unnecessary and unethical, and too small a sample is unscientific and also unethical. The necessary sample size can be calculated, using statistical software, based on certain assumptions. If no assumptions can be made, then an arbitrary ...

  6. Sample Size in Qualitative Interview Studies: Guided by Information

    The prevailing concept for sample size in qualitative studies is "saturation." Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept "information power" to guide adequate sample size for qualitative studies.

  7. Sample sizes for saturation in qualitative research: A systematic

    These results provide strong empirical guidance on effective sample sizes for qualitative research, which can be used in conjunction with the characteristics of individual studies to estimate an appropriate sample size prior to data collection. This synthesis also provides an important resource for researchers, academic journals, journal ...

  8. Sample size for qualitative research

    Sample size in qualitative research is always mentioned by reviewers of qualitative papers but discussion tends to be simplistic and relatively uninformed. The current paper draws attention to how sample sizes, at both ends of the size continuum, can be justified by researchers. This will also aid reviewers in their making of comments about the ...

  9. Characterising and justifying sample size sufficiency in interview

    Sample size in qualitative research has been the subject of enduring discussions [4, 10, 11]. Whilst the quantitative research community has established relatively straightforward statistics-based rules to set sample sizes precisely, the intricacies of qualitative sample size determination and assessment arise from the methodological ...

  10. Sample Size Policy for Qualitative Studies Using In-Depth Interviews

There are several debates concerning what sample size is the right size for such endeavors. Most scholars argue that the concept of saturation is the most important factor to think about when mulling over sample size decisions in qualitative research (Mason, 2010). Saturation is defined by many as the point at which the data collection process no longer offers any new or relevant data.

  11. Sample size in qualitative research

    Determining adequate sample size in qualitative research is ultimately a matter of judgment and experience in evaluating the quality of the information collected against the uses to which it will be put, the particular research method and purposeful sampling strategy employed, and the research product intended. ©1995 John Wiley & Sons, Inc. ...

  12. Sample size for qualitative research.

    Purpose: Qualitative researchers have been criticised for not justifying sample size decisions in their research. This short paper addresses the issue of which sample sizes are appropriate and valid within different approaches to qualitative research. Design/methodology/approach: The sparse literature on sample sizes in qualitative research is reviewed and discussed. This examination is ...

  13. (I Can't Get No) Saturation: A simulation and guidelines for sample

    I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to ...

  14. (PDF) Qualitative Research Designs, Sample Size and Saturation: Is

The burden of offering adequate sample sizes in research has been one of the major criticisms against qualitative studies. One of the most acceptable standards in qualitative research is to ...

  15. Sample size in qualitative research

    Sample Size. A common misconception about sampling in qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. Yet, simple sizes may be too small to support claims of having achieved either informational redundancy or theoretical saturation, or too large to permit the ….

  16. Factors to determine your sample size for qualitative research

    Consider: Planning the journey of your qualitative research. 1. Have different research methods for different stages of your research journey. 2. Be open to new methods of collecting data and information. 3. Break up your larger sample into smaller groups depending on how they answer or score in preliminary research activities.

  17. Sampling in Qualitative Research

    There is seldom a simple answer to the question of sample or cell size in qualitative research. There is no single formula or criterion to use. A "gold standard" that will calculate the number of people to interview is lacking (cf. Morse 1994). The question of sample size cannot be determined by prior knowledge of effect sizes, numbers of ...

  18. Characterising and justifying sample size sufficiency in interview

Sample adequacy in qualitative inquiry pertains to the appropriateness of the sample composition and size. It is an important consideration in evaluations of the quality and trustworthiness of much qualitative research [] and is implicated - particularly for research that is situated within a post-positivist tradition and retains a degree of commitment to realist ontological premises - in ...

  19. PDF Quantitative Approaches for Estimating Sample Size for Qualitative

Sample size estimation in qualitative research: Conclusions 1) Specific approaches can be used to estimate sample size in qualitative research, e.g. to assess concept saturation. These need to be considered alongside other issues, and may also only be able to be applied once data have been collected.

  20. How to Justify Sample Size in Qualitative Research

    To bring this one home, let's answer the question we sought out to investigate: the sample size in qualitative research. Typically, sample sizes will range from 6-20, per segment. (So if you have 5 segments, 6 is your multiplier for the total number you'll need, so you would have a total sample size of 30.) For very specific tasks, such as ...

  21. Sample Size in Qualitative Interview Studies: Guided by Information

    The prevailing concept for sample size in qualitative studies is "saturation." Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept "information power" to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds ...

  22. Sample Sizes in Qualitative UX Research: A Definitive Guide

    A formula for determining qualitative sample size. In 2013, Research by Design published a whitepaper by Donna Bonde which included research-backed guidelines for qualitative sampling in a market research context. Victor Yocco, writing in 2017, drew on these guidelines to create a formula determining qualitative sample sizes.

  23. Qualitative Sample Size Calculator

    What is a good sample size for a qualitative research study? Our sample size calculator will work out the answer based on your project's scope, participant characteristics, researcher expertise, and methodology. Just answer 4 quick questions to get a super actionable, data-backed recommendation for your next study.

  24. Systematic Random Sampling: The Complete Guide

    E.g. to get a sample of 100 out of 1,000, you would select every 10th person. Add your sampling interval until you have the desired sample. Continue choosing your sample members at regular intervals until you have the sample size you need to complete your study. Systematic random sampling use cases and examples

  25. Perception of enhanced learning in medicine through integrating of

Virtual Patients (VPs) have been shown to improve various aspects of medical learning; however, research has scarcely delved into the specific factors that facilitate the knowledge gain and transfer of knowledge from the classroom to real-world applications. This exploratory study aims to understand the impact of integrating VPs into classroom learning on students' perceptions of knowledge ...

  26. (IUCr) Qu­antitative selection of sample structures in small-angle

    Traditional model selection methods either rely on qualitative approaches or are prone to overfitting. ... The numerical experiments reported in this section were conducted with a burn-in period of 10 5 and a sample size of 10 5 for the REMC. We set the number of replicas for REMC, the values of inverse temperature and the step size of the ...