Crystallography
1. Introduction
2. Formulation of the proposed framework
3. Formulation of a multicomponent monodisperse spheres model
4. Numerical experiments
5. Discussion
6. Conclusions
research papers
JOURNAL OF APPLIED CRYSTALLOGRAPHY
Open Access
Quantitative selection of sample structures in small-angle scattering using Bayesian methods
a Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8561, Japan, b Japan Synchrotron Radiation Research Institute, Sayo, Hyogo 679-5198, Japan, c National Institute for Materials Science, Tsukuba, Ibaraki 305-0047, Japan, and d Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto 860-8555, Japan * Correspondence e-mail: [email protected]
Small-angle scattering (SAS) is a key experimental technique for analyzing nanoscale structures in various materials. In SAS data analysis, selecting an appropriate mathematical model for the scattering intensity is critical, as it generates a hypothesis of the structure of the experimental sample. Traditional model selection methods either rely on qualitative approaches or are prone to overfitting. This paper introduces an analytical method that applies Bayesian model selection to SAS measurement data, enabling a quantitative evaluation of the validity of mathematical models. The performance of the method is assessed through numerical experiments using artificial data for multicomponent spherical materials, demonstrating that this proposed analysis approach yields highly accurate and interpretable results. The ability of the method to analyze a range of mixing ratios and particle size ratios for mixed components is also discussed, along with its precision in model evaluation by the degree of fitting. The proposed method effectively facilitates quantitative analysis of nanoscale sample structures in SAS, which has traditionally been challenging, and is expected to contribute significantly to advancements in a wide range of fields.
Keywords: small-angle X-ray scattering; small-angle neutron scattering; nanostructure analysis; model selection; Bayesian inference.
SAS measurement data are expressed in terms of scattering intensity that corresponds to a scattering vector, a physical quantity representing the scattering angle. Data analysis requires selection and parameter estimation of a mathematical model of the scattering intensity that contains information about the structure of the specimen. This selection process is critical as it involves assumptions about the structure of the specimen.
We conducted numerical experiments to assess the effectiveness of our proposed method. These experiments are based on synthetic data used to estimate the number of distinct components in a specimen, which was modeled as a mixture of monodisperse spheres of varying radii, scattering length densities and volume fractions. The results demonstrate the high accuracy, interpretability and stability of our method, even in the presence of measurement noise. To discuss the utility of the proposed method, we compare our approach with traditional model selection methods based on the reduced χ-squared error.
In this section, we present a detailed formulation of our algorithm for selecting mathematical models for SAS specimens using Bayesian model selection. The pseudocode for this algorithm is provided in Algorithm 1.
2.1. Bayesian model selection
The likelihood is thus expressed as
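As a minimal sketch of what such a likelihood can look like in code, assume that the count recorded at each q_i follows independent Poisson noise with mean T·I(q_i; Ξ), where T is the pseudo-measurement time. This noise model is an assumption made here for illustration and is not confirmed by the text above; the function name `poisson_log_likelihood` and its arguments are hypothetical.

```python
import math

def poisson_log_likelihood(counts, intensity, T=1.0):
    """Log-likelihood of observed counts under independent Poisson noise.

    counts    : observed non-negative integer counts y_i
    intensity : model intensities I(q_i; Xi), all strictly positive
    T         : pseudo-measurement time scaling the expected counts (assumed)
    """
    total = 0.0
    for y, model_i in zip(counts, intensity):
        mu = T * model_i  # expected count at this q point
        # log of the Poisson pmf: y*log(mu) - mu - log(y!)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total
```

Working with the logarithm keeps the sum over several hundred data points numerically stable, and the negative of this value can serve as the energy in a Monte Carlo sampler.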
Let φ(K) be the prior distribution of the parameter K that characterizes the model, and φ(Ξ | K) be the prior distribution of the model parameters Ξ. Then, from Bayes' theorem, the posterior distribution of the parameters given the measurement data can be written as
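For orientation, Bayes' theorem can be written out in generic notation. The symbols D for the measurement data and p(D | Ξ, K) for the likelihood are assumptions introduced here, since the text above does not fix them. Integrating Ξ out of the joint posterior leaves the model posterior p(K | D), which is the quantity a Bayesian model selection compares across K:

```latex
p(K, \Xi \mid D) = \frac{p(D \mid \Xi, K)\,\varphi(\Xi \mid K)\,\varphi(K)}{p(D)},
\qquad
p(K \mid D) = \frac{\varphi(K)\,p(D \mid K)}{\sum_{K'} \varphi(K')\,p(D \mid K')},
\qquad
p(D \mid K) = \int p(D \mid \Xi, K)\,\varphi(\Xi \mid K)\,\mathrm{d}\Xi .
```

Here p(D | K) is the marginal likelihood of model K, and p(D) is the sum in the denominator of the second expression.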
2.2. Calculation of marginal likelihood
Sampling from the joint probability distribution at each inverse temperature gives
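One common way to turn samples at a ladder of inverse temperatures into a marginal likelihood estimate is to multiply the ratios of neighbouring normalizing constants, each estimated as a sample average. The sketch below assumes that the tempered distribution at inverse temperature β is proportional to likelihood^β × prior and that the ladder runs from β = 0 to β = 1; the helper names are hypothetical. It is checked on a toy Gaussian problem whose answer is known in closed form, not on SAS data.

```python
import math
import random

def log_mean_exp(values):
    """Numerically stable log of the sample mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def log_marginal_likelihood(log_liks, betas):
    """Estimate log Z(beta=1) from samples at an increasing ladder of betas.

    log_liks[l] holds log-likelihood values of samples drawn at betas[l];
    betas must start at 0 (the prior) and end at 1 (the posterior).
    Uses Z(b_{l+1}) / Z(b_l) = E_{b_l}[ L^(b_{l+1} - b_l) ] with Z(0) = 1.
    """
    log_z = 0.0
    for l in range(len(betas) - 1):
        d_beta = betas[l + 1] - betas[l]
        log_z += log_mean_exp([d_beta * ll for ll in log_liks[l]])
    return log_z

# Toy check: theta ~ N(0, 1) prior, one datum y ~ N(theta, 1).
random.seed(0)
y = 1.0
betas = [l / 20 for l in range(21)]
log_liks = []
for b in betas:
    # The tempered posterior is Gaussian with precision 1 + b and mean b*y/(1 + b),
    # so it can be sampled exactly here instead of by Monte Carlo.
    mean, sd = b * y / (1 + b), 1 / math.sqrt(1 + b)
    thetas = [random.gauss(mean, sd) for _ in range(5000)]
    log_liks.append([-0.5 * math.log(2 * math.pi) - 0.5 * (y - t) ** 2 for t in thetas])

estimate = log_marginal_likelihood(log_liks, betas)
exact = -0.5 * math.log(4 * math.pi) - y ** 2 / 4  # marginally, y ~ N(0, 2)
```

In a real analysis the per-temperature samples would come from the replicas of an exchange Monte Carlo run, and a finer ladder reduces the variance of each ratio.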
2.3. Estimation of model parameters
In this paper, we consider isotropic scattering and focus on the scattering vector's magnitude q, defined as
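A small helper illustrates the usual relation between q, the scattering angle and the wavelength. It assumes the common convention in which 2θ denotes the full scattering angle and λ the wavelength of the incident radiation, so that q = 4π sin(θ)/λ; the function name and argument names are hypothetical.

```python
import math

def q_magnitude(two_theta_rad, wavelength_nm):
    """Magnitude of the scattering vector in nm^-1 for elastic scattering.

    two_theta_rad : full scattering angle 2*theta, in radians
    wavelength_nm : wavelength of the incident radiation, in nm
    """
    return 4.0 * math.pi * math.sin(two_theta_rad / 2.0) / wavelength_nm

# Example: a scattering angle of 1 degree at a wavelength of 0.154 nm
q_example = q_magnitude(math.radians(1.0), 0.154)
```

With these example inputs q comes out at roughly 0.71 nm^-1, which lies inside the interval [0.1, 3] nm^-1 used for the synthetic data.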
Monodisperse spheres are spherical particles of uniform radius. The scattering intensity I(q, ξ) of a specimen composed of sufficiently dilute monodisperse spheres of a single type for the scattering vector magnitude q is given by
To formulate the scattering intensity of a specimen composed of K types of monodisperse sphere, we assume a dilute system and denote the particle size of the kth component in the sample as R_k and the scale as S_k. The scattering intensity of a sample composed of K types of monodisperse sphere is then given by
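The dilute-mixture model can be sketched with the textbook form factor of a homogeneous sphere. The normalization is an assumption: here each scale S_k is taken to absorb the volume and contrast factors, the amplitude is normalized to 1 at q = 0, and a constant background B is added, which may differ from the exact parameterization used in the paper. The function names are hypothetical.

```python
import math

def sphere_amplitude(q, R):
    """Normalized form-factor amplitude of a homogeneous sphere of radius R.

    Equals 3*(sin(x) - x*cos(x)) / x**3 with x = q*R and tends to 1 as x -> 0.
    """
    x = q * R
    if x < 1e-3:
        return 1.0 - x * x / 10.0  # leading Taylor terms avoid cancellation
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def mixture_intensity(q, radii, scales, background=0.0):
    """Dilute mixture of K sphere types: sum_k S_k * F(q, R_k)^2 + B."""
    return background + sum(
        S * sphere_amplitude(q, R) ** 2 for R, S in zip(radii, scales)
    )

# Two components with the radii, scales and background listed for the
# numerical experiments (R = 2 and 10 nm, S = 250 and 100, B = 0.01).
intensity_at_q1 = mixture_intensity(1.0, radii=[2.0, 10.0], scales=[250.0, 100.0],
                                    background=0.01)
```

Because the system is assumed dilute, no interparticle structure factor appears and the contributions of the components simply add.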
An illustration of a mixture of two types of spherical specimen. This shows scenarios with two components (K = 2), including mixtures of spherical particles of different sizes or volume fractions, and aggregates from a single particle type approximated as a large sphere.
The numerical experiments reported in this section were conducted with a burn-in period of 10^5 and a sample size of 10^5 for the replica exchange Monte Carlo (REMC) method. We set the number of replicas for REMC, the values of inverse temperature and the step size of the Metropolis method taking into consideration the state exchange rate and the acceptance rate.
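The interplay of Metropolis updates and replica exchanges can be illustrated on a one-parameter toy problem. This is a minimal sketch, not the authors' implementation: the ladder of inverse temperatures, the step size, and the chain lengths (far shorter than the 10^5 used above) are hypothetical choices, and replica l is assumed to target exp(log_prior(x) − β_l·energy(x)), with the energy playing the role of a negative log-likelihood.

```python
import math
import random

def remc(energy, log_prior, betas, n_steps, burn_in, step=1.0, seed=0):
    """Replica exchange Monte Carlo for a scalar parameter (toy sketch).

    Returns, for every replica, the list of samples kept after burn-in.
    """
    rng = random.Random(seed)
    x = [0.0] * len(betas)
    samples = [[] for _ in betas]
    for it in range(burn_in + n_steps):
        # Metropolis update within each replica
        for l, b in enumerate(betas):
            prop = x[l] + rng.gauss(0.0, step)
            log_a = (log_prior(prop) - b * energy(prop)) - (
                log_prior(x[l]) - b * energy(x[l]))
            if rng.random() < math.exp(min(0.0, log_a)):
                x[l] = prop
        # Exchange attempt between neighbouring replicas
        for l in range(len(betas) - 1):
            log_a = (betas[l + 1] - betas[l]) * (energy(x[l + 1]) - energy(x[l]))
            if rng.random() < math.exp(min(0.0, log_a)):
                x[l], x[l + 1] = x[l + 1], x[l]
        if it >= burn_in:
            for l in range(len(betas)):
                samples[l].append(x[l])
    return samples

# Toy target: prior N(0, 1) and one datum y ~ N(theta, 1); the beta = 1
# posterior is N(y/2, 1/2).
y = 1.0
samples = remc(energy=lambda t: 0.5 * (y - t) ** 2,
               log_prior=lambda t: -0.5 * t * t,
               betas=[0.0, 0.25, 0.5, 0.75, 1.0],
               n_steps=20000, burn_in=2000)
posterior_mean = sum(samples[-1]) / len(samples[-1])
```

In practice the acceptance rate of the Metropolis step and the exchange rate between neighbouring replicas would be monitored and used to tune the step size and the spacing of the ladder.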
4.1. Generation of synthetic data
(i) Set the number of data points to N = 400 and define the scattering vector magnitudes at N equally spaced points within the interval [0.1, 3] to obtain {q_i}_{i=1}^{N} with N = 400 (nm^-1).
In this section, we consider cases with pseudo-measurement times of T = 1 and T = 0.1. Generally, smaller values of T indicate greater effects from measurement noise.
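The role of T can be illustrated with a short generator of synthetic counts. Only the q grid (400 equally spaced points on [0.1, 3] nm^-1) is taken from the text; drawing each count from a Poisson distribution with mean T·I(q_i) is an assumption made for illustration, and the function names and the example intensity curve are hypothetical. Under that assumption the relative noise scales as 1/sqrt(T·I), which is why smaller T gives noisier data.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw a Poisson variate by counting unit-rate arrivals before `mean`."""
    k, t = 0, rng.expovariate(1.0)
    while t < mean:
        k += 1
        t += rng.expovariate(1.0)
    return k

def synthetic_counts(intensity_fn, T=1.0, n=400, q_min=0.1, q_max=3.0, seed=0):
    """Return the q grid and Poisson counts with mean T * intensity_fn(q)."""
    rng = random.Random(seed)
    q = [q_min + i * (q_max - q_min) / (n - 1) for i in range(n)]
    counts = [poisson_sample(rng, T * intensity_fn(qi)) for qi in q]
    return q, counts

# Hypothetical smooth intensity curve, used only to exercise the generator.
q, counts = synthetic_counts(lambda qi: 100.0 * math.exp(-qi), T=1.0)
q_low_t, counts_low_t = synthetic_counts(lambda qi: 100.0 * math.exp(-qi), T=0.1)
```

Any positive model intensity, such as a mixture of sphere form factors, can be passed as `intensity_fn`.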
4.2. Setting the prior distributions
In the Bayesian model selection framework, prior knowledge concerning the parameters Ξ and the model-characterizing parameter K is set as their prior distributions.
In this numerical experiment, the prior distributions for the parameters Ξ were set as Gamma distributions based on the pseudo-measurement time T used during data generation, while the prior for K was a discrete uniform distribution over the interval [1, 4].
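With a discrete uniform prior on K, the posterior probability of each model is its marginal likelihood divided by the sum over all candidate models. The sketch below shows that normalization in log space; the function name is hypothetical and the log marginal likelihood values in the example are made up purely for illustration.

```python
import math

def model_posterior(log_evidence, prior=None):
    """Posterior probabilities p(K | D) from log marginal likelihoods log p(D | K).

    log_evidence : dict mapping each K to log p(D | K)
    prior        : dict mapping each K to phi(K); uniform if omitted
    """
    ks = sorted(log_evidence)
    if prior is None:
        prior = {k: 1.0 / len(ks) for k in ks}  # discrete uniform prior
    log_w = {k: log_evidence[k] + math.log(prior[k]) for k in ks}
    m = max(log_w.values())  # subtract the maximum to avoid underflow
    total = sum(math.exp(v - m) for v in log_w.values())
    return {k: math.exp(log_w[k] - m) / total for k in ks}

# Illustrative (made-up) log marginal likelihoods for K = 1..4
posterior = model_posterior({1: -1000.0, 2: -998.0, 3: -999.0, 4: -1001.0})
best_k = max(posterior, key=posterior.get)
```

Subtracting the largest log weight before exponentiating matters because log marginal likelihoods for several hundred data points are typically far below the range where exp() is representable.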
Plots of the prior distributions for the various parameters. Each of the four panels shows the prior distribution φ(·) of one parameter.
4.3. Results for two-component monodisperse spheres based on scale ratio
The ratio of the scale parameters S_1 and S_2 for spheres 1 and 2 during data generation, denoted r_S, is defined as r_S = S_2 / S_1.
Parameter values used for data generation with varying r_S

| | Sphere 1 | Sphere 2 |
|---|---|---|
| Radius (nm) | 2 | 10 |
| Scale | 250 | {250, 100, 20, 0.5, 0.1, 0.05} |
| Background (cm^-1) | 0.01 | |
| Pseudo-measurement time T | {1, 0.1} | |

Fitting to synthetic data generated at various r_S values and residual plots. The two groups of panels show cases for pseudo-measurement times of T = 1 and T = 0.1, respectively. Within each group, the plots are ordered by descending scale ratio r_S. Black circles represent the generated data and the black dotted lines indicate the true scattering intensity curves. For models K = 1, K = 2, K = 3 and K = 4, the fitting curves and residual plots are represented by blue dashed–dotted lines, red dashed lines, orange solid lines and green dotted lines, respectively. Fitting curves were plotted using 1000 parameter samples that were randomly selected from the posterior probability distributions for each model. The width of the distribution of these fitting curves reflects the confidence level at each point.

Results of Bayesian model selection among models K = 1–4 for varying r_S values. One group of panels shows the posterior probability for each model using data generated with a pseudo-measurement time of T = 1, and the other shows results for T = 0.1. Within each group, the cases are ordered by descending scale ratio r_S. The height of each bar corresponds to the average values calculated for ten data sets generated with different random seeds, with maximum and minimum values shown as error bars. Areas highlighted in red indicate cases where, on average, the highest probability was found for the true model with K = 2, while blue backgrounds indicate that models other than K = 2 were associated with the highest probability on average.
The number of times each model was associated with the highest probability in numerical experiments for ten data sets generated with different random seeds at each r_S value

T = 1

| r_S | K = 1 | K = 2 | K = 3 | K = 4 |
|---|---|---|---|---|
| 1.0 | 0 | **10** | 0 | 0 |
| 0.4 | 0 | **10** | 0 | 0 |
| 0.08 | 0 | **10** | 0 | 0 |
| 0.002 | 0 | **10** | 0 | 0 |
| 0.0004 | 0 | **10** | 0 | 0 |
| 0.0002 | **8** | 2 | 0 | 0 |

T = 0.1

| r_S | K = 1 | K = 2 | K = 3 | K = 4 |
|---|---|---|---|---|
| 1.0 | 0 | **10** | 0 | 0 |
| 0.4 | 0 | **10** | 0 | 0 |
| 0.08 | 0 | **10** | 0 | 0 |
| 0.002 | 0 | **10** | 0 | 0 |
| 0.0004 | **9** | 1 | 0 | 0 |
| 0.0002 | **10** | 0 | 0 | 0 |

4.4. Results for two-component monodisperse spheres based on radius ratio

During synthetic data generation, the ratio of the radii R_1 and R_2 of spheres 1 and 2, denoted r_R, was defined as r_R = R_1 / R_2. In this setup, we generated seven types of data by varying the value of r_R for pseudo-measurement times of T = 1 and T = 0.1.

Parameter values used for data generation when varying r_R

| | Sphere 1 | Sphere 2 |
|---|---|---|
| Radius (nm) | {9.9, 9.7, 9.5, 5, 0.5, 0.4, 0.3} | 10 |
| Scale | 250 | 100 |
| Background (cm^-1) | 0.01 | |
| Pseudo-measurement time T | {1, 0.1} | |

Fitting to synthetic data generated at various r_R values and residual plots. The two groups of panels show cases for pseudo-measurement times of T = 1 and T = 0.1, respectively. Within each group, the plots are ordered by descending radius ratio r_R. Black circles represent the generated data and the black dotted lines indicate the true scattering intensity curves. For models K = 1, K = 2, K = 3 and K = 4, the fitting curves and residual plots are represented by blue dashed–dotted lines, red dashed lines, orange solid lines and green dotted lines, respectively. Fitting curves were plotted using 1000 parameter samples that were randomly selected from the posterior probability distributions for each model. The width of the distribution of these fitting curves reflects the confidence level at each point.

Results of Bayesian model selection among models K = 1–4 for varying r_R values. One group of panels shows the posterior probability of each model using data generated with a pseudo-measurement time of T = 1, and the other shows results for T = 0.1. Within each group, the cases are ordered by descending radius ratio r_R. The height of each bar corresponds to the average values calculated for ten data sets generated with different random seeds, with the maximum and minimum values shown as error bars. Areas highlighted in red indicate cases where the true model K = 2 was most highly supported, while the blue backgrounds indicate that the probability of a model other than K = 2 was the highest.

The number of times each model was most highly supported in numerical experiments for ten data sets generated by varying r_R values

T = 1

| r_R | K = 1 | K = 2 | K = 3 | K = 4 |
|---|---|---|---|---|
| 0.99 | **9** | 1 | 0 | 0 |
| 0.97 | 0 | **10** | 0 | 0 |
| 0.95 | 0 | **10** | 0 | 0 |
| 0.5 | 0 | **10** | 0 | 0 |
| 0.05 | 0 | **10** | 0 | 0 |
| 0.04 | 1 | **9** | 0 | 0 |
| 0.03 | **10** | 0 | 0 | 0 |

T = 0.1

| r_R | K = 1 | K = 2 | K = 3 | K = 4 |
|---|---|---|---|---|
| 0.99 | **10** | 0 | 0 | 0 |
| 0.97 | 2 | **8** | 0 | 0 |
| 0.95 | 0 | **10** | 0 | 0 |
| 0.5 | 0 | **10** | 0 | 0 |
| 0.05 | 1 | **9** | 0 | 0 |
| 0.04 | **7** | 3 | 0 | 0 |
| 0.03 | **10** | 0 | 0 | 0 |

5.1. Limitations of the proposed method

5.2. Model selection based on χ-squared error

In SAS data analysis, selecting an appropriate mathematical model for the analysis is a crucial but challenging process. In this subsection, we compare the conventional model selection method based on the χ-squared error with the results of model selection using our proposed method.

The fitting results and residual plots for the data shown in one panel of Fig. 3 were derived using parameters that minimize the χ-squared error from the posterior probability distributions for models ranging from K = 1 to K = 4. For each of these models, the fitting curves and their corresponding residual plots are represented by blue dashed–dotted lines, red dashed lines, orange solid lines and green dotted lines, respectively. The legend indicates the reduced χ-squared values for each model (K = 1 to K = 4).

Model selection results based on reduced χ-squared values. The number of times each model yielded the reduced χ-squared value closest to 1 for ten data sets generated with different random seeds for each setting at T = 1. The row labels refer to the r_S settings in Figs. 3–4 and Table 2. The cases with the highest level of support for each data set are shown in bold.

| r_S | K = 1 | K = 2 | K = 3 | K = 4 |
|---|---|---|---|---|
| 1.0 | 0 | 2 | **8** | 0 |
| 0.4 | 0 | 0 | **9** | 1 |
| 0.08 | 0 | 0 | **9** | 1 |
| 0.002 | 0 | 0 | **10** | 0 |
| 0.0004 | 0 | 4 | **5** | 1 |
| 0.0002 | 0 | 2 | **8** | 0 |

In this paper, we have introduced a Bayesian model selection framework for SAS data analysis that quantitatively evaluates model validity through posterior probabilities. We have conducted numerical experiments using synthetic data for a two-component system of monodisperse spheres to assess the performance of the proposed method. We have identified the analytical limits of the proposed method, under the settings of this study, with respect to the scale and radius ratios of two-component spherical particles, and compared its performance with that of traditional model selection methods based on the reduced χ-squared.

The numerical experiments and subsequent discussion reveal the range of parameters that can be analyzed using the proposed method. Within that range, our method provides stable and highly accurate model selection, even for data with significant noise or in situations in which qualitative model determination is challenging. In comparison with the traditional method of selecting models based on fitting curves and data residuals, it was found that the proposed method offers greater accuracy and stability.

SAS is used to study specimens with a variety of structures other than spheres, including cylinders, core–shell structures, lamellae and more.
The proposed method should be applied to other sample models to determine the feasibility of expanding the analysis beyond the case examined here to broader experimental settings. Future work could benefit from using the proposed method to conduct real data analysis, which is expected to yield new insights through our more efficient analysis approach.

Funding information

This work was supported by JST CREST (grant Nos. PMJCR1761 and JPMJCR1861) from the Japan Science and Technology Agency (JST) and by a JSPS KAKENHI Grant-in-Aid for Scientific Research (A) (grant No. 23H00486).

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.