Reliability vs Validity in Research | Differences, Types & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method, technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.

It’s important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research.

Table of contents

  • Understanding reliability vs validity
  • How are reliability and validity assessed?
  • How to ensure validity and reliability in your research
  • Where to write about reliability and validity in a thesis

Reliability and validity are closely related, but they mean different things. A measurement can be reliable without being valid. However, if a measurement is valid, it is usually also reliable.

What is reliability?

Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.

What is validity?

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world.

High reliability is one indicator that a measurement is valid. If a method is not reliable, it probably isn’t valid.

However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may not accurately reflect the real situation.

Validity is harder to assess than reliability, but it is even more important. To obtain useful results, the methods you use to collect your data must be valid: the research must be measuring what it claims to measure. This ensures that your discussion of the data and the conclusions you draw are also valid.


Reliability can be estimated by comparing different versions of the same measurement. Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory. Methods of estimating reliability and validity are usually split up into different types.

Types of reliability

Different types of reliability can be estimated through various statistical methods.
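For instance, test-retest reliability is often estimated as the correlation between two administrations of the same test. Here is a minimal illustration in Python (not part of the original article), using made-up scores:

```python
# Illustrative only: test-retest reliability estimated as the Pearson
# correlation between two administrations of the same test.
import numpy as np
from scipy.stats import pearsonr

# Made-up scores for 8 participants, tested twice, two weeks apart
time_1 = np.array([12, 15, 11, 18, 14, 16, 13, 17])
time_2 = np.array([13, 14, 11, 19, 15, 15, 12, 18])

r, p = pearsonr(time_1, time_2)
print(f"Test-retest reliability: r = {r:.2f}")
# A high positive r suggests the measure gives stable results over time.
```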

Types of validity

The validity of a measurement can be estimated based on three main types of evidence. Each type can be evaluated through expert judgement or statistical methods.

To assess the validity of a cause-and-effect relationship, you also need to consider internal validity (the design of the experiment) and external validity (the generalisability of the results).

The reliability and validity of your results depend on creating a strong research design, choosing appropriate methods and samples, and conducting the research carefully and consistently.

Ensuring validity

If you use scores or ratings to measure variations in something (such as psychological traits, levels of ability, or physical properties), it’s important that your results reflect the real variations as accurately as possible. Validity should be considered in the very earliest stages of your research, when you decide how you will collect your data.

  • Choose appropriate methods of measurement

Ensure that your method and measurement technique are of high quality and targeted to measure exactly what you want to know. They should be thoroughly researched and based on existing knowledge.

For example, to collect data on a personality trait, you could use a standardised questionnaire that is considered reliable and valid. If you develop your own questionnaire, it should be based on established theory or the findings of previous studies, and the questions should be carefully and precisely worded.

  • Use appropriate sampling methods to select your subjects

To produce valid generalisable results, clearly define the population you are researching (e.g., people from a specific age range, geographical location, or profession). Ensure that you have enough participants and that they are representative of the population.
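As a rough, illustrative sketch (not from the original article), the standard formula n = z²p(1 − p)/e² shows how the required number of respondents grows as the target margin of error shrinks:

```python
# Illustrative sketch: sample size needed to estimate a population
# proportion with a given margin of error at 95% confidence.
import math

def required_sample_size(margin_of_error: float, p: float = 0.5,
                         z: float = 1.96) -> int:
    """n = z^2 * p * (1 - p) / e^2; p = 0.5 is the most conservative guess."""
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)

for e in (0.05, 0.03, 0.01):
    print(f"±{e:.0%} margin of error -> n = {required_sample_size(e)}")
# ±5% -> 385, ±3% -> 1068, ±1% -> 9604 respondents
```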

Ensuring reliability

Reliability should be considered throughout the data collection process. When you use a tool or technique to collect data, it’s important that the results are precise, stable, and reproducible.

  • Apply your methods consistently

Plan your method carefully to make sure you carry out the same steps in the same way for each measurement. This is especially important if multiple researchers are involved.

For example, if you are conducting interviews or observations, clearly define how specific behaviours or responses will be counted, and make sure questions are phrased the same way each time.

  • Standardise the conditions of your research

When you collect your data, keep the circumstances as consistent as possible to reduce the influence of external factors that might create variation in the results.

For example, in an experimental setup, make sure all participants are given the same information and tested under the same conditions.

It’s appropriate to discuss reliability and validity in various sections of your thesis, dissertation, or research paper. Showing that you have taken them into account in planning your research and interpreting the results makes your work more credible and trustworthy.


Grad Coach

Validity & Reliability In Research

A Plain-Language Explanation (With Examples)

By: Derek Jansen (MBA) | Expert Reviewer: Kerryn Warren (PhD) | September 2023

Validity and reliability are two related but distinctly different concepts within research. Understanding what they are and how to achieve them is critically important to any research project. In this post, we’ll unpack these two concepts as simply as possible.

This post is based on our popular online course, Research Methodology Bootcamp. In the course, we unpack the basics of methodology using straightforward language and loads of examples.

Overview: Validity & Reliability

  • The big picture
  • Validity 101
  • Reliability 101 
  • Key takeaways

First, The Basics…

First, let’s start with a big-picture view and then we can zoom in to the finer details.

Validity and reliability are two incredibly important concepts in research, especially within the social sciences. Both validity and reliability have to do with the measurement of variables and/or constructs – for example, job satisfaction, intelligence, productivity, etc. When undertaking research, you’ll often want to measure these types of constructs and variables and, at the simplest level, validity and reliability are about ensuring the quality and accuracy of those measurements .

As you can probably imagine, if your measurements aren’t accurate or there are quality issues at play when you’re collecting your data, your entire study will be at risk. Therefore, validity and reliability are very important concepts to understand (and to get right). So, let’s unpack each of them.


What Is Validity?

In simple terms, validity (also called “construct validity”) is all about whether a research instrument accurately measures what it’s supposed to measure .

For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall job satisfaction. If this set of scales focused on only one dimension of job satisfaction, say pay satisfaction, it would not be a valid measurement, as it only captures one aspect of the multidimensional construct. In other words, pay satisfaction alone is only one contributing factor toward overall job satisfaction, and therefore it’s not a valid way to measure someone’s job satisfaction.


Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or statement can differ from how the study participants interpret it . Given that respondents don’t have the opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide will be pretty useless . Therefore, ensuring that a study’s measurement instruments are valid – in other words, that they are measuring what they intend to measure – is incredibly important.

There are various types of validity and we’re not going to go down that rabbit hole in this post, but it’s worth quickly highlighting the importance of making sure that your research instrument is tightly aligned with the theoretical construct you’re trying to measure .  In other words, you need to pay careful attention to how the key theories within your study define the thing you’re trying to measure – and then make sure that your survey presents it in the same way.

For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define what you mean by job satisfaction within your study (and this definition would of course need to be underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in the types of questions or scales you’re using in your survey . Simply put, you need to make sure that your survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that your measurement instrument is capturing the necessary information that reflects your definition of the construct at hand.



What Is Reliability?

As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability . In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon , under the same conditions .

As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements . And that, of course, is what you want when undertaking empirical research. If you think about it within a more domestic context, just imagine if you found that your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure the variable that is your body weight 🙂

It’s worth mentioning that reliability also extends to the person using the measurement instrument . For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in terms of how one (or both) of them are using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha, which is a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct. In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept.
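To make this concrete, here is a minimal Python sketch (ours, not Grad Coach’s) of Cronbach’s alpha computed from made-up Likert responses, using the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of total score):

```python
# A minimal sketch of Cronbach's alpha for a set of Likert items.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a respondents x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Made-up responses: 6 respondents x 4 job-satisfaction items (1-5 Likert)
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")  # ≈ 0.96 here
```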


Recap: Key Takeaways

Alright, let’s quickly recap to cement your understanding of validity and reliability:

  • Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure
  • Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-quality, accurate data that help you answer your research questions . So, be sure to always pay careful attention to the validity and reliability of your measurement instruments when collecting and analysing data. As the adage goes, “rubbish in, rubbish out” – make sure that your data inputs are rock-solid.


What Are Survey Validity and Reliability?


Let’s start by agreeing that it isn’t always easy to measure people’s attitudes, thoughts, and feelings. People are complex. They may not always want to divulge what they really think or they may not be able to accurately report what they think. Nevertheless, behavioral scientists persist, using surveys, experiments, and observations to learn why people do what they do.   

At the heart of good research methods are two concepts known as survey validity and reliability. Digging into these concepts can get a bit wonky. However, understanding validity and reliability is important for both the people who conduct and consume research. Thus, we lay out the details of both constructs in this blog.

What is Survey Validity?

Validity refers to how reasonable, accurate, and justifiable a claim, conclusion, or decision is. Within the context of survey research, validity is the answer to the question: does this research show what it claims to show? There are four types of validity within survey research.

Four Types of Survey Validity

  • Statistical validity

Statistical validity is an assessment of how well the numbers in a study support the claims being made. Suppose a survey says 25% of people believe the Earth is flat. An assessment of statistical validity asks whether that 25% is based on a sample of 12 or 12,000.

There is no one way to evaluate claims of statistical validity. For a survey or poll, judgments of statistical validity may entail looking at the margin of error. For studies that examine the association between multiple variables or conduct an experiment, judgments of statistical validity may entail examining the study’s effect size or statistical significance . Regardless of the particulars of the study, statistical validity is concerned with whether what the research claims is supported by the data.
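To see why the sample size matters so much here, consider this illustrative Python sketch (not from the original post) of the 95% margin of error around that hypothetical 25%:

```python
# Illustrative sketch: 95% margin of error around a poll proportion,
# showing why a claim based on n = 12 differs from one based on n = 12,000.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a sample proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

p_hat = 0.25  # 25% say they believe the Earth is flat
for n in (12, 12_000):
    moe = margin_of_error(p_hat, n)
    print(f"n = {n:>6}: 25% ± {moe * 100:.1f} percentage points")
# n = 12 gives roughly ±24.5 points; n = 12,000 gives roughly ±0.8 points.
```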

  • Construct validity

Construct validity is an assessment of how well a research team has measured or manipulated the variable(s) in their study. Assessments of construct validity can range from a subjective judgment about whether questions look like they measure what they’re supposed to measure to a mathematical assessment of how well different questions or measures are related to each other.

  • Face validity – Do the items used in a study look like they measure what they’re supposed to? That’s the type of judgment researchers make when assessing face validity. There’s no fancy math, just a judgment about whether things look right on the surface. 

Face validity is sometimes assessed by experts. In the case of a survey instrument to measure beliefs about whether the earth is flat, a researcher may want to show the initial version of the instrument to an expert on the flat earth theory to get their feedback as to whether the items look right.

  • Content validity – Content validity is a judgment about whether your survey instrument captures all the relevant components of what you’re trying to measure.  

For example, suppose we wrote 10 items to measure flat-Earth beliefs. An assessment of content validity would judge how well these questions cover different conceptual components of the flat-Earth conspiracy. 

Obviously, the scale would need to include items measuring people’s beliefs about the shape of the Earth (e.g., do you believe the Earth is flat?). But given how much flat-Earth beliefs contradict basic science and information from official channels like NASA, we might also include questions that measure trust in science (e.g., The scientific method usually leads to accurate conclusions) and government institutions (e.g., Most of what NASA says about the shape of the Earth is false). 

Content validity is one of the most important aspects of validity, and it largely depends on one’s theory about the construct. For example, if one’s theory of intelligence includes creativity as a component (creativity is part of the ‘content’ of intelligence) a test cannot be valid if it does not measure creativity. Many theoretical disagreements about measurement center around content validity. 

  • Criterion validity – Unlike face validity and content validity, criterion validity is a more objective measure of whether an item or scale measures what it is supposed to measure. 

To establish criterion validity researchers may look to see if their instrument predicts a concrete, real-world behavior. In our flat-Earth example, we might assess whether people who score high in flat-Earth beliefs spend more time watching flat-Earth videos on YouTube or attend flat-Earth events. If people who score high on the measure also tend to engage in behaviors associated with flat-Earth beliefs, we have evidence of criterion validity.

  • External validity

Almost all research relies on sampling . Because researchers do not have the time and resources to talk to everyone they are interested in studying, they often rely on a sample of people to make inferences about a larger population. 

External validity is concerned with assessing how well the findings from a single study apply to people, settings, and circumstances not included in the study. In other words, external validity is concerned with how well the results from a study generalize to other people, places, and situations.

Perhaps the easiest way to think about external validity is with polling. Opinion polls ask a sample of people what they think about a policy, topic, or political candidate at a particular moment. An assessment of external validity considers how the sample was gathered and whether it is likely that people in the sample represent people in the population who did not participate in the research. With some types of research such as polling, external validity is always a concern .    

  • Internal validity (for experiments)

Finally, a fourth type of validity that only applies to experiments or A/B tests is internal validity. Internal validity assesses whether the research team has designed and carried out their work in a way that allows you to have confidence that the results of their study are due only to the manipulated (i.e. independent) variables. 

What is Survey Reliability? 

Everyone knows what it means for something to be reliable. Reliable things are dependable and consistent. Survey reliability means the same thing. When assessing reliability, researchers want to know whether the measures they use produce consistent and dependable results.

Imagine you’re interested in measuring whether people believe in the flat-Earth conspiracy theory. According to some polling, as many as 1 in 6 U.S. adults are unsure if the Earth is round. 


If beliefs about the roundness of the Earth are the construct we’re interested in measuring, we have to decide how to operationalize , or measure, that construct. Often, researchers operationalize a construct with a survey instrument—questions intended to measure a belief or attitude. At other times, a construct can be operationalized by observing behavior or people’s verbal or written descriptions of a topic.

Whichever way a construct is operationalized, researchers need to know whether their measures are reliable, and reliability is often assessed in three different ways. 

3 Ways to Assess Survey Reliability

  • Test-retest reliability

If I asked 1,000 people today if they believe the Earth is round and asked the same questions next week or next month, would the results be similar? If so, then we would say the questions have high test-retest reliability. Questions that produce different results each time participants answer them have poor reliability and are not useful for research. 

  • Internal reliability

Internal reliability applies to measures with multiple self-report items. So, if we created a 10-item instrument to measure belief in a flat-Earth, an assessment of internal reliability would examine whether people who tend to agree with one item (e.g., the Earth is flat) also agree with other items in the scale (e.g., images from space showing the Earth as round are fake).   

  • Interrater reliability

Sometimes, researchers collect data that requires judgment about participants’ responses. Imagine, for example, observing people’s behavior within an internet chat room devoted to the flat-Earth conspiracy. One way to measure belief in a flat-Earth would be to make judgments about how much each person’s postings indicate their belief that the Earth is flat. 

Interrater reliability is an assessment of how well the judgments of two or more different raters agree with one another. So, if one coder believes that a participant’s written response indicates a strong belief in a flat Earth, how likely is another person to independently agree?
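A common statistic for this is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A small illustrative sketch (the coder labels are made up):

```python
# Illustrative sketch: inter-rater reliability via Cohen's kappa, which
# corrects raw agreement for agreement expected by chance.
from sklearn.metrics import cohen_kappa_score

# Two coders independently label 10 hypothetical forum posts as expressing
# flat-Earth belief ("yes") or not ("no")
coder_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"]
coder_b = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "no"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.60 here; 1.0 = perfect agreement
```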

Measuring Survey Reliability and Validity: Putting Things Together

The information above is technical. So, how do people evaluate reliability and validity in the real world? Do they work through a checklist of the concepts above? Not really. 

When evaluating research, judgments of reliability and validity are often based on a mixture of information provided by the research team and critical evaluation by the consumer. Take, for example, the polling question about flat-Earth beliefs at the beginning.

The data suggesting that as many as 1 in 6 U.S. adults are unsure about the shape of the Earth was released by a prominent polling organization. In their press release, the organization claimed that just 84% of U.S. adults believe that the earth is round. But is that true?

To evaluate the validity of this claim we might inspect the questions that were asked (face validity), what the margin of error is and how many people participated in the poll (statistical validity), and where the participants came from and how they were sampled (external validity). 

In assessing these characteristics, we might ask whether we would get the same result with differently worded questions, whether there were enough people in the poll to feel confident about the margin of error, and whether another sample of adults would produce the same or different results.

Some forms of reliability and validity are harder to pin down than others. But without considering reliability and validity it is hard to evaluate whether any form of research really shows what it claims to show. 


Validity and reliability in quantitative studies

Volume 18, Issue 3

Roberta Heale 1, Alison Twycross 2
1 School of Nursing, Laurentian University, Sudbury, Ontario, Canada
2 Faculty of Health and Social Care, London South Bank University, London, UK
Correspondence to: Dr Roberta Heale, School of Nursing, Laurentian University, Ramsey Lake Road, Sudbury, Ontario, Canada P3E2C6; rheale{at}laurentian.ca

https://doi.org/10.1136/eb-2015-102129


Evidence-based practice includes, in part, implementation of the findings of well-conducted quality research studies. So being able to critique quantitative research is an important skill for nurses. Consideration must be given not only to the results of the study but also the rigour of the research. Rigour refers to the extent to which the researchers worked to enhance the quality of the studies. In quantitative research, this is achieved through measurement of the validity and reliability. 1


Types of validity

The first category is content validity . This category looks at whether the instrument adequately covers all the content that it should with respect to the variable. In other words, does the instrument cover the entire domain related to the variable, or construct it was designed to measure? In an undergraduate nursing course with instruction about public health, an examination with content validity would cover all the content in the course with greater emphasis on the topics that had received greater coverage or more depth. A subset of content validity is face validity , where experts are asked their opinion about whether an instrument measures the concept intended.

Construct validity refers to whether you can draw inferences about test scores related to the concept being studied. For example, if a person has a high score on a survey that measures anxiety, does this person truly have a high degree of anxiety? In another example, a test of knowledge of medications that requires dosage calculations may instead be testing maths knowledge.

There are three types of evidence that can be used to demonstrate a research instrument has construct validity:

Homogeneity—meaning that the instrument measures one construct.

Convergence—this occurs when the instrument measures concepts similar to those of other instruments. If no similar instruments are available, however, this comparison will not be possible.

Theory evidence—this is evident when behaviour is similar to theoretical propositions of the construct measured in the instrument. For example, when an instrument measures anxiety, one would expect to see that participants who score high on the instrument for anxiety also demonstrate symptoms of anxiety in their day-to-day lives. 2

The final measure of validity is criterion validity . A criterion is any other instrument that measures the same variable. Correlations can be conducted to determine the extent to which the different instruments measure the same variable. Criterion validity is measured in three ways:

Convergent validity—shows that an instrument is highly correlated with instruments measuring similar variables.

Divergent validity—shows that an instrument is poorly correlated to instruments that measure different variables. In this case, for example, there should be a low correlation between an instrument that measures motivation and one that measures self-efficacy.

Predictive validity—means that the instrument should have high correlations with future criteria. 2 For example, a high self-efficacy score related to performing a task should predict the likelihood of a participant completing the task.
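As an illustration (not from the article), convergent and divergent validity can be checked by correlating a new instrument with related and unrelated measures; the scores below are made up:

```python
# Hypothetical sketch: convergent and divergent validity as correlations
# between a new scale and related vs unrelated instruments.
import pandas as pd

# Made-up total scores for 8 participants on three instruments
scores = pd.DataFrame({
    "motivation_new": [22, 31, 18, 27, 35, 20, 29, 24],   # new scale
    "motivation_est": [24, 30, 17, 28, 33, 21, 27, 25],   # established scale
    "self_efficacy":  [14, 29, 25, 12, 22, 30, 11, 19],   # different construct
})

print(scores.corr().round(2))
# Convergent validity: motivation_new should correlate highly with
# motivation_est. Divergent validity: its correlation with self_efficacy
# should be low.
```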

Reliability

Reliability relates to the consistency of a measure. A participant completing an instrument meant to measure motivation should have approximately the same responses each time the test is completed. Although it is not possible to give an exact calculation of reliability, an estimate of reliability can be achieved through different measures. The three attributes of reliability are outlined in table 2 . How each attribute is tested for is described below.

Attributes of reliability

Homogeneity (internal consistency) is assessed using item-to-total correlation, split-half reliability, Kuder-Richardson coefficient and Cronbach's α. In split-half reliability, the results of a test, or instrument, are divided in half. Correlations are calculated comparing both halves. Strong correlations indicate high reliability, while weak correlations indicate the instrument may not be reliable. The Kuder-Richardson test is a more complicated version of the split-half test. In this process the average of all possible split half combinations is determined and a correlation between 0–1 is generated. This test is more accurate than the split-half test, but can only be completed on questions with two answers (eg, yes or no, 0 or 1). 3
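As a rough illustration (not from the article), split-half reliability with the Spearman-Brown correction could be computed as follows, on made-up 0/1 item responses:

```python
# Illustrative sketch: split-half reliability with the Spearman-Brown
# correction, on made-up 0/1 item responses (8 participants x 6 items).
import numpy as np
from scipy.stats import pearsonr

items = np.array([
    [1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)   # score on items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)  # score on items 2, 4, 6

r_half, _ = pearsonr(odd_half, even_half)
full_test = 2 * r_half / (1 + r_half)   # Spearman-Brown correction
print(f"Half-test r = {r_half:.2f}, corrected reliability = {full_test:.2f}")
```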

Cronbach's α is the most commonly used test to determine the internal consistency of an instrument. In this test, the average of all correlations in every combination of split-halves is determined. Instruments with questions that have more than two responses can be used in this test. The Cronbach's α result is a number between 0 and 1. An acceptable reliability score is one that is 0.7 and higher. 1 , 3

Stability is tested using test–retest and parallel or alternate-form reliability testing. Test–retest reliability is assessed when an instrument is given to the same participants more than once under similar circumstances. A statistical comparison is made between participants’ test scores for each of the times they have completed it. This provides an indication of the reliability of the instrument. Parallel-form reliability (or alternate-form reliability) is similar to test–retest reliability, except that a different form of the original instrument is given to participants in subsequent tests. The domain, or concepts, being tested are the same in both versions of the instrument, but the wording of items is different. 2 For an instrument to demonstrate stability, there should be a high correlation between the scores each time a participant completes the test. Generally speaking, a correlation coefficient of less than 0.3 signifies a weak correlation, 0.3–0.5 is moderate and greater than 0.5 is strong. 4

Equivalence is assessed through inter-rater reliability. This test includes a process for qualitatively determining the level of agreement between two or more observers. A good example of the process used in assessing inter-rater reliability is the scores of judges for a skating competition. The level of consistency across all judges in the scores given to skating participants is the measure of inter-rater reliability. An example in research is when researchers are asked to give a score for the relevancy of each item on an instrument. Consistency in their scores relates to the level of inter-rater reliability of the instrument.

Determining how rigorously the issues of reliability and validity have been addressed in a study is an essential component of the critique of research, as well as influencing the decision about whether to implement the study findings into nursing practice. In quantitative studies, rigour is determined through an evaluation of the validity and reliability of the tools or instruments utilised in the study. A good quality research study will provide evidence of how all these factors have been addressed. This will help you to assess the validity and reliability of the research and help you decide whether or not you should apply the findings in your area of clinical practice.

  • Lobiondo-Wood G
  • Shuttleworth M
  • Laerd Statistics. Determining the correlation coefficient. 2013. https://statistics.laerd.com/premium/pc/pearson-correlation-in-spss-8.php


Competing interests None declared.


Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas on August 16th, 2021. Revised on October 26, 2023.

A researcher must test the collected data before drawing any conclusions. Every research design needs to be concerned with reliability and validity to measure the quality of the research.

What is Reliability?

Reliability refers to the consistency of the measurement. Reliability shows how trustworthy the score of the test is. If the collected data shows the same results after being tested using various methods and sample groups, the information is reliable. Note, however, that reliability on its own does not guarantee that the results are valid.

Example: If you weigh yourself on a weighing scale throughout the day, you’ll get the same results. These are considered reliable results obtained through repeated measures.

Example: If a teacher conducts the same maths test with her students and repeats it the next week with the same questions, and she gets the same scores, then the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. Validity shows how suitable a specific test is for a particular situation. If the results are accurate according to the researcher’s situation, explanation, and prediction, then the research is valid.

If the method of measuring is accurate, then it’ll produce accurate results. A valid method must be reliable, but a reliable method is not necessarily valid. If a method is not reliable, it cannot be valid.

Example:  Your weighing scale shows different results each time you weigh yourself within a day even after handling it carefully, and weighing before and after meals. Your weighing machine might be malfunctioning. It means your method had low reliability. Hence you are getting inaccurate or inconsistent results that are not valid.

Example: Suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same responses from the various participants, the questionnaire is highly reliable; if those responses also accurately reflect the quality of the product, the measurement is valid as well.

Validity is often difficult to assess even when the process of measurement is reliable, because it is not easy to know how well the measurement reflects the real situation.

Example: If the weighing scale shows the same result, let’s say 70 kg each time, even though your actual weight is 55 kg, then the weighing scale is malfunctioning. It shows consistent results, so it is reliable; however, it cannot be considered valid, because the measurements do not reflect your true weight.

Internal Vs. External Validity

One of the key features of randomised designs is that they tend to have high internal and external validity.

Internal validity  is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be due to the experiment conducted, and any external factor should not influence the  variables .

Example: age, level, height, and grade.

External validity  is the ability to identify and generalise your study outcomes to the population at large. The relationship between the study’s situation and the situations outside the study is considered external validity.



Threats to Internal Validity

Threats to External Validity

How to Assess Reliability and Validity

Reliability can be measured by comparing the consistency of the procedure and its results. There are various methods to measure validity and reliability. Reliability can be measured through various statistical methods depending on the type of reliability, as explained below:

Types of Reliability

Types of Validity

As we discussed above, the reliability of a measurement alone cannot determine its validity. Validity is difficult to measure even if the method is reliable. The following types of tests are conducted to measure validity.


How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants.
  • Make the participants familiar with the criteria of assessment.
  • Train the participants appropriately.
  • Analyse the research items regularly to avoid poor performance.

How to Increase Validity?

Ensuring validity is also not an easy job. Methods that help ensure validity are given below:

  • Reactivity should be minimised as a first concern.
  • The Hawthorne effect should be reduced.
  • The respondents should be motivated.
  • The intervals between the pre-test and post-test should not be lengthy.
  • Dropout should be minimised.
  • Inter-rater reliability should be ensured.
  • Control and experimental groups should be matched with each other.

How to Implement Reliability and Validity in your Thesis?

According to experts, it is helpful to implement the concepts of reliability and validity in your work; these concepts are especially often addressed in theses and dissertations. The method for implementation is given below:

Frequently Asked Questions

What is reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.



Open access | Published: 20 May 2024

Validity and reliability of the Persian version of food preferences questionnaire (Persian-FPQ) in Iranian adolescents

  • Zahra Heidari 1 ,
  • Awat Feizi   ORCID: orcid.org/0000-0002-1930-0340 1 &
  • Fahimeh Haghighatdoost 2  

Scientific Reports volume 14, Article number: 11493 (2024)


  • Epidemiology
  • Health care
  • Risk factors

The assessment of dietary intakes and habits using reliable and youth-specific measurement tools during adolescence is essential. The aim of the present study was to culturally adapt and investigate the psychometric properties of the Persian version of the food preferences questionnaire (Persian-FPQ) among Iranian adolescents. This methodological cross-sectional study was conducted among 452 Persian-speaking adolescents living in Isfahan, Iran. Translation of the FPQ was performed using the forward–backward method. The intraclass correlation coefficient (ICC) and Cronbach’s α were used to assess test–retest reliability and internal consistency, respectively. Construct validity was investigated using exploratory factor analysis (EFA). Divergent validity was determined using correlation analysis with the Kessler Psychological Distress Scale (K-10). Known-group validity was assessed based on differences in mean food preference score between boys and girls and across categories of body mass index (BMI). The internal and external reliabilities of the Persian-FPQ were in the good-to-excellent range in all domains (Cronbach’s α: 0.76–0.96; ICCs: 0.982–0.998). Boys had higher food preference scores than girls, indicating good known-group validity. Construct validity evaluated by EFA led to the extraction of seven factors (“Vegetables”, “Fruit”, “Dairy”, “Snacks”, “Meat/Fish”, “Starches” and “Miscellaneous foods”), explaining 37.8% of the variance. Divergent validity revealed significant negative correlations between five sub-scales of the Persian-FPQ and psychological distress. The Persian version of the FPQ is a reliable and valid instrument, applicable across a broad range of Persian-speaking adolescents, for assessing food preferences in community-based research projects.

Introduction

Eating habits encompass a wide variety of attitudes and behaviors, such as food acceptance, food selection, food consumption, and food waste; they are considered “conscious, collective and repetitive behaviors that affect how people act regarding the selection and consumption of specific foods or diets, in the context of the social and cultural factors of their society” 1 . Adolescence is an important developmental period in which health behaviors are often constructed and stabilized as habits. Unhealthy eating habits, including snacking on high-energy foods and low intakes of fruit and vegetables, are particularly common in adolescents’ habitual diets. These dietary behaviors have important effects on both short- and long-term physical, physiological and psychological health. Eating behaviors and habits developed during childhood and adolescence tend to persist into adulthood 2 . Food preferences are the evaluative attitudes that people hold toward foods and diets, that is, how much people like or dislike them. Quantitative food preference measurement has long been part of the study of food habits 3 . Food acceptance, or palatability, has been shown to be a major predictor of eating habits. Additionally, food preferences differ across ethnicities, in part because culture influences the range of foods to which young children are exposed. Several studies have shown that socioeconomic status, ethnicity and culture affect the food choices and eating habits of children and their families or caregivers 4 , 5 ; similarly, food preferences have been analysed in terms of a number of demographic variables, including race, gender, geography, age, taste physiology, and many disease states.

Each region of the world has specific characteristics that play a role in the dietary and other lifestyle habits of its population, and these may change over time. Dietary habits have changed in the Eastern Mediterranean region over the past four decades: the intake of fat, particularly saturated fat, sweetened beverages, and free sugar has dramatically increased, while the intake of fiber, fruits and vegetables has decreased. This, in turn, increases risk factors for non-communicable diseases, particularly cardiovascular disease, diabetes, obesity and cancer 6 . These changes are more pronounced in school-aged children, who skip healthy habits such as eating breakfast and substitute them with snacks and unhealthy foods 7 .

In modern life, people are highly aware of the quality of the food they consume and of the interrelationship between environmental factors and the health effects of food products; this shapes consumers’ concerns about healthier lifestyles and environmental issues and, ultimately, how they select and buy food products based on their perspectives and attitudes toward food quality 8 . Recent changes in food and dietary habits in developing nations call for up-to-date evaluation of food preferences and dietary behaviors, particularly among adolescents. Such examinations should begin with adolescents because they are at a critical stage of developing their own attitudes towards many habits, nutrition among them, and are experiencing independence in food choice during school hours 7 . Possible reasons for unhealthy eating habits include changes in lifestyle, industrialization, social determinants, and environment from childhood to adolescence 9 .

Accordingly, there is a pressing need to evaluate the patterns of eating habits among adolescents. Most available food preference questionnaires evaluate adult populations’ eating behaviors, such as the widely used original and modified Food Choice Questionnaire (FCQ) and some sex-specific tools 10 , 11 , while only a few examine youth habits 12 . Other instruments evaluate specific preferences for foods with high fat and caloric content 10 , 11 . Several studies evaluating adolescents’ food preferences have been conducted using qualitative methods 13 , 14 , 15 , 16 . For quantitative research, one appropriate tool is Steptoe’s Food Choice Questionnaire (FCQ) 13 . Several adjustments have been made to the scale to expand the factors, or to accommodate factors for specific populations other than the UK population in which it was originally used 14 , 17 , 18 .

Previous efforts to develop valid and reliable food preference instruments for school-aged children have not covered all factors associated with children’s food preferences 7 , 19 . There is only one questionnaire for measuring food preference in adolescents, and it lacks a specific psychometric evaluation 20 . No instrument measuring willingness to consume specific food items has been developed for use within adolescent populations worldwide; in particular, the one available instrument covers some food items that are applicable only to specific geographic areas. Taste and food preferences are interrelated factors and important potential predictors, or antecedents, of changes in dietary behaviour and food intake 21 , which may strongly affect future health 22 . Unhealthy dietary behaviours may be a leading cause of adolescent obesity, which is estimated to reach 1 billion by 2025, and of other non-communicable diseases, and adolescence is a critical phase of development, transitioning between childhood and adulthood 23 . A better understanding of how taste factors influence food consumption and dietary habits in this population would therefore aid the design of dietary strategies for health promotion. Accordingly, reliable and youth-specific measurement tools are needed not only for evaluating food preference, dietary intake and habits in this population, but also for assessing their predictive roles in mental and physical outcomes.

In the current study, for the first time worldwide, we aimed to culturally adapt, translate and evaluate the psychometric properties of the food preferences questionnaire (FPQ) developed by Smith et al. 20 and to expand its content to cover other foods commonly consumed by Iranian adolescents. We therefore evaluated the internal and test–retest reliability and several validity aspects of the extended FPQ.

Study design and participants

This methodological cross-sectional study was conducted between May 2021 and June 2022 among 452 Persian-speaking adolescents aged 11–18 years in Isfahan, one of the largest cities in central Iran. The adolescents who contributed to our study were recruited from high schools in different educational districts of Isfahan through multistage cluster random sampling. Isfahan has 6 educational districts, of which 4 (districts 1, 2, 3 and 4) were randomly selected as the first-stage cluster. Then, 10 schools (5 for girls and 5 for boys) were randomly selected from these districts (4 from district 2, 2 from district 3 and 2 from each of the other two districts). Due to the COVID-19 pandemic, the questionnaires were distributed and completed both electronically and in printed form. First, the questionnaires were reviewed and approved officially by the central education department of Isfahan province. The link to the electronic version of the questionnaires was then officially sent to the headmasters of the selected schools through administrative automation. The headmasters informed the teachers about the objectives and content of the research and sent them the electronic link to the questionnaires. The teachers then placed the link in the virtual class group (created in a social media network covering all students of the selected classes) and asked the students to complete the questionnaires. Some headmasters of the selected high schools agreed to conduct the survey in printed form. The online link to the questionnaires, along with an explanation of the purpose of the research, guidance on how to complete the questionnaires, and consent to participate in the study, was prepared on the Porsline website (link: https://survey.porsline.ir/n/survey/203178/build/ ). We included only students who lived in Isfahan city and were without major psychological and cognitive problems or physical illness at the time of sample recruitment. Finally, the data of 452 students were used in the analysis to evaluate the construct, divergent and known-group validities of the questionnaire. All students received sufficient information about the study and provided informed consent to participate. This study complies with the Declaration of Helsinki and was performed in accordance with ethics committee approval. The protocol was ethically approved by the National Institute for Medical Research Development (NIMAD) (research project No. 982938, ethical approval No. IR.NIMAD.REC.1398.062).

The food preferences questionnaire (FPQ)

Smith et al. developed a questionnaire to obtain information about the preferences of adolescents and adults for various food items 20. It comprises a list of 62 food items, primarily based on an earlier tool designed specifically for children 24. Participants are asked how much, on average, they enjoy eating each food item. The FPQ assesses preferences in six categories of food items (vegetables: 18 items; fruit: 7 items; meat/fish: 12 items; dairy: 10 items; snacks: 9 items; starches: 6 items). Each food item is rated on a six-point Likert scale: (1) dislike a lot, (2) dislike a little, (3) neither like nor dislike, (4) like a little, (5) like a lot, and (6) not applicable (for any food that the respondent does not know or does not remember having tried). The questionnaire also includes two further questions evaluating (1) adherence to a specific eating plan (Do you identify as any of the following? Vegan, vegetarian, pescetarian (no meat, but eat fish and/or shellfish), or none of the above) and (2) food allergy (Are you allergic to any food items, such as peanuts?). Sub-scales of the questionnaire showed moderate to good external reliability (ICCs = 0.61 to 0.95), and internal reliability was reasonable for all food groups (vegetables: α = 0.89; fruit: α = 0.84; meat or fish: α = 0.81; dairy: α = 0.77; and snacks: α = 0.80) 20, 25. It is worth noting that, during the adaptation and validation process, we added several food items reflecting Iranian culture to the FPQ-62, as explained in the next section.
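One scoring detail worth making explicit is that the sixth response option, "not applicable", must be treated as missing rather than as the top of the liking scale. A minimal sketch with made-up responses (note that the study's actual sub-scale scores are loading-weighted sums, described under construct validity below):

import numpy as np
import pandas as pd

# Hypothetical raw FPQ responses for three items, coded 1-6,
# where 6 = "not applicable" (food not known or never tried).
raw = pd.DataFrame({
    "broccoli": [4, 5, 6, 2],
    "spinach":  [3, 6, 5, 4],
    "carrots":  [5, 5, 4, 6],
})

# Recode "not applicable" to missing so it cannot inflate liking scores.
liking = raw.replace(6, np.nan)

# A simple per-person summary over the items actually answered.
print(liking.mean(axis=1, skipna=True))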

Translation and content validity

Permission was obtained from the original developer (Andrea Smith, University College London, London), and the methodology recommended by Beaton et al. 26 was followed to translate the FPQ-62 from English into Persian. In the forward stage, two fluent expert translators translated the items of the questionnaire into Persian; one translator was familiar with the concepts underlying the questions, whereas the other was blind to the items of the original instrument. The translators then prepared a unified version. This version was back-translated into English by two other translators and compared with the original version for conceptual equivalence. After a careful review by the researchers (A.F. & F.H.), necessary changes were made and a provisional Persian version of the FPQ was prepared. After translation, the content validity was evaluated qualitatively by a team of dietitians. The results of this phase are summarized in Table 1. Some food items were removed because they are not consumed in Iranian culture (such as bacon, whose consumption is forbidden in Islam). Some uncommon items were replaced with common counterparts in the Iranian diet; for example, detailed categories of Iranian cheeses were included in place of the cheeses in the original version. Finally, food items frequently consumed in Iran were added to complete the existing food items. These changes led to an initial draft containing 93 food items for the Persian version.

Psychometric analysis of the Persian-FPQ

Construct validity

The factor structure of the Persian-FPQ was examined using exploratory factor analysis (EFA) on data from the 452 adolescents. Before conducting EFA, factorability was evaluated with the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy (values > 0.7) and Bartlett's test of sphericity (P < 0.05) 27. Principal component extraction was used together with orthogonal varimax rotation for interpretability. The number of factors was guided by eigenvalues greater than 1 and the scree plot. Items with loading values greater than 0.20 were retained, to balance interpretability with the correlation between items and their related factors. Each extracted factor was labelled according to its loaded items. For each participant, the score of each sub-scale (factor) was computed as the sum of the related items weighted by their loading values.
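As a rough open-source sketch of this pipeline, the Python factor_analyzer package exposes the same building blocks (Bartlett's test, KMO, principal extraction with varimax rotation). The data below are synthetic stand-ins and the number of factors is arbitrary; nothing here reproduces the study's actual estimates.

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (calculate_bartlett_sphericity,
                                             calculate_kmo)

rng = np.random.default_rng(0)

# Synthetic 1-5 liking ratings for 20 items from 452 respondents, generated
# with shared latent structure so the factor analysis has signal.
latent = rng.normal(size=(452, 4))
items = pd.DataFrame(
    np.clip(np.round(3 + latent @ rng.normal(size=(4, 20))
                     + rng.normal(scale=0.8, size=(452, 20))), 1, 5),
    columns=[f"item{i}" for i in range(1, 21)],
)

# Factorability checks reported in the paper: Bartlett's test and KMO.
chi2, p_value = calculate_bartlett_sphericity(items)
_, kmo_total = calculate_kmo(items)
print(f"Bartlett chi2 = {chi2:.1f} (p = {p_value:.3g}), KMO = {kmo_total:.3f}")

# Principal-component extraction with orthogonal varimax rotation.
efa = FactorAnalyzer(n_factors=4, rotation="varimax", method="principal")
efa.fit(items)
eigenvalues, _ = efa.get_eigenvalues()  # input to the "> 1" rule / scree plot
loadings = pd.DataFrame(efa.loadings_, index=items.columns)

# Sub-scale scores as loading-weighted sums, keeping |loading| > 0.20
# in line with the paper's retention rule.
weights = loadings.where(loadings.abs() > 0.20, 0.0)
subscale_scores = items @ weights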

Known-groups validity

Known-groups validity was assessed based on the ability of the Persian-FPQ to discriminate, in terms of food preferences, between girls and boys and between body mass index (BMI) groups. BMI was categorized into four groups using the sex-specific BMI-for-age percentile curves developed by the World Health Organization: underweight (≤ 5th percentile), normal weight (5th–85th percentile), overweight (85th–95th percentile), and obese (≥ 95th percentile) 28. We hypothesized that food preference scores would differ significantly between girls and boys 19 and across BMI groups. Accordingly, the known-groups validity of the measure is supported if the distribution of the Persian-FPQ scores differs significantly between these groups. We distributed the Persian-FPQ to 270 girl and 182 boy students and compared their responses. Differences in the mean score of each sub-scale were tested between gender groups with the independent-samples t-test and across BMI groups with one-way analysis of variance (ANOVA). Normality of continuous data was evaluated using the Kolmogorov–Smirnov test and Q–Q plots.
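A minimal sketch of these comparisons with scipy.stats, using simulated scores in place of the real sub-scale data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated sub-scale scores for 270 girls and 182 boys.
girls = rng.normal(3.2, 0.6, 270)
boys = rng.normal(3.5, 0.6, 182)
scores = np.concatenate([girls, boys])

# Normality screen: Kolmogorov-Smirnov against a fitted normal
# (the paper also inspected Q-Q plots).
_, ks_p = stats.kstest(scores, "norm",
                       args=(scores.mean(), scores.std(ddof=1)))

# Independent-samples t-test for the girl/boy contrast.
t_stat, t_p = stats.ttest_ind(girls, boys)

# One-way ANOVA across the four WHO BMI-for-age groups (splits hypothetical).
under, normal_wt, over, obese = np.array_split(rng.permutation(scores), 4)
f_stat, f_p = stats.f_oneway(under, normal_wt, over, obese)

print(f"KS p = {ks_p:.3f}, t-test p = {t_p:.4f}, ANOVA p = {f_p:.3f}")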

Divergent validity

Divergent validity was evaluated using Pearson correlation coefficients between the score of each Persian-FPQ sub-scale and the Kessler Psychological Distress Scale (K-10). We hypothesized negative correlations between some Persian-FPQ dimensions, such as fruits and vegetables, and psychological distress 29, 30, 31. The K-10 is a 10-item questionnaire measuring psychological distress 32. Its questions ask how frequently in the past month the participant has felt tired out for no good reason (Q1), nervous (Q2), so nervous that nothing could calm them down (Q3), disappointed or hopeless (Q4), restless or fidgety (Q5), so restless that they could not sit still (Q6), depressed (Q7), so depressed that nothing could cheer them up (Q8), that everything was an effort (Q9), and worthless (Q10). Responses are scored on a five-point Likert scale: (1) none of the time, (2) a little of the time, (3) some of the time, (4) most of the time, (5) all of the time. The total K-10 score ranges from 10 to 50, with a higher score indicating greater psychological distress. The validity and reliability of the Persian version of this questionnaire have been confirmed previously (Cronbach's α = 0.83) 33, 34.
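The divergent-validity check reduces to one Pearson correlation per sub-scale against the K-10 total; a small simulated example:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated scores: a fruit-preference sub-scale and K-10 totals (10-50),
# built with a mild negative dependence as hypothesized in the paper.
fruit_pref = rng.normal(3.5, 0.7, 452)
k10 = np.clip(30 - 2.5 * fruit_pref + rng.normal(0, 5, 452), 10, 50)

r, p = stats.pearsonr(fruit_pref, k10)  # expect r < 0 if the hypothesis holds
print(f"r = {r:.2f}, p = {p:.3g}")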

Reliability

To investigate the internal consistency and test–retest reliability of the Persian-FPQ, 50 adolescents were recruited. Participants completed the Persian-FPQ on two separate days, 7–10 days apart. Test–retest reliability was evaluated with the intraclass correlation coefficient (ICC) and its 95% confidence interval, estimated using a two-way mixed model. ICC values below 0.5 were considered poor, 0.5–0.75 moderate, 0.75–0.9 good, and above 0.9 excellent 35. Internal consistency was evaluated with Cronbach's α, where values of 0.70–0.8 were considered acceptable, 0.8–0.9 good, and above 0.9 excellent 35. The extent of ceiling and floor effects was assessed from the distribution of the Persian-FPQ scores.
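Both statistics are available in the Python pingouin package. In a test-retest design the two administrations play the role of "raters" in the two-way model, and ICC3 corresponds to the two-way mixed-effects, consistency form; the data below are simulated.

import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(3)

# Simulated test-retest totals for 50 adolescents, 7-10 days apart.
true_score = rng.normal(60, 10, 50)
wide = pd.DataFrame({
    "id": np.arange(50),
    "test": true_score + rng.normal(0, 2, 50),
    "retest": true_score + rng.normal(0, 2, 50),
})

# Long format: each administration ("occasion") acts as a rater.
long = wide.melt(id_vars="id", var_name="occasion", value_name="score")
icc = pg.intraclass_corr(data=long, targets="id", raters="occasion",
                         ratings="score")
print(icc.loc[icc["Type"] == "ICC3", ["ICC", "CI95%"]])

# Cronbach's alpha requires item-level data; simulated 5-item responses here.
items = pd.DataFrame(rng.integers(1, 6, size=(50, 5)),
                     columns=[f"item{i}" for i in range(1, 6)])
alpha, ci = pg.cronbach_alpha(data=items)
print(f"alpha = {alpha:.2f}, 95% CI = {ci}")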

Ceiling and floor effects

Floor and ceiling effects are defined as the proportion of respondents choosing the lowest (floor) or highest (ceiling) possible score on the items of a questionnaire and its subscales; they reflect the sensitivity and coverage of the instrument. A proportion of ≥ 15% was taken as an indication of a floor or ceiling effect 35.
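Once the theoretical minimum and maximum of a (sub-)scale are known, the two proportions are a one-line computation; a sketch with simulated totals and the 15% rule:

import numpy as np
import pandas as pd

rng = np.random.default_rng(4)

# Simulated totals for a hypothetical 18-item sub-scale scored 1-5.
subscale = pd.Series(rng.integers(18, 91, size=452))
scale_min, scale_max = 18, 90

floor_pct = (subscale == scale_min).mean() * 100
ceiling_pct = (subscale == scale_max).mean() * 100
effect = max(floor_pct, ceiling_pct) >= 15  # criterion used in the paper
print(f"floor = {floor_pct:.1f}%, ceiling = {ceiling_pct:.1f}%, "
      f"effect present: {effect}")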

Other measurements and statistical analysis

Additional data on weight, height, gender, and education level were also collected. Quantitative and qualitative variables are expressed as mean ± SD and number (percent), respectively. In all statistical analyses, a P-value < 0.05 was considered statistically significant. All analyses were conducted using SPSS software (version 16; SPSS Inc., Chicago, IL, USA).

Ethics approval and consent to participate

All students received sufficient information about the study and provided informed consent to participate. The protocol of the study was ethically approved by the National Institute for Medical Research Development (NIMAD) (research project No. 982938; ethical approval No. IR.NIMAD.REC.1398.062).

Participant characteristics

A total of 452 adolescents, including 270 (59.7%) girls, participated in the current study. The mean ± SD age was 15.7 ± 1.78 and 14.97 ± 1.66 years for girls and boys, respectively (P > 0.05). About 26% of the participants were obese or overweight. Nearly 93% of the participants did not follow a specific diet. The prevalence of food allergy was estimated to be around 27.41% and 15.93% among girls and boys, respectively (P < 0.01). The mean ± SD psychological distress score was 16.24 ± 10.59 and 10.70 ± 8.87 for girls and boys, respectively (P < 0.01) (Table 2 ).

Construct validity

Construct validity was evaluated using EFA. We identified seven dimensions in the Persian-FPQ based on 90 food items: (1) a ‘vegetables’ factor, characterized by high liking for green pepper, garlic, cabbage, celery, onion, turnip, broccoli, beetroot, bell pepper, red peppers, green beans, peas, spinach, mushrooms, carrots, salad leaves (e.g. lettuce), raw tomatoes, tomato paste, corn, and cucumber; (2) a ‘fruit’ factor, characterized by high liking for cucumber, apricots, cherry, yellow plum, tangerine, pomegranate, peaches, grape, sour cherry, sweet lemon, oranges, dew melon, mulberry, cantaloupe, kiwi, strawberries, watermelon, apples, fig, melon, persimmon, and sour green plum; (3) a ‘dairy’ factor, characterized by high liking for cream, plain low-fat milk, plain full-fat milk, porridge, other types of milk (such as cocoa milk), butter, rice pudding (rice-milk), plain biscuits, eggs, mast (yogurt), Haleem, and cheese; (4) a ‘snacks’ factor, characterized by high liking for plain biscuits, salty snacks, chocolate, ketchup, chocolate biscuits, mayonnaise, ice cream, cake, chewable jelly chocolates, sausages, chips, and cream cheese; (5) a ‘meat/fish’ factor, characterized by high liking for fatty fish, low-fat fish, beef burgers, smoked salmon, lamb, white-meat burgers, beef, chicken, tinned tuna, ham and eggs; (6) a ‘starches’ factor, characterized by high liking for whole-meal baguette bread, baguette bread without bran, plain boiled rice, traditional bread without bran, traditional whole-meal bread, rice and beans, bran cereal, breakfast cereal, potatoes, and baked legumes; and (7) a ‘miscellaneous foods’ factor, characterized by high liking for avocados, margarine, custard, other cheeses (such as parmesan), grapefruit, and parsnips. These factors accounted for 8.17%, 7.46%, 5.20%, 4.90%, 4.74%, 4.45%, and 2.96% of the total variance, respectively. A KMO value of 0.847 and P < 0.05 for Bartlett's test confirmed the adequacy of the data for a reliable factor analysis in terms of sampling adequacy and factorability.

Table 3 provides the factor loadings of the seven factors extracted from the Persian-FPQ items. It should be noted that, in the process of construct validation, the lentil soup item was removed due to its low factor loading. We also combined different types of yogurt and two types of cheese (i.e. Iranian white cheese and traditional cheese) for better interpretability.

Known-groups and divergent validity

For the known-groups validity evaluation, we compared the mean scores of the Persian-FPQ subscales between gender and BMI groups (Table 4). The mean ± SD of all the extracted subscales of the Persian-FPQ was significantly higher in boys than in girls (P < 0.05). However, no significant differences were observed in the mean values of the Persian-FPQ subscales across BMI groups, except for the fifth subscale, the ‘meat/fish’ factor, whose mean was significantly higher in obese adolescents than in the other groups (P = 0.025).

Divergent validity was confirmed by significant negative correlations between five Persian-FPQ subscales (i.e. vegetables, fruit, dairy, meat/fish and starches) and the psychological distress measure (P < 0.01) (Table 4).

Reliability analyses

The reliability analysis results and descriptive statistics for the seven Persian-FPQ scales are shown in Table 5. The ICC for the total score of the Persian-FPQ indicates strong test–retest reliability (ICC = 0.998, 95% CI 0.996 to 0.999; P < 0.001). The ICCs for the extracted subscales (“vegetables”, “fruit”, “dairy”, “snacks”, “meat/fish”, “starches” and “miscellaneous foods”) were all estimated to be above 0.9, indicating excellent test–retest reliability.

Cronbach’s alpha coefficients indicating the internal consistency of each scale are presented in Table 5; all scales showed satisfactory results, ranging from 0.76 (acceptable) to 0.96 (excellent). A Cronbach’s alpha of 0.957 for the total score of the Persian-FPQ suggests excellent internal consistency.

Ceiling and floor effect

The percentage of respondents scoring at the highest possible level (ceiling effect) ranged from 0.4% to 9.7% across subscales, while the percentage scoring at the lowest possible level (floor effect) was minimal, at less than 1%, for all subscales. These results indicate high sensitivity and coverage of the validated questionnaire at both ends of the scale.

Discussion

In the current study, the psychometric properties of the Persian version of the FPQ were evaluated. To the best of our knowledge, the Persian-FPQ is one of the few fully validated questionnaires for measuring food preferences among adolescents. The results showed that the Persian version of the FPQ has excellent test–retest reliability and internal consistency. Boys had higher food preference scores than girls, indicating good known-groups validity. Factor analysis for construct validity yielded seven food preference factors (“vegetables”, “fruit”, “dairy”, “snacks”, “meat/fish”, “starches” and “miscellaneous foods”). The instrument also showed satisfactory divergent validity.

Internal consistency and test–retest reliability in the current study were evaluated with Cronbach’s α and the ICC, respectively. All subscale ICCs exceeded 0.9, and all Cronbach’s α values were between 0.7 and 1, suggesting strong test–retest reliability and internal consistency of the Persian-FPQ. The test–retest reliability of the Persian-FPQ was higher than that reported in a similar earlier study (test–retest coefficients ranging from 0.61 to 0.95) 20. The Persian-FPQ showed acceptable internal consistency, at nearly the same levels observed in the previous study (vegetables: α = 0.89; fruit: α = 0.84; meat or fish: α = 0.81; dairy: α = 0.77; and snacks: α = 0.80) 20. The Cronbach’s α we calculated for the starches subscale, 0.773, compares favourably with the value in the previous study (α = 0.68) 20.

The evaluation of the construct validity of the Persian-FPQ led to the extraction of seven factors (“vegetables”, “fruit”, “dairy”, “snacks”, “meat/fish”, “starches” and “miscellaneous foods”), explaining 37.8% of the total variance. Although the factor structure of the FPQ was not completely and formally evaluated with EFA in the original English version 20, the domains suggested in the studies of Smith et al. and Wardle et al. were comparable with our findings 20, 24. In Smith et al.’s study, six dimensions were reported (vegetables: 18 items; fruit: 7 items; meat or fish: 12 items; dairy: 10 items; snacks: 9 items; starches: 6 items) 20. In Wardle et al.’s study, four factors were extracted, named “Vegetables” (mainly broccoli, cabbage, carrots, cauliflower, green beans, mushrooms, onions, parsnips, salad greens and tomato), “Desserts” (mainly cream, cakes, pastries, fruit pie, sponge pudding, custard and dairy desserts), “Meat and Fish” (mainly beef, lamb, pork, chicken, bacon, fried fish, white fish and oily fish), and “Fruit” (mainly apples, bananas, citrus fruits, grapes, peaches, strawberries and fruit juice), explaining 24% of the variance 24. In addition to the different numbers of items used, participants’ age may explain these contradictory results. Indeed, older children are better able to express their preferences, and direct questions about food preference may elicit more accurate responses 36, 37, 38. In support of this, Smith et al.’s study, conducted on 18–19-year-old twins, identified factors more similar to ours than those identified in Wardle et al.’s study of 4-year-old children. Other contributory determinants of the construct validity might be the geographic, socio-economic, cultural and racial dependency of food preferences.

We examined known-groups validity based on sex and BMI categories. The Persian version of the FPQ discriminated well between boys and girls, with food preference scores significantly higher among boys on all subscales. Similarly, in Caine-Bish et al.’s study, boys preferred meat, fish, and poultry foods more than girls did; in contrast with our study, however, fruits and vegetables were preferred more frequently by girls than by boys 19. A similar pattern was reported in another study, in which girls liked fruit and vegetables more than boys, and boys liked fatty and sugary foods, meat, processed meat products and eggs more than girls 39. Food preferences are influenced by various factors such as taste preference and food availability and accessibility 40, 41. For instance, in a cross-sectional study of 225 children, fruit and vegetable availability was the sole predictor of high fruit and vegetable preferences 41. Therefore, the inconsistency between studies might be explained, at least to some extent, by such environmental factors. Regarding BMI, we did not observe a significant difference in food preference scores between BMI subgroups, except for the meat/fish dimension, which was preferred more by obese students. This accords with the positive relationship reported between meat intake and overweight/obesity in adolescents 42.

We examined divergent validity through the correlations between psychological distress scores and the dimensions of the Persian-FPQ. We observed significant negative correlations between five Persian-FPQ subscales (vegetables, fruit, dairy, meat/fish and starches) and the psychological distress measure. Although a higher preference does not necessarily mean higher intake, our findings are in line with studies showing an inverse association between higher consumption of these food groups and mental disorders. Higher intakes of carbohydrate, of cobalamin (found in dairy products and meat), and of fruit and vegetables, which are rich in antioxidants and vitamins essential for mental health, have been associated with a lower risk of mental disorders 12, 34, 43, 44, 45, 46.

Study strengths and limitations

The strengths of our study include the large number of food items, covering the broad range of food preferences of the Iranian adolescent population and perhaps of other countries with similar food and nutrition cultures. We also evaluated the majority of the important aspects of the validation process, yielding a reliable and valid questionnaire for community-based research projects. Our study also has limitations that should be highlighted. Because part of the survey was conducted online, children’s responses to the questionnaire items may have been influenced by their parents’ attitudes towards family, social and health desirability. Although we tried to include a wide variety of food items to provide the highest coverage of foods potentially consumed by Iranian adolescents, the study was conducted in central Iran, and repeating it in other geographic regions is recommended to enhance generalizability. Despite these potential limitations, we believe that the results of our study provide important information for public health stakeholders, policy makers, and researchers.

Conclusions

Previous studies worldwide that focused on developing valid and reliable food preference instruments for school-aged children were unable to cover all, or even most, of the commonly consumed food items relevant to children's food preferences; in addition, no valid instrument existed for evaluating food preference in Iranian children. Our study introduces a reliable and valid measure for evaluating food preferences with high coverage of food items, applicable not only to the Persian-speaking adolescent population but potentially also to school-aged children elsewhere. The Persian-FPQ is self-reported and easy to understand, and given the lack of questionnaires in this field, it can now be used in public health research projects, in association with other medical and health conditions experienced by children, and in nutritional epidemiology. Unhealthy food preference patterns in school-aged children negatively affect food consumption and dietary intake, leading to increased obesity and chronic disease later in life. Validated instruments such as the one introduced in our study can support interventions that concentrate on building a healthy food environment aimed at improving food preferences from childhood.

Data availability

The data that support the findings of this study are available on request from the corresponding author.

References

1. Rivera Medina, C., Briones Urbano, M., de Jesús Espinosa, A. & Toledo López, Á. Eating habits associated with nutrition-related knowledge among university students enrolled in academic programs related to nutrition and culinary arts in Puerto Rico. Nutrients 12, 1408 (2020).
2. Zalewska, M. & Maciorkowska, E. Selected nutritional habits of teenagers associated with overweight and obesity. PeerJ 5, e3681 (2017).
3. Meiselman, H. L. & Bell, R. Eating habits. In Encyclopedia of Food Sciences and Nutrition 2nd edn (ed. Caballero, B.) 1963–1968 (Academic Press, 2003). https://doi.org/10.1016/B0-12-227055-X/00379-5
4. Cullen, K. W., Chen, T.-A., Dave, J. M. & Jensen, H. Differential improvements in student fruit and vegetable selection and consumption in response to the new National School Lunch Program regulations: A pilot study. J. Acad. Nutr. Diet. 115, 743–750 (2015).
5. Dangour, A. D. et al. Food and health in Europe: A new basis for action. WHO Regional Publications, European Series, No. 96, 2004. Eur. J. Public Health 16, 451 (2006).
6. Musaiger, A. O. Overweight and obesity in Eastern Mediterranean Region: Prevalence and possible causes. J. Obes. 2011, 1–17 (2011).
7. Ziegler, A. M. et al. An ecological perspective of food choice and eating autonomy among adolescents. Front. Psychol. 12, 654139 (2021).
8. Boccia, F., Alvino, L. & Covino, D. This is not my jam: An Italian choice experiment on the influence of typical product attributes on consumers’ willingness to pay. Nutr. Food Sci. 54(1), 13–32 (2024).
9. O’Neill, E. V. The degree of peer influences on children’s food choices at summer camp (2012).
10. Geiselman, P. J. et al. Reliability and validity of a macronutrient self-selection paradigm and a food preference questionnaire. Physiol. Behav. 63, 919–928 (1998).
11. Deglaire, A. et al. Development of a questionnaire to assay recalled liking for salt, sweet and fat. Food Qual. Prefer. 23, 110–124 (2012).
12. Cornwell, T. B. & McAlister, A. R. Alternative thinking about starting points of obesity. Development of child taste preferences. Appetite 56, 428–439 (2011).
13. Wardle, J. et al. Health dietary practices among European students. Health Psychol. 16, 443 (1997).
14. Sorić, T. et al. Evaluation of the food choice motives before and during the COVID-19 pandemic: A cross-sectional study of 1232 adults from Croatia. Nutrients 13, 3165 (2021).
15. Neumark-Sztainer, D., Story, M., Perry, C. & Casey, M. A. Factors influencing food choices of adolescents: Findings from focus-group discussions with adolescents. J. Am. Diet. Assoc. 99, 929–937 (1999).
16. Januraga, P. P. et al. Qualitative evaluation of a social media campaign to improve healthy food habits among urban adolescent females in Indonesia. Public Health Nutr. 24, s98–s107 (2021).
17. Szakály, Z. et al. Adaptation of the Food Choice Questionnaire: The case of Hungary. Br. Food J. 120, 1474–1488 (2018).
18. Maulida, R., Nanishi, K., Green, J., Shibanuma, A. & Jimba, M. Food-choice motives of adolescents in Jakarta, Indonesia: The roles of gender and family income. Public Health Nutr. 19, 2760–2768 (2016).
19. Caine-Bish, N. L. & Scheule, B. Gender differences in food preferences of school-aged children and adolescents. J. Sch. Health 79, 532–540 (2009).
20. Smith, A. D. et al. Genetic and environmental influences on food preferences in adolescence. Am. J. Clin. Nutr. 104, 446–453 (2016).
21. Drewnowski, A., Henderson, S. A., Levine, A. & Hann, C. Taste and food preferences as predictors of dietary practices in young women. Public Health Nutr. 2, 513–519 (1999).
22. Bawajeeh, A. O. et al. Impact of taste on food choices in adolescence—systematic review and meta-analysis. Nutrients 12, 1985 (2020).
23. Lobstein, T. et al. Child and adolescent obesity: Part of a bigger picture. Lancet 385, 2510–2520 (2015).
24. Wardle, J., Sanderson, S., Gibson, E. L. & Rapoport, L. Factor-analytic structure of food preferences in four-year-old children in the UK. Appetite 37, 217–223 (2001).
25. FPQ. Food Preference Questionnaire for Adolescents and Adults. https://www.ucl.ac.uk/epidemiology-healthcare/sites/epidemiology-health-care/files/FPQ.pdf (Accessed 18 May 2021) (2021).
26. Beaton, D. E., Bombardier, C., Guillemin, F. & Ferraz, M. B. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 25, 3186–3191 (2000).
27. Tinsley, H. E. A. & Brown, S. D. Handbook of Applied Multivariate Statistics and Mathematical Modeling (Academic Press, 2000).
28. de Onis, M. et al. Development of a WHO growth reference for school-aged children and adolescents. Bull. World Health Organ. 85, 660–667 (2007).
29. Shawon, M. S. R., Jahan, E., Rouf, R. R. & Hossain, F. B. Psychological distress and unhealthy dietary behaviours among adolescents aged 12–15 years in nine South-East Asian countries: A secondary analysis of the Global School-Based Health Survey data. Br. J. Nutr. 129, 1242–1251 (2023).
30. Glabska, D., Guzek, D., Groele, B. & Gutkowska, K. Fruit and vegetables intake in adolescents and mental health: A systematic review. Rocz. Państwowego Zakładu Hig. 71(1), 15–25 (2020).
31. Nguyen, B., Ding, D. & Mihrshahi, S. Fruit and vegetable consumption and psychological distress: Cross-sectional and longitudinal analyses based on a large Australian sample. BMJ Open 7, e014201 (2017).
32. Kessler, R. C. et al. Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychol. Med. 32, 959–976 (2002).
33. Hajebi, A. et al. Adaptation and validation of short scales for assessment of psychological distress in Iran: The Persian K10 and K6. Int. J. Methods Psychiatr. Res. 27, e1726 (2018).
34. Seyed Askari, S. M., Kamyabi, M., Beigzadeh, A. & Narimisa, F. The relationship between personality features and psychosocial distress among nurses of Shafa hospital in Kerman. Iran. J. Psychiatr. Nurs. 5, 53–60 (2018).
35. Terwee, C. B. et al. Quality criteria were proposed for measurement properties of health status questionnaires. J. Clin. Epidemiol. 60, 34–42 (2007).
36. Guinard, J.-X. Sensory and consumer testing with children. Trends Food Sci. Technol. 11, 273–283 (2000).
37. Ogden, J. & Roy-Stanley, C. How do children make food choices? Using a think-aloud method to explore the role of internal and external factors on eating behaviour. Appetite 147, 104551 (2020).
38. Lange, C. et al. Assessment of liking for saltiness, sweetness and fattiness sensations in children: Validation of a questionnaire. Food Qual. Prefer. 65, 81–91 (2018).
39. Cooke, L. J. & Wardle, J. Age and gender differences in children’s food preferences. Br. J. Nutr. 93, 741–746 (2005).
40. Brug, J., Tak, N. I., Te Velde, S. J., Bere, E. & De Bourdeaudhuij, I. Taste preferences, liking and other factors related to fruit and vegetable intakes among schoolchildren: Results from observational studies. Br. J. Nutr. 99, S7–S14 (2008).
41. Cullen, K. W. et al. Availability, accessibility, and preferences for fruit, 100% fruit juice, and vegetables influence children’s dietary behavior. Health Educ. Behav. 30, 615–626 (2003).
42. Shin, S. M. Association of meat intake with overweight and obesity among school-aged children and adolescents. J. Obes. Metab. Syndr. 26, 217 (2017).
43. Trübswasser, U. et al. Assessing factors influencing adolescents’ dietary behaviours in urban Ethiopia using participatory photography. Public Health Nutr. 24, 3615–3623 (2021).
44. Mishra, G. D., McNaughton, S. A., O’Connell, M. A., Prynne, C. J. & Kuh, D. Intake of B vitamins in childhood and adult life in relation to psychological distress among women in a British birth cohort. Public Health Nutr. 12, 166–174 (2009).
45. Begdache, L., Chaar, M., Sabounchi, N. & Kianmehr, H. Assessment of dietary factors, dietary practices and exercise on mental distress in young adults versus matured adults: A cross-sectional study. Nutr. Neurosci. 22, 488–498 (2019).
46. Haghighatdoost, F. et al. Glycemic index, glycemic load, and common psychological disorders. Am. J. Clin. Nutr. 103, 201–209 (2016).

Acknowledgements

We would like to express our special thanks to Dr. Andrea Smith for her permission to use the original version of the food preference questionnaire, which we extended and validated in the current study.

This study was funded by the National Institute for Medical Research Development (NIMAD grant number: 982938).

Author information

Authors and Affiliations

Department of Biostatistics and Epidemiology, School of Health, Isfahan University of Medical Sciences, Hezar-Jerib Ave., P.O. Box 319, Isfahan, 81746-3461, Iran

Zahra Heidari & Awat Feizi

Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran

Fahimeh Haghighatdoost

Contributions

ZH: methodology, software, formal analysis, writing—original draft. AF: conceptualization, methodology, investigation, writing—review and editing, funding acquisition, supervision. FH: methodology, writing—review and editing.

Corresponding author

Correspondence to Awat Feizi .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Heidari, Z., Feizi, A. & Haghighatdoost, F. Validity and reliability of the Persian version of food preferences questionnaire (Persian-FPQ) in Iranian adolescents. Sci Rep 14, 11493 (2024). https://doi.org/10.1038/s41598-024-61433-4

Received: 22 October 2023

Accepted: 06 May 2024

Published: 20 May 2024

DOI: https://doi.org/10.1038/s41598-024-61433-4

  • Food preferences
  • Adolescence
  • Psychometrics

  • Open access
  • Published: 18 May 2024

Psychometric properties and criterion related validity of the Norwegian version of hospital survey on patient safety culture 2.0

  • Espen Olsen 1 ,
  • Seth Ayisi Junior Addo 1 ,
  • Susanne Sørensen Hernes 2 , 3 ,
  • Marit Halonen Christiansen 4 ,
  • Arvid Steinar Haugen 5 , 6 &
  • Ann-Chatrin Linqvist Leonardsen 7 , 8  

BMC Health Services Research volume 24, Article number: 642 (2024)

Several studies have been conducted with version 1.0 of the Hospital Survey on Patient Safety Culture (HSOPSC) in Norway and globally. The 2.0 version has not previously been translated and tested in Norwegian hospital settings. This study aims to 1) assess the psychometric properties of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more outcomes, namely ‘pleasure at work’ and ‘turnover intention’.

The HSOPSC 2.0 was translated using a sequential translation process. A convenience sample was used, inviting hospital staff from two hospitals (N = 1002) to participate in a cross-sectional questionnaire study. Data were analyzed using Mplus. Construct validity was tested with confirmatory factor analysis (CFA), convergent validity with the Average Variance Explained (AVE), and internal consistency with composite reliability (CR) and Cronbach’s alpha. Criterion-related validity was tested with multiple linear regression.

The overall statistical results using the N-HSOPSC 2.0 indicate that the model fit based on CFA was acceptable. Five of the N-HSOPSC 2.0 dimensions had AVE scores below the 0.5 criterion. The CR criterion was met for all dimensions except Teamwork (0.61). However, Teamwork was one of the most important and significant predictors of the outcomes. Regression models explained the most variance for patient safety rating (adjusted R² = 0.38), followed by ‘turnover intention’ (adjusted R² = 0.22), ‘pleasure at work’ (adjusted R² = 0.14), and lastly ‘number of reported events’ (adjusted R² = 0.06).

The N-HSOPSC 2.0 had acceptable construct validity and internal consistency when translated into Norwegian and tested among staff in two Norwegian hospitals. Hence, the instrument is appropriate for use in Norwegian hospital settings. The ten dimensions predicted the most variance in ‘overall patient safety’ and the least in ‘number of reported events’. In addition, the safety culture dimensions predicted ‘pleasure at work’ and ‘turnover intention’, which are not part of the original instrument.

Patient harm due to unsafe care is a large and persistent global public health challenge and one of the leading causes of death and disability worldwide [ 1 ]. Improving safety in healthcare is central in governmental policies, though progress in delivering this has been modest [ 2 ]. Patient safety culture surveys have been the most frequently used approach to measure and monitor perception of safety culture [ 3 ]. Safety culture is defined as “the product of individual and group values, attitudes, perceptions, competencies and patterns of behavior that determine the commitment to, and the style and proficiency of, an organization’s health and safety management” [ 4 ]. Moreover, safety culture refers to the perceptions, beliefs, values, attitudes, and competencies within an organization pertaining to safety and prevention of harm [ 5 ]. The importance of measuring patient safety culture was underlined by the results in a 2023 scoping review, where 76 percent of the included studies observed associations between improved safety culture and reduction of adverse events [ 6 ].

To assess patient safety culture in hospitals, the US Agency for Healthcare Research and Quality (AHRQ) launched the Hospital Survey on Patient Safety Culture (HSOPSC) version 1.0 in 2004 [ 7 , 8 ]. Since then, HSOPSC 1.0 has become one of the most used tools for evaluating patient safety culture in hospitals, administered in approximately one hundred countries and translated into 43 languages as of September 2022 [ 9 ]. HSOPSC 1.0 has generally been considered one of the most robust instruments for measuring patient safety culture, and it has adequate psychometric properties [ 10 ]. In Norway, the first studies using the N-HSOPSC 1.0 concluded that the psychometric properties of the instrument were satisfactory for use in Norwegian hospital settings [ 11 , 12 , 13 ]. A recent review of the literature identified 20 research articles using the N-HSOPSC 1.0 [ 14 ].

Studies of safety culture perceptions in hospitals require valid and psychometrically sound instruments [ 12 , 13 , 15 ]. First, an accurate questionnaire structure should demonstrate a match between the theorized content structure and the actual content structure [ 16 , 17 ]. Second, instruments developed in one context must demonstrate appropriate psychometric properties in other cultures and settings [ 16 , 17 ]. Further, psychometric concepts need to demonstrate relationships with other related and valid criteria; for example, criterion validity can be assessed against criteria data collected at the same time (concurrent validity) or against similar data from a later time point (predictive validity) [ 12 , 16 , 17 ]. Finally, researchers need to demonstrate a match between the theorized content and the actual content of the empirical data [ 15 ]. If these psychometric areas are not taken seriously, many pitfalls arise for both researchers and practitioners [ 14 ], such as imprecise diagnosis of the patient safety level and failure to evaluate the effect of improvement initiatives. Moreover, researchers can easily and erroneously confirm or reject research hypotheses when applying invalid and inaccurate measurement tools.

Patient safety cannot be understood as an isolated phenomenon; it is influenced by general job characteristics and the well-being of individual healthcare workers. Karsh et al. [ 18 ] found that positive staff perceptions of their work environment and low work pressure were significantly related to greater job satisfaction and work commitment. A direct association has also been reported between turnover and work strain, burnout and stress [ 19 ]. Zarei et al. [ 20 ] showed a significant relationship between patient safety (safety climate) and unit type, job satisfaction, job interest, and stress in hospitals; their study also illustrated a strong relationship between lack of personal accomplishment and job satisfaction, job interest and stress. There was also a negative correlation between occupational burnout and safety climate, with a decrease in the latter associated with an increase in the former. Hence, patient safety researchers should consider healthcare job characteristics in combination with patient safety culture.

Recently, the AHRQ revised the HSOPSC 1.0 into a 2.0 version to improve the quality and relevance of the instrument. The HSOPSC 2.0 is shorter: 25 items were removed or had their response options changed, and ten new items were added. The HSOPSC 2.0 was validated during the revision process [ 21 ], but its psychometric qualities across cultures, countries and settings need further investigation. Consequently, the overall aim of this study was to investigate the psychometric properties of the HSOPSC 2.0 [ 21 ] (see supplement 1) in a Norwegian hospital setting. Specifically, the aims were to 1) assess the psychometric properties of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more outcomes, namely ‘pleasure at work’ and ‘turnover intention’.

This study had a cross-sectional design, using a web-based survey solution called “Nettskjema” to distribute questionnaires in two Norwegian hospitals. The study adheres to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement guidelines for reporting observational studies [ 22 ].

Translation of the HSOPSC 2.0

We conducted a «forward and backward» translation in line with the recommendations of Brislin [ 23 ]. First, the questionnaire was translated from English to Norwegian by a bilingual researcher. The Norwegian version was then translated back to English by another bilingual researcher. Thereafter, the semantic, idiomatic and conceptual equivalence of the two versions was compared by the research group, which consisted of experienced researchers. The face validity of the N-HSOPSC 2.0 was considered adequate, and the items lend themselves well to the corresponding latent concepts.

The N-HSOPSC 2.0 was pilot-tested with a focus on content and face validity. Six randomly selected healthcare personnel were asked to assess whether the questionnaire was adequate, appropriate, and understandable with regard to language, instructions, and scoring. In addition, an expert group consisting of senior researchers ( n  = 4) and healthcare personnel ( n  = 6) with competence in patient safety culture was asked to assess the same.

The questionnaire

The HSOPSC 2.0 (supplement 1) consists of 32 items using 5-point Likert-type scales of agreement (from 1 = strongly disagree to 5 = strongly agree) or frequency (from 1 = never to 5 = always), as well as a “does not apply/do not know” option. The 32 items are distributed over ten dimensions. Additionally, two single-item patient safety culture outcome measures and six background information items are included. The single-item outcome measures evaluate the overall ‘patient safety rating’ for the work area and ‘reporting patient safety events’.

In addition to the N-HSOPSC 2.0, participants were asked to respond to three questions about their ‘pleasure at work’ (measuring whether staff enjoy and are pleased with their work, scored from 1 = never to 4 = always) [ 24 ], two questions about their ‘intention to quit’ (measuring whether staff are considering quitting their job, scored on a 5-point Likert scale from 1 = strongly agree to 5 = strongly disagree) [ 25 ], as well as demographic variables (gender, age, professional background, primary work area, and years of work experience).

Participants and procedure

The data collection was conducted in two phases: the first phase (Nov–Dec 2021) at Hospital A and the second phase (Feb–March 2022) at Hospital B. We used a purposive sampling strategy: at Hospital A (two locations), all employees were invited to participate ( N  = 6648), including clinical staff, administrators, managers, and technical staff. At Hospital B (three locations), all employees from the anaesthesiology, intensive care and operating wards were invited to participate ( N  = 655).

The questionnaire was distributed by e-mail, including a link to a digital survey solution delivered by the University of Oslo, and the data were gathered and stored on a secure research platform, TSD (Services for Sensitive Data). This service uses two-factor authentication and allows data sharing between the collaborating institutions without transferring data between them. The system allows storage of indirectly identifying data, such as gender, age, profession, years of experience, and hospital. Reminders were sent out twice.

Statistical analyses

Data were analyzed using Mplus. Normality was assessed for each item using skewness and kurtosis, with values between +2 and −2 deemed acceptable for a normal distribution [ 26 ]. Missing value analysis was conducted using frequencies to check the percentage of missing responses for each item. Correlations were assessed using Spearman’s correlation analysis, and internal consistency was reported as Cronbach’s alpha.
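This screening step is easy to reproduce outside Mplus; a sketch in Python with synthetic data (scipy reports excess kurtosis, which is consistent with the ±2 convention):

import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)

# Synthetic 5-point item responses with a few missing values.
df = pd.DataFrame(rng.integers(1, 6, size=(1002, 4)).astype(float),
                  columns=["A1", "A2", "A3", "A4"])
df.iloc[::97, 0] = np.nan  # sprinkle some missingness into one item

screen = pd.DataFrame({
    "skew": df.apply(lambda c: skew(c.dropna())),
    "kurtosis": df.apply(lambda c: kurtosis(c.dropna())),  # excess kurtosis
    "missing_%": df.isna().mean() * 100,
})
print(screen)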

Confirmatory factor analysis (CFA) was conducted to test the ten-dimension structure of the N-HSOPSC 2.0 using Mplus and the Mplus Microsoft Excel Macros. Model fit was assessed using the Comparative Fit Index (CFI), Tucker–Lewis Index (TLI), Root Mean Square Error of Approximation (RMSEA) and Standardized Root Mean Square Residual (SRMR) [ 27 ]. Table 1 shows the fit indices and acceptable thresholds.
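The authors fitted the model in Mplus. As an open-source sketch of the same kind of CFA, the Python semopy package accepts lavaan-style model syntax; the item names and data below are hypothetical (the real mapping of the 32 items to ten dimensions follows the AHRQ instrument), and SRMR may need to be derived separately, since semopy's calc_stats reports CFI, TLI and RMSEA but not necessarily SRMR.

import numpy as np
import pandas as pd
from semopy import Model, calc_stats

rng = np.random.default_rng(1)

# Synthetic item-level data for two of the ten dimensions.
f1, f2 = rng.normal(size=(1002, 1)), rng.normal(size=(1002, 1))
data = pd.DataFrame(
    np.hstack([f1 + rng.normal(scale=0.7, size=(1002, 3)),
               f2 + rng.normal(scale=0.7, size=(1002, 3))]),
    columns=["tw1", "tw2", "tw3", "co1", "co2", "co3"],
)

# "=~" defines each latent dimension by its items; "~~" frees their covariance.
desc = """
Teamwork =~ tw1 + tw2 + tw3
CommOpenness =~ co1 + co2 + co3
Teamwork ~~ CommOpenness
"""

model = Model(desc)
model.fit(data)
print(calc_stats(model)[["CFI", "TLI", "RMSEA"]])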

Reliability of the ten predictor dimensions was also assessed using composite reliability (CR), where values of 0.7 or above are deemed acceptable for ascertaining internal consistency [ 25 ].

Convergent validity was assessed using the Average Variance Explained (AVE), where a value of at least 0.5 is deemed acceptable [ 28 ], indicating that at least 50 percent of the variance in the items is captured by their dimension. Criterion-related validity was tested using linear regression, adding ‘turnover intention’ and ‘pleasure at work’ to the two single-item outcomes of the N-HSOPSC 2.0.
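Both CR and AVE are simple functions of the standardized factor loadings (the Fornell-Larcker formulas). The sketch below uses hypothetical loadings chosen to land near the teamwork values reported later (CR ≈ 0.61, AVE ≈ 0.35); they are not the study's actual estimates.

import numpy as np

def composite_reliability(loadings):
    # CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    # where each error variance is 1 - loading^2 for standardized loadings.
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return num / (num + np.sum(1.0 - lam ** 2))

def average_variance_explained(loadings):
    # AVE (often called average variance extracted): mean squared loading.
    lam = np.asarray(loadings, dtype=float)
    return np.mean(lam ** 2)

teamwork = [0.55, 0.60, 0.62]  # hypothetical standardized loadings
print(f"CR  = {composite_reliability(teamwork):.2f}")
print(f"AVE = {average_variance_explained(teamwork):.2f}")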

Internal consistency and reliability were assessed using Cronbach’s alpha, where values > 0.9 are considered excellent, > 0.8 good, > 0.7 acceptable, > 0.6 questionable, > 0.5 poor and < 0.5 unacceptable [ 29 ].
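Cronbach's alpha itself is a short computation on item-level data; a self-contained sketch with simulated correlated responses:

import numpy as np

def cronbach_alpha(items):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
base = rng.normal(size=(1002, 1))  # shared component induces correlation
items = np.clip(np.round(3 + base + rng.normal(scale=0.8, size=(1002, 4))),
                1, 5)
print(f"alpha = {cronbach_alpha(items):.2f}")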

Ethical considerations

The study was conducted in line with the principles for ethical research in the Declaration of Helsinki, and informed consent was obtained from all participants [ 30 ]. Completed and submitted questionnaires were taken as consent to participate. Data privacy protection was reviewed by the respective hospitals’ data privacy authorities and assessed by the Norwegian Centre for Research Data (NSD, project number 322965).

Results

In total, 1002 participants responded to the questionnaire, representing a response rate of 12.6 percent. As seen in Table 2, 83.7% of the respondents worked in Hospital A and the remaining 16.3% in Hospital B. The majority of respondents (75.7%) were female, and 75.9% worked directly with patients.

Skewness and kurtosis values were between +2 and −2, indicating that the data were approximately normally distributed. All items had less than two percent missing values; hence, no methods for handling missing values were used.

Correlations

Correlations and Cronbach’s alpha are displayed in Table  3 .

The following dimensions had the highest correlations: ‘teamwork’, ‘staffing and work pace’, ‘organizational learning-continuous improvement’, ‘response to error’, ‘supervisor support for patient safety’, ‘communication about error’ and ‘communication openness’. Only one dimension, ‘teamwork’ (0.58), had a Cronbach’s alpha below the 0.7 (acceptable) threshold. Hence, most of the dimensions showed adequate reliability. Higher levels of the ten safety dimensions correlated positively with patient safety ratings.

Confirmatory Factor Analysis (CFA)

Table 4 shows the results from the CFA. The CFA ( N  = 1002) showed acceptable fit values [CFI = 0.92, TLI = 0.90, RMSEA = 0.045, SRMR = 0.053], and factor loadings ranged from 0.51 to 0.89 (see Table  1 ). CR was above the 0.70 criterion for all dimensions except ‘teamwork’ (0.61). AVE was above the 0.50 criterion except for ‘teamwork’ (0.35), ‘staffing and work pace’ (0.44), ‘organizational learning-continuous improvement’ (0.47), ‘response to error’ (0.47), and ‘communication openness’.

Criterion validity

Independent dimensions of the HSOPSC 2.0 were employed to predict four criteria: (1) ‘number of reported events’, (2) ‘patient safety rating’, (3) ‘pleasure at work’, and (4) ‘turnover intentions’. The composite measures explained a significant share of the variance in all outcome variables, thereby supporting criterion-related validity (Table  5 ). The regression models explained most variance for ‘patient safety rating’ (adjusted R² = 0.38), followed by ‘turnover intention’ (adjusted R² = 0.22), ‘pleasure at work’ (adjusted R² = 0.14), and lastly ‘number of reported events’ (adjusted R² = 0.06).
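
This criterion-validity step amounts to one multiple regression per outcome, with composite dimension scores as predictors. A hedged sketch (data file, dimension, and outcome variable names are hypothetical):

```r
# One regression per criterion; 'scores' holds composite dimension scores
# and the outcome variables (all names hypothetical).
scores <- read.csv("nhsopsc2_scores.csv")

m_rating <- lm(patient_safety_rating ~ teamwork + staffing_workpace +
                 org_learning + response_to_error + supervisor_support +
                 comm_about_error + comm_openness + reporting_events +
                 mgmt_support + handoffs, data = scores)

summary(m_rating)$adj.r.squared   # reported as 0.38 for this outcome
```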

Discussion

In this study, we investigated the psychometric properties of the N-HSOPSC 2.0. We found the face and content validity of the questionnaire satisfactory. Moreover, the overall statistical results indicate that the model fit based on the CFA was acceptable. Five of the N-HSOPSC 2.0 dimensions had AVE scores below the 0.5 criterion, but we consider this the strictest criterion employed in the evaluation of the psychometric properties. The CR criterion was met for all dimensions except ‘teamwork’ (0.61). However, ‘teamwork’ was one of the most important and significant predictors of the outcomes. On the positive side, the CFA results support the dimensional structure of the N-HSOPSC 2.0, and the regression results indicate a satisfactory explanation of the outcomes. On the more critical side, AVE scores below the 0.5 threshold on five dimensions indicate that the items also carry a certain level of measurement error.

In our study, the regression models explained most variance for ‘patient safety rating’ (R² = 0.38), followed by ‘turnover intention’ (R² = 0.22), ‘pleasure at work’ (R² = 0.14), and lastly ‘number of reported events’ (R² = 0.06). This supports the criterion validity of the independent dimensions of the N-HSOPSC 2.0, also when adding ‘turnover intention’ and ‘pleasure at work’. These results confirm previous research on the original N-HSOPSC 1.0 [ 12 , 13 ]. The current study also found that ‘number of reported events’ was negatively related to the safety culture dimensions, which is similar to the N-HSOPSC 1.0 findings [ 12 , 13 ].

The current study performed more extensive psychometric assessments than the first Norwegian studies using the HSOPSC 1.0 [ 11 , 12 , 13 ]. Nevertheless, the results still support the overall reliability and validity of the N-HSOPSC 2.0 when compared with the first studies using the N-HSOPSC 1.0 [ 11 , 12 , 13 ]. Also, in line with theory and expectations, the dimensions predicted ‘pleasure at work’ and ‘overall safety rating’ positively, and ‘turnover intentions’ and ‘number of reported events’ negatively. The directions of these relations thereby support the overall criterion validity. Some dimensions did not predict the outcome variables significantly; nonetheless, each criterion was significantly related to at least two dimensions of the HSOPSC 2.0. It is also worth noting that ‘teamwork’ was generally one of the most important predictors, even though this dimension had the lowest convergent validity (AVE), both here and in previous findings [ 11 , 12 , 13 ], and a CR below 0.7. Since the explanatory power of teamwork was nevertheless satisfactory, this suggests that the AVE and CR criteria may be too strict.

The sample in the current study consisted of 1009 employees across different professions at two hospital trusts in Norway. The gender and age distributions are representative of Norwegian healthcare workers. In total, 760 workers had direct patient contact, 167 had not, and 74 had patient contact sometimes. We consider this mix valuable, since a system perspective is key to establishing patient safety [ 31 ]. The other background variables (work experience, age, primary work area, and gender) indicate a satisfactory spread of personnel in the sample, which is an advantage since the sample then represents typical healthcare settings in Norway to a large extent.

In the current study, the N-HSOPSC 2.0 had higher levels of Cronbach’s alpha than in the first N-HSOPSC 1.0 studies [ 11 , 13 ], and more in line with the results from a longitudinal Norwegian study using the N-HSOPSC 1.0 in 2009, 2010, and 2017 [ 23 ]. Moreover, the estimates in the current study reveal higher factor loadings for the N-HSOPSC 2.0, ranging from 0.51 to 0.89. This is positive, since CFA is a key method for assessing construct validity [ 16 , 17 , 32 ].

AVE and CR were not estimated in the first Norwegian HSOPSC 1.0 studies [ 11 , 13 ]. The results of this study indicate some issues regarding AVE (convergent validity) in particular, since five of the concepts were below the recommended 0.50 threshold [ 32 ]. It is also worth noting that all measures in the N-HSOPSC 2.0, except ‘teamwork’ (CR = 0.61), had CR values above 0.70, which is satisfactory. AVE is considered a stricter and more conservative measure than CR, and the validity of a construct may be adequate even though more than 50% of the variance is due to error [ 33 ]. Hence, AVE values somewhat below 0.50 are not considered critical, since the overall results are generally satisfactory.

The first estimate of the criterion-related validity of the N-HSOPSC 2.0 using multiple regression indicated that two dimensions were significantly related to ‘number of reported events’, while six dimensions were significantly related to ‘patient safety rating’. The coefficients were negatively related to number of reported events and positively related to patient safety rating, as expected. In the first Norwegian study on the N-HSOPSC 1.0 [ 13 ], five dimensions were significantly related to ‘number of reported events’, and seven dimensions were significantly related to ‘patient safety ratings’. The relations with ‘number of events reported’ were then both positive and negative, which is not optimal when assessing criterion validity. Hence, since all significant estimates in the present study are in the expected directions, the criterion validity of the N-HSOPSC 2.0 has generally improved compared with the previous version.

In the current study, we added ‘pleasure at work’ and ‘turnover intention’ to extend the assessment of criterion-related validity. The first assessment indicated that ‘teamwork’ had a substantial positive influence on ‘pleasure at work’. Moreover, ‘staffing and work pace’ also had a positive influence on ‘pleasure at work’, but none of the other concepts were significant predictors. Hence, the teamwork dimension is key in driving ‘pleasure at work’, followed by ‘staffing and work pace’. ‘Turnover intentions’ was significantly and negatively related to ‘teamwork’, ‘staffing and work pace’, ‘response to error’, and ‘hospital management support’. The results thus indicate that these dimensions are key drivers in avoiding turnover intentions among hospital staff. A direct association has been reported between turnover and work strain, burnout, and stress [ 19 ]. Zarei et al. [ 20 ] showed a significant relationship between patient safety (safety climate) and unit type, job satisfaction, job interest, and stress in hospitals. Their study also illustrated a strong relationship between lack of personal accomplishment, job satisfaction, job interest, and stress. Furthermore, a negative correlation between occupational burnout and safety climate was discovered, where a decrease in the latter is associated with an increase in the former [ 20 ]. Hence, patient safety researchers should look at healthcare job characteristics in combination with patient safety culture.

Assessment of psychometrics must consider issues beyond statistical evaluation, such as theoretical grounding and face validity [ 16 , 17 ]. We believe one of the strengths of the HSOPSC 1.0 is that the instrument was operationalized based on theoretical concepts, as opposed to instruments built on EFA and a more arbitrary selection of items during development. We believe this also holds for the HSOPSC 2.0: the instrument is theoretically based, easy to understand, and, most importantly, can function as a tool to improve patient safety in hospitals. Moreover, when assessing the items that belong to the different latent constructs, the item-dimension relationships indicate high face validity.

Forthcoming studies should consider using the N-HSOPSC 2.0 to predict other outcomes, such as mortality, morbidity, length of stay, and readmissions.

Limitations

This study was conducted in two Norwegian public hospital trusts, which limits generalizability. The response rate within the hospitals was low, and we could therefore not benchmark subgroups; however, this was not part of the study objectives. The response rate may have been hampered by the pandemic and the associated high workload in the hospitals. Nevertheless, based on the diversity of the sample, we find the results robust and adequate for exploring the psychometric properties of the N-HSOPSC 2.0. We did not perform sample size calculations for the current study, but with over 1000 respondents we consider the sample size adequate for assessing psychometric properties. Moreover, the low level of missing responses indicates that the N-HSOPSC 2.0 was relevant for the staff included in the study.

There are many alternative ways of exploring the psychometric capabilities of instruments. For example, we did not investigate alternative factorial structures, such as hierarchical factor models, nor did we try to reduce the factorial structure, as has been done with the short version of the N-HSOPSC 1.0 [ 34 ]. Lastly, we did not attempt to predict patient safety indicators over time using a longitudinal design and objective patient safety indicators.

Conclusions

The results from this study generally support the validity and reliability of the N-HSOPSC 2.0. Hence, we recommend that the N-HSOPSC 2.0 be applied without further adjustments. Future studies could, however, develop structural models to strengthen knowledge about the relationships between the factors included in the N-HSOPSC 2.0/HSOPSC 2.0. Both improvement initiatives and future research projects should consider including the ‘pleasure at work’ and ‘turnover intentions’ indicators, since the N-HSOPSC 2.0 explains a substantial level of variance in these criteria. This result also indicates an overlap between general pleasure at work and patient safety culture, which is important when trying to improve patient safety.

Availability of data and materials

Datasets generated and/or analyzed during the current study are not publicly available due to local ownership of data, but aggregated data are available from the corresponding author on reasonable request.

References

World Health Organization. Global patient safety action plan 2021–2030: towards eliminating avoidable harm in health care. 2021. https://www.who.int/teams/integrated-health-services/patient-safety/policy/global-patient-safety-action-plan.

Rafter N, Hickey A, Conroy RM, Condell S, O’Connor P, Vaughan D, Walsh G, Williams DJ. The Irish National Adverse Events Study (INAES): the frequency and nature of adverse events in Irish hospitals—a retrospective record review study. BMJ Qual Saf. 2017;26(2):111–9.

O’Connor P, O’Malley R, Kaud Y, Pierre ES, Dunne R, Byrne D, Lydon S. A scoping review of patient safety research carried out in the Republic of Ireland. Irish J Med. 2022;192:1–9.

Halligan M, Zecevic A. Safety culture in healthcare: a review of concepts, dimensions, measures and progress. BMJ Qual Saf. 2011;20(4):338–43.

Weaver SJ, Lubomksi LH, Wilson RF, Pfoh ER, Martinez KA, Dy SM. Promoting a culture of safety as a patient safety strategy: a systematic review. Ann Intern Med. 2013;158(5):369–74.

Vikan M, Haugen AS, Bjørnnes AK, Valeberg BT, Deilkås ECT, Danielsen SO. The association between patient safety culture and adverse events – a scoping review. BMC Health Serv Res 2023;300. https://doi.org/10.1186/s12913-023-09332-8 .

Sorra J, Nieva V. Hospital survey on patient safety culture. AHRQ publication no. 04–0041. Rockville: Agency for Healthcare Research and Quality; 2004.

Nieva VF, Sorra J. Safety culture assessment: a tool for improving patient safety in healthcare organizations. Qual Saf Health Car. 2003;12:II17–23.

Agency for Healthcare Research and Quality (AHQR). International use of SOPS. https://www.ahrq.gov/sops/international/index.html .

Flin R, Burns C, Mearns K, Yule S, Robertson E. Measuring safety climate in health care. Qual Saf Health Care. 2006;15(2):109–15.

Olsen E, Aase K. The challenge of improving safety culture in hospitals: a longitudinal study using hospital survey on patient safety culture. International Probabilistic Safety Assessment and Management Conference and the Annual European Safety and Reliability Conference. 2012;2012:25–9.

Olsen E. Safety climate and safety culture in health care and the petroleum industry: psychometric quality, longitudinal change, and structural models. PhD thesis number 74. University of Stavanger; 2009.

Olsen E. Reliability and validity of the Hospital Survey on Patient Safety Culture at a Norwegian hospital. Quality and safety improvement research: methods and research practice from the International Quality Improvement Research Network (QIRN) 2008:173–186.

Olsen E, Leonardsen ACL. Use of the Hospital Survey of Patient Safety Culture in Norwegian Hospitals: A Systematic Review. Int J Environment Res Public Health. 2021;18(12):6518.

Hughes DJ. Psychometric validity: Establishing the accuracy and appropriateness of psychometric measures. The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development; 2018:751–779.

DeVillis RF. Scale development: Theory and application. Thousands Oaks: Sage Publications; 2003.

Netemeyer RG, Bearden WO, Sharma S. Scaling procedures: Issues and application. London: SAGE Publications Ltd; 2003.

Karsh B, Booske BC, Sainfort F. Job and organizational determinants of nursing home employee commitment, job satisfaction and intent to turnover. Ergonomics. 2005;48:1260–81. https://doi.org/10.1080/00140130500197195 .

Hayes L, O’Brien-Pallas L, Duffield C, Shamian J, Buchan J, Hughes F, Spence Laschinger H, North N, Stone P. Nurse turnover: a literature review. Int J Nurs Stud. 2006;43:237–63.

Zarei E, Najafi M, Rajaee R, Shamseddini A. Determinants of job motivation among frontline employees at hospitals in Teheran. Electronic Physician. 2016;8:2249–54.

Agency for Healthcare Research and Quality (AHQR). Hospital Survey on Patient Safety Culture. https://www.ahrq.gov/sops/surveys/hospital/index.html .

von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. BMJ. 2007;335(7624):806–8.

Brislin R. Back translation for cross-sectional research. J Cross-Cultural Psychol. 1970;1(3):185–216.

Notelaers G, De Witte H, Van Veldhoven M, Vermunt JK. Construction and validation of the short inventory to monitor psychosocial hazards. Médecine du Travail et Ergonomie. 2007;44(1/4):11.

Bentein K, Vandenberghe C, Vandenberg R, Stinglhamber F. The role of change in the relationship between commitment and turnover: a latent growth modeling approach. J Appl Psychol. 2005;90(3):468.

Tabachnick B, Fidell L. Using multivariate statistics. 6th ed. Boston: Pearson; 2013.

Hu L, Bentler P. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modelling. 1999;6(1):1–55.

Hair J, Sarstedt M, Hopkins L, Kuppelwieser V. Partial least squares structural equation modeling (PLS-SEM): An emerging tool in business research. Eur Business Rev. 2014;26:106–21.

George D, Mallery P. SPSS for Windows step by step: A simple guide and reference. 11.0 update. Boston: Allyn & Bacon; 2003.

World Medical Association. Declaration of Helsinki- Ethical Principles for Medical Research Involving Human Subjects. 2018. http://www.wma.net/en/30publications/10policies/b3 .

Farup PG. Are measurements of patient safety culture and adverse events valid and reliable? Results from a cross sectional study. BMC Health Serv Res. 2015;15(1):1–7.

Hair JF, Black WC, Babin BJ, Anderson RE. Applications of SEM. Multivariate data analysis. Upper Saddle River: Pearson; 2010.

Malhotra NK, Dash S. Marketing research an applied orientation (paperback). London: Pearson Publishing; 2011.

Olsen E, Aase K. A comparative study of safety climate differences in healthcare and the petroleum industry. Qual Saf Health Care. 2010;19(3):i75–9.

Acknowledgements

Master student Linda Eikemo is acknowledged for participating in the data collection in Hospital A, and Nina Føreland in Hospital B.

Author information

Authors and affiliations

UiS Business School, Department of Innovation, Management and Marketing, University of Stavanger, Stavanger, Norway

Espen Olsen & Seth Ayisi Junior Addo

Hospital of Southern Norway, Flekkefjord, Norway

Susanne Sørensen Hernes

Department of Clinical Sciences, University of Bergen, Bergen, Norway

Department of Obstetrics and Gynecology, Stavanger University Hospital, Stavanger, Norway

Marit Halonen Christiansen

Faculty of Health Sciences Department of Nursing and Health Promotion Acute and Critical Illness, OsloMet - Oslo Metropolitan University, Oslo, Norway

Arvid Steinar Haugen

Department of Anaesthesia and Intensive Care, Haukeland University Hospital, Bergen, Norway

Faculty of Health, Welfare and Organization, Østfold University College, Fredrikstad, Norway

Ann-Chatrin Linqvist Leonardsen

Department of anesthesia, Østfold Hospital Trust, Grålum, Norway

Contributions

EO, ASH and ACLL initiated the study. All authors (EO, SA, SSH, MHC, ASH, ACLL) participated in the translation process. SSH and ACLL were responsible for data collection. EO and SA performed the statistical analysis, which was reviewed by ASH and ACLL. EO, SA and ACLL wrote the initial draft of the manuscript, and all authors critically reviewed, read, and approved the final version of the manuscript.

Corresponding author

Correspondence to Ann-Chatrin Linqvist Leonardsen .

Ethics declarations

Ethics approval and consent to participate

The study was conducted in line with the principles for ethical research in the Declaration of Helsinki, and informed consent was obtained from all participants [ 30 ]. Eligible healthcare personnel were informed of the study through hospital e-mails and by text messages. Completed and submitted questionnaires were taken as consent to participate. According to the Norwegian Health Research Act §4, no ethics approval is needed when including healthcare personnel in research.

Consent for publication

Competing interests

The authors declare no competing interests.

Supplementary Information

Supplementary material 1.

About this article

Cite this article

Olsen, E., Addo, S.A.J., Hernes, S.S. et al. Psychometric properties and criterion related validity of the Norwegian version of hospital survey on patient safety culture 2.0. BMC Health Serv Res 24 , 642 (2024). https://doi.org/10.1186/s12913-024-11097-7

Received: 03 April 2023

Accepted: 09 May 2024

Published: 18 May 2024

DOI: https://doi.org/10.1186/s12913-024-11097-7

Keywords

  • Hospital survey on patient safety culture
  • Patient safety culture
  • Psychometric testing

The short Thai version of functional outcomes of sleep questionnaire (FOSQ-10T): reliability and validity in patients with sleep-disordered breathing

  • Sleep Breathing Physiology and Disorders • Original Article
  • Open access
  • Published: 15 May 2024

  • Kawisara Chaiyaporntanarat 1 ,
  • Wish Banhiran   ORCID: orcid.org/0000-0002-4029-6657 1 , 2 ,
  • Phawin Keskool 1 ,
  • Sarin Rungmanee 2 ,
  • Chawanont Pimolsri 2 ,
  • Wattanachai Chotinaiwattarakul 2 , 3 &
  • Auamporn Kodchalai 2 , 4  

Purpose

The aim of this study was to evaluate the reliability and validity of the short Thai version of the Functional Outcomes of Sleep Questionnaire (FOSQ-10T) in patients with sleep-disordered breathing (SDB).

Methods

Inclusion criteria were Thai patients with SDB aged ≥ 18 years who had polysomnography results available. Exclusion criteria were patients unable to complete the questionnaire for any reason, patients with a history of continuous antidepressant or alcohol use, and patients with underlying disorders, including unstable cardiovascular, pulmonary, or neurological conditions. All participants were asked to complete the FOSQ-10T and the Epworth sleepiness scale (ESS). Of these, 38 patients were asked to retake the FOSQ-10T 2–4 weeks later to assess test–retest reliability, and 19 OSA patients treated with CPAP were asked to do so 4 weeks after starting therapy to assess the questionnaire’s responsiveness to treatment.

Results

There were 42 participants (24 men, 18 women), with a mean age of 48.3 years. The internal consistency of the FOSQ-10T was good, as indicated by a Cronbach’s alpha coefficient of 0.85. The test–retest reliability was good, as indicated by an intraclass correlation coefficient of 0.77. The correlation between the FOSQ-10T and ESS scores (concurrent validity) was moderate ( r  =  − 0.41). The FOSQ-10T scores increased significantly after adequate CPAP therapy, showing excellent responsiveness to treatment. However, there was no significant association between FOSQ-10T scores and OSA severity as measured by the apnea–hypopnea index.

Conclusions

The FOSQ-10T has good reliability and validity for use as a tool to assess QOL in Thai patients with SDB. It is convenient and potentially useful in both clinical and research settings.

Introduction

The term “sleep-disordered breathing” (SDB) refers to a category of highly prevalent sleep disorders characterized by abnormal breathing patterns during sleep. Its negative consequences, especially in obstructive sleep apnea (OSA), include excessive daytime sleepiness, high blood pressure, poor quality of life (QOL), cardiometabolic diseases, and sensorineural hearing loss [ 1 , 2 , 3 , 4 , 5 , 6 ]. In addition to lowering these possible morbidities and mortality, improving patients’ QOL is a key goal of appropriately treating SDB [ 7 ].

Currently available instruments to assess health-related QOL in individuals with sleep disorders include general and disease-specific questionnaires [ 5 , 8 , 9 ]. The Functional Outcomes of Sleep Questionnaire (FOSQ-30), however, is perhaps one of the most widely utilized [ 10 ]. The questionnaire is a standardized self-report form consisting of 30 items that cover various domains including sexual relationships, general productivity, activity level, vigilance, and social consequence. Each of the FOSQ-30 items is given a score between 0 and 4, with a higher score representing a higher quality of life. However, one of the FOSQ-30’s drawbacks is the somewhat lengthy time required to respond to all questions (a total of 20–25 min). The original authors subsequently developed a shortened version (FOSQ-10) to make the QOL assessment easier and more efficient while still maintaining all crucial components [ 11 ]. Unfortunately, there is currently no validated version of this tool available for Thai patients.

The FOSQ-10 has been used to study the effects of therapeutic interventions such as functional septorhinoplasty [ 12 ], continuous positive airway pressure (CPAP) treatment [ 13 , 14 ], and oral appliances [ 15 , 16 ], as well as the effects of gastroesophageal reflux disease [ 17 ]. Previous research also showed that the FOSQ-10 has been validated across a number of languages and ethnic groups [ 18 , 19 , 20 ]. Among Iranians, a study found that the FOSQ-10 was comparable in meaning to the original version [ 18 ]. Among Peruvians, the Spanish version of the FOSQ-10 showed good internal consistency, construct validity, and sensitivity to change in patients with OSA who received treatment [ 19 ]. In Chinese, a study reported that the FOSQ-10 was a valid and reliable instrument for identifying the effects of sleep-related impairment in women during pregnancy [ 20 ]. Yet, no study has looked into its use in Thai people.

The primary objective of this study was to evaluate reliability and validity of the short Thai version of Functional Outcomes of Sleep Questionnaire (FOSQ-10T). The secondary objectives were to evaluate (1) QOL of OSA patients pre- and post-CPAP treatment, (2) QOL of SDB patients across different AHI severity, and (3) the correlation between scores of the FOSQ-10T and Epworth sleepiness scale (ESS).

Material and methods

This observational, prospective study was approved by the Siriraj Institutional Review Board (SIRB), COA Si 258/2021, and was conducted between November 2021 and February 2022. All participants gave their informed consent.

Subjects and allocation

The inclusion criteria were Thai patients with SDB who were at least 18 years old and had polysomnography (PSG) results available. The exclusion criteria were patients who were unable to complete the questionnaire for any reason, those with a history of long-term sedative, antidepressant, or alcohol use, and those with underlying medical conditions that would significantly impair QOL.

All participants were asked to complete the FOSQ-10T and ESS. Of these, 38 patients were asked to retake the FOSQ-10T between two and four weeks later in order to evaluate test–retest reliability, and 19 patients with OSA who were receiving CPAP were asked to retake the questionnaire 4 weeks later in order to evaluate the questionnaire’s responsiveness to treatment, as presented in the flow chart (Fig.  1 ).

Fig. 1 Flow chart of the study. FOSQ-10, the short form of the Functional Outcomes of Sleep Questionnaire; ESS, Epworth sleepiness scale; CPAP, continuous positive airway pressure

The short Thai version of Functional Outcomes of Sleep Questionnaire (FOSQ-10T)

The FOSQ-10T is a 10-item self-reported questionnaire that measures the impact of sleep disturbances on daily functioning. Though it is a condensed form of the FOSQ-30, it still covers all of the important domains: activity level (three items), vigilance (three items), sexual relationships (one item), general productivity (two items), and social outcome (one item). Every item is scored from 0 to 4, where a higher number corresponds to a higher QOL. With the permission of Professor Terri Weaver, one of the original developers [ 10 ], the FOSQ-10 was graciously provided for use in this study (see the appendix in the Supplementary information). Standard procedures were used to translate it forward and backward between English and Thai.

Epworth sleepiness scale (ESS)

The ESS is a self-administered questionnaire used to assess an individual's subjective level of sleepiness. It comprises eight questions that ask respondents to rate their likelihood of dozing off or falling asleep in a range of everyday scenarios, such as sitting and conversing with someone, watching television, or riding as a passenger in a car. Item scores range from 0 to 3, where a higher number indicates greater sleepiness. In this study, we used a validated Thai version of the ESS with permission [ 21 ] (see the appendix in the Supplementary information).

Statistical analysis

Categorical data were presented as numbers and percentages, whereas continuous data were presented as mean ± standard deviation (SD). Cronbach’s alpha coefficient was used to evaluate internal consistency, and the intraclass correlation coefficient (ICC) was used to evaluate test–retest reliability. The discriminant validity of the FOSQ-10T across SDB severity levels was evaluated using a Kruskal–Wallis one-way analysis of variance (ANOVA), and the concurrent validity of the FOSQ-10T against the ESS was evaluated using a scatter plot and a Pearson correlation coefficient. A significance level of p  < 0.05 was employed to denote statistical significance. The Statistical Package for the Social Sciences (SPSS) version 22 (International Business Machines Corporation, Armonk, NY, USA) was used to conduct the statistical analyses.
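
The analyses were run in SPSS; an equivalent workflow in R might look like the following hedged sketch, in which the data file and column names are hypothetical.

```r
# Illustrative R equivalents of the SPSS analyses; "fosq_data.csv" and its
# column names are hypothetical.
library(psych)

d <- read.csv("fosq_data.csv")

# Test-retest reliability: FOSQ-10T totals at baseline and 2-4 weeks later
ICC(d[, c("fosq_t1", "fosq_t2")])       # reported ICC = 0.77 (0.60-0.88)

# Concurrent validity against the ESS
cor.test(d$fosq_t1, d$ess, method = "pearson")   # reported r = -0.41

# Discriminant validity across AHI severity groups
kruskal.test(fosq_t1 ~ ahi_severity, data = d)
```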

Results

For this study, 42 participants (24 men and 18 women) with a mean age of 48.3 ± 15.3 years and a mean BMI of 27.9 ± 5.4 kg/m² were recruited. The mean apnea–hypopnea index (AHI) and ESS scores for the group were 38.6 ± 29.8 events/h and 7 ± 3.8, respectively. Among the participants, 16 (38.1%) had hypertension, 11 (26.2%) had dyslipidemia, and 7 (16.7%) had underlying diabetes mellitus.

Reliability

Cronbach’s alpha coefficients of the FOSQ-10T ranged from 0.82 to 0.85 across all 10 items, and the removal of any single item did not substantially alter the result (Table  1 ). This suggests a high level of internal consistency. With an ICC of 0.77 and a 95% confidence interval (CI) of 0.60–0.88, the FOSQ-10T demonstrated good test–retest reliability.

Validity

The concurrent validity between the FOSQ-10T and ESS scores was shown by a scatter plot (Fig.  2 ) and a Pearson correlation coefficient of − 0.41 ( p  = 0.01), indicating a moderate correlation. However, the discriminant validity analysis using the Kruskal–Wallis one-way ANOVA indicated no significant association between OSA severity, as determined by the AHI, and the FOSQ-10T scores (Table  2 ).

Responsiveness

The mean FOSQ-10T scores across all domains and the overall scores improved considerably after adequate CPAP therapy (Table  3 ), demonstrating that the questionnaire has excellent responsiveness to treatment (Fig.  2 ).

Fig. 2 Scatter plot of ESS and FOSQ-10 scores. FOSQ-10, the short form of the Functional Outcomes of Sleep Questionnaire; ESS, Epworth sleepiness scale; CPAP, continuous positive airway pressure

Discussion

When treating patients with SDB, the QOL of the patients is an important consideration that cannot be overlooked. While other instruments exist to assess this issue, the FOSQ-30 is likely one of the most widely utilized disease-specific questionnaires. However, its disadvantage is that answering every question takes a substantial amount of time. As a result, the original authors eventually developed a shortened version (FOSQ-10) to streamline and improve the efficiency of the QOL evaluation while maintaining all crucial elements [ 11 ]. Comparable to the original, this shortened version has demonstrated good validity and reliability [ 10 ]. It has been used to evaluate the effects of various therapeutic interventions [ 12 , 14 , 17 ] and has been validated in several other languages, including Spanish, Persian, and Chinese [ 18 , 19 , 20 ].

This study is most likely the first to report on the validity and reliability of the FOSQ-10T in Thai patients with SDB. The results of our study showed that the Cronbach’s alpha coefficient, which measures internal consistency, was 0.85, indicating good reliability. This closely resembles the original version [ 10 ] and studies conducted in Chinese, Spanish, and Persian [ 18 , 19 , 20 ] that found a Cronbach’s alpha of 0.84–0.87.

The results of the present study showed that the FOSQ-10T has good test–retest reliability, with an ICC of 0.77. This suggests that the tool is reliable when applied repeatedly to the same Thai population. It should be mentioned that the ICC in this study was comparable to those of the Chinese study (ICC of 0.73) and of the Thai version of the FOSQ-30 (ICC of 0.70) [ 5 ], but lower than that of the Iranian study (ICC of 0.92). A direct comparison of ICC values across studies, however, is not always appropriate, because differences in study designs, populations, and measuring techniques can all influence the values.

According to the study, there was a moderate negative correlation between the FOSQ-10T and the ESS ( r  =  − 0.41). This association is similar to those of the Iranian version [ 18 ] and the Thai version of the FOSQ-30 [ 5 ]. Our finding, however, diverged from those of the Spanish [ 19 ] and Chinese [ 20 ] studies, which did not find any significant relationship between the FOSQ-10 and ESS scores.

Among the OSA patients with different AHI severities in this study, there were no statistically significant differences in FOSQ-10T scores: patients with mild OSA had the lowest scores, whereas those with moderate OSA had the highest. This differs from the original English [ 11 ], Iranian [ 18 ], and Spanish [ 18 , 19 ] versions, which showed moderate degrees of discriminant validity.

Not surprisingly, the FOSQ-10T scores of participants in this study who used CPAP adequately improved significantly after therapy. These findings are in line with a number of other studies and may indicate that the questionnaire has a high degree of responsiveness to treatment.

This study has some limitations. First, because the FOSQ-10T scores were evaluated subjectively by individuals, bias cannot be avoided. Second, the FOSQ-10T and the original FOSQ-30 were not directly compared, so the two may produce different results when applied. Furthermore, only relatively healthy SDB patients were assessed in this study. For this reason, our findings cannot be directly applied to patients suffering from critical illnesses such as heart failure, stroke, or chronic renal disease, or to patients with other sleep disorders, including insomnia or hypersomnolence of central origin. Further study in populations with different characteristics or manifestations is recommended.

Conclusions

The results of this study indicate that the FOSQ-10T is a valid and reliable tool for evaluating QOL in Thai patients with SDB. In clinical practice, physicians may use the questionnaire to monitor therapy results and customize interventions to fit specific patient needs. In research, the FOSQ-10T may be used to evaluate the effectiveness of various therapeutic or diagnostic approaches.

Data availability

To comply with general data protection regulation and to protect people’s privacy, the raw data for this study are not publicly accessible.

References

Kasemsuk N, Chayopasakul V, Banhiran W, Prakairungthong S, Rungmanee S, Suvarnsit K et al (2023) Obstructive sleep apnea and sensorineural hearing loss: a systematic review and meta-analysis. Otolaryngol Head Neck Surg 169:201–209

Sangchan T, Banhiran W, Chotinaiwattarakul W, Keskool P, Rungmanee S, Pimolsri C (2023) Association between REM-related mild obstructive sleep apnea and common cardiometabolic diseases. Sleep Breath 27:2265–2271

Uataya M, Banhiran W, Chotinaiwattarakul W, Keskool P, Rungmanee S, Pimolsri C (2023) Association between hypoxic burden and common cardiometabolic diseases in patients with severe obstructive sleep apnea. Sleep Breath 27:2423–2428

Baldwin CM, Griffith KA, Nieto FJ, O’Connor GT, Walsleben JA, Redline S (2001) The association of sleep-disordered breathing and sleep symptoms with quality of life in the Sleep Heart Health Study. Sleep 24:96–105

Banhiran W, Assanasen P, Metheetrairut C, Nopmaneejumruslers C, Chotinaiwattarakul W, Kerdnoppakhun J (2012) Functional outcomes of sleep in Thai patients with obstructive sleep-disordered breathing. Sleep Breath 16:663–675

Lal C, Weaver TE, Bae CJ, Strohl KP (2021) Excessive daytime sleepiness in obstructive sleep apnea. mechanisms and clinical management. Ann Am Thorac Soc 18:757–768

Cai Y, Tripuraneni P, Gulati A, Stephens EM, Nguyen DK, Durr ML et al (2022) Patient-defined goals for obstructive sleep apnea treatment. Otolaryngol Head Neck Surg 167:791–798

Banhiran W, Assanasen P, Metheetrairut C, Chotinaiwattarakul W (2013) Health-related quality of life in Thai patients with obstructive sleep disordered breathing. J Med Assoc Thai 96:209–216

Rahavi-Ezabadi S, Amali A, Sadeghniiat-Haghighi K, Montazeri A, Nedjat S (2016) Translation, cultural adaptation, and validation of the Sleep Apnea Quality of Life Index (SAQLI) in Persian-speaking patients with obstructive sleep apnea. Sleep Breath 20:523–528

Weaver TE, Laizner AM, Evans LK, Maislin G, Chugh DK, Lyon K et al (1997) An instrument to measure functional status outcomes for disorders of excessive sleepiness. Sleep 20:835–843

Chasens ER, Ratcliffe SJ, Weaver TE (2009) Development of the FOSQ-10: a short version of the Functional Outcomes of Sleep Questionnaire. Sleep 32:915–919

Hismi A, Yu P, Locascio J, Levesque PA, Lindsay RW (2020) The impact of nasal obstruction and functional septorhinoplasty on sleep quality. Facial Plast Surg Aesthet Med 22:412–419

Lam AS, Collop NA, Bliwise DL, Dedhia RC (2017) Validated measures of insomnia, function, sleepiness, and nasal obstruction in a CPAP alternatives clinic population. J Clin Sleep Med 13:949–957

Boyer L, Philippe C, Covali-Noroc A, Dalloz MA, Rouvel-Tallec A, Maillard D et al (2019) OSA treatment with CPAP: randomized crossover study comparing tolerance and efficacy with and without humidification by ThermoSmart. Clin Respir J 13:384–390

Banhiran W, Assanasen P, Nopmaneejumrudlers C, Nujchanart N, Srechareon W, Chongkolwatana C et al (2018) Adjustable thermoplastic oral appliance versus positive airway pressure for obstructive sleep apnea. Laryngoscope 128:516–522

Banhiran W, Durongphan A, Keskool P, Chongkolwatana C, Metheetrairut C (2020) Randomized crossover study of tongue-retaining device and positive airway pressure for obstructive sleep apnea. Sleep Breath 24:1011–1018

Laohasiriwong S, Johnston N, Woodson BT (2013) Extra-esophageal reflux, NOSE score, and sleep quality in an adult clinic population. Laryngoscope 123:3233–3238

Rahavi-Ezabadi S, Amali A, Sadeghniiat-Haghighi K, Montazeri A (2016) Adaptation of the 10-item Functional Outcomes of Sleep Questionnaire to Iranian patients with obstructive sleep apnea. Qual Life Res 25:337–341

Rey de Castro J, Rosales-Mayor E, Weaver TE (2018) Reliability and validity of the Functional Outcomes of Sleep Questionnaire - Spanish Short Version (FOSQ-10SV) in Peruvian patients with obstructive sleep apnea. J Clin Sleep Med 14:615–621

Tsai SY, Shun SC, Lee PL, Lee CN, Weaver TE (2016) Validation of the Chinese version of the Functional Outcomes of Sleep Questionnaire-10 in pregnant women. Res Nurs Health 39:463–471

Banhiran W, Assanasen P, Nopmaneejumruslers C, Metheetrairut C (2011) Epworth sleepiness scale in obstructive sleep disordered breathing: the reliability and validity of the Thai version. Sleep Breath 15:571–577

Acknowledgements

The authors express their gratitude to Jeerapa Kerdnoppakhun for all of her research assistance including paperwork and working process procedures, as well as to Chulaluk Komoltri for her statistical analysis and sample size computation. Along with all of the patients who participated in this study, the authors also acknowledge the kind cooperation of the medical staffs at the Siriraj Sleep Center and the Department of Otorhinolaryngology.

Funding

Open access funding provided by Mahidol University. This research project was supported by the Siriraj Research Fund, grant number (IO) R016531056, Faculty of Medicine Siriraj Hospital, Mahidol University, Thailand.

Author information

Authors and affiliations

Department of Otorhinolaryngology, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand

Kawisara Chaiyaporntanarat, Wish Banhiran & Phawin Keskool

Siriraj Sleep Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand

Wish Banhiran, Sarin Rungmanee, Chawanont Pimolsri, Wattanachai Chotinaiwattarakul & Auamporn Kodchalai

Neurology Division, Department of Medicine, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand

Wattanachai Chotinaiwattarakul

American Board of Sleep Medicine, Department of Otorhinolaryngology, Faculty of Medicine Siriraj Hospital, Certified International Sleep Specialist, Mahidol University, 2 Wanglang Road, Bangkok Noi, Bangkok, 10700, Thailand

Auamporn Kodchalai

Contributions

Kawisara Chaiyaporntanarat: conception and design, data acquisition, collection, analysis, interpretation, drafting the article, final approval; Wish Banhiran: conception and design, data acquisition, interpretation, critical revisions, drafting the article, final approval, and being the corresponding author; Wattanachai Chotinaiwattarakul, Phawin Keskool, Sarin Rungmanee, Chawanont Pimolsri, Auamporn Kodchalai: data acquisition, interpretation, critical revisions, final approval.

Corresponding author

Correspondence to Wish Banhiran .

Ethics declarations

Ethical approval

This study was approved by the Siriraj Institutional Review Board (SIRB).

Competing interests

The authors declare no competing interests.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 163 KB)

About this article

Chaiyaporntanarat, K., Banhiran, W., Keskool, P. et al. The short Thai version of functional outcomes of sleep questionnaire (FOSQ-10T): reliability and validity in patients with sleep-disordered breathing. Sleep Breath (2024). https://doi.org/10.1007/s11325-024-03024-1

Received: 09 January 2024

Revised: 28 February 2024

Accepted: 15 March 2024

Published: 15 May 2024

DOI: https://doi.org/10.1007/s11325-024-03024-1

Keywords

  • Functional Outcomes of Sleep Questionnaire
  • Obstructive sleep apnea
  • Sleep-disordered breathing
  • Open access
  • Published: 18 May 2024

Parental hesitancy toward children vaccination: a multi-country psychometric and predictive study

  • Hamid Sharif-Nia 1 , 2 ,
  • Long She 3 ,
  • Kelly-Ann Allen 4 , 12 ,
  • João Marôco 5 ,
  • Harpaljit Kaur 6 ,
  • Gökmen Arslan 7 ,
  • Ozkan Gorgulu 8 ,
  • Jason W. Osborne 9 ,
  • Pardis Rahmatpour 10 &
  • Fatemeh Khoshnavay Fomani 11  

BMC Public Health volume 24, Article number: 1348 (2024)

Understanding vaccine hesitancy, a critical concern for public health, cannot occur without validated measures applicable and relevant to the samples they are assessing. The current study aimed to validate the Vaccine Hesitancy Scale (VHS) and to investigate the predictors of children’s vaccine hesitancy among parents from Australia, China, Iran, and Turkey. To ensure the quality of the present observational study, the STROBE checklist was utilized.

A cross-sectional study.

In total, 6,073 parents completed the web-based survey between 8 August 2021 and 1 October 2021. The content and construct validity of the Vaccine Hesitancy Scale were assessed. Cronbach’s alpha and McDonald’s omega were used to assess the scale’s internal consistency, and composite reliability (CR) and maximal reliability (MaxR) were used to assess construct reliability. Multiple linear regression was used to predict parental vaccine hesitancy from gender, social media activity, and perceived financial well-being.

The results showed that the VHS had a two-factor structure (i.e., lack of confidence and risk) and a total of 9 items. The measure showed metric invariance across four very different countries/cultures and demonstrated evidence of good reliability and validity. As expected, the analyses indicated that parental vaccine hesitancy was higher among respondents who identified as female, were more affluent, and were more active on social media.

Conclusions

The present research marks one of the first studies to evaluate vaccine hesitancy in multiple countries and demonstrated the validity and reliability of the VHS. The findings have implications for future research on vaccine hesitancy and vaccine-preventable diseases, as well as for community health nurses.

Introduction

Emerging and re-emerging infectious diseases have threatened human life many times throughout history. Many researchers and experts agree that vaccination is one of the most protective and preventative mechanisms for disease control and pandemic prevention [ 1 ]. For example, in the case of COVID-19, vaccines were developed to boost immunity to curb the spread of the highly infectious disease [ 2 ] and saved an estimated 14.4 million lives globally [ 3 ]. Despite the reported success of many vaccines in reducing disease spread, symptoms, and adverse outcomes, as well as the historical success of vaccination more generally in preventing disease outbreaks, vaccine hesitancy remains an enduring and critical threat to global health. Vaccine hesitancy has been identified as a central factor affecting vaccine uptake rates, influencing the potential emergence and re-emergence of vaccine-preventable diseases [ 4 ].

The SAGE Working Group on Vaccine Hesitancy defined vaccine hesitancy as a “delay in [the] acceptance or refusal of vaccination despite availability of vaccination services” and found that people’s reluctance to receive safe and available vaccines was a growing concern, long before the recent COVID-19 pandemic [ 5 ]. Previous research has linked vaccine hesitancy to various factors, such as concerns for safety and effectiveness, which may have emerged due to the unprecedented scale and speed at which the vaccines were developed [ 6 ]. Other factors fuelling vaccine hesitancy include a lack of information [ 7 ], conspiracy theories, and low trust in governments and institutions [ 8 , 9 ].

Parental vaccine hesitancy

Parental vaccine hesitancy is a crucial concern for public health due to its close links to vaccination delay, refusal, or denial in children, which ultimately increases their vulnerability to preventable diseases [ 10 , 11 ]. It is estimated that approximately 25% of children aged between 19 and 35 months have not been vaccinated due to the vaccine hesitancy of their parents [ 12 ]. For parents specifically, hesitancy is associated with misinformation on the internet [ 13 ], financial concerns, skepticism towards vaccine safety and necessity, confidence in a vaccine, and perceptions of the vaccine’s risk [ 14 ]. Additionally, parental vaccine hesitancy may be influenced to a large extent by environmental conditions, such as epidemics. Accordingly, children’s vaccination was identified as a challenging health issue during the COVID-19 pandemic, with implications for health and the spread of disease in the broader population [ 15 , 16 ].

Research has found that parental perceptions of risk and confidence in vaccines generally contribute significantly to parental vaccine hesitancy. Parents have been reported to worry about potential side effects of vaccines as well as their general effectiveness [ 12 ]. Meanwhile, low confidence in vaccination has been linked to reduced herd immunity and increased infection among those who are immunocompromised or not vaccinated [ 17 ], especially children.

Theoretical perspectives

The Health Belief Model (HBM) proposed by Hochbaum, Rosenstock, and Kegels (1952) suggests that vaccine decision-making is based on individuals’ perceptions of diseases and vaccines. Therefore, the perceived severity and susceptibility of diseases and the perceived risks and benefits of the vaccines may predict parental intentions to vaccinate their children [ 18 ]. Parents’ decisions about protective behaviours can therefore be shaped by their appraisal of the threat. According to protection motivation theory (PMT), threat appraisal refers to one’s adaptive actions, which consist of threat severity, maladaptive rewards, and threat vulnerability [ 19 ]. Parental appraisals of a disease as a threat thus shape patterns of vaccine hesitancy.

Considering existing theories, models, and conceptualizations, various measures have been developed and evaluated for assessing vaccine hesitancy. These measures assess an individual’s confidence in vaccines (Vaccine Confidence Scale) [ 20 , 21 ], parental attitudes toward childhood vaccines [ 22 ], and conspiracy beliefs related to vaccines [ 23 ]. Among the existing measures, the Vaccine Hesitancy Scale (VHS) was originally developed by Larson and colleagues from the SAGE Working Group on Vaccine Hesitancy [ 24 ], and psychometrically tested by Shapiro et al. (2016) among Canadian parents three years later. Their study revealed a two-factor structure (lack of confidence and risk) for the 9-item VHS among Canadian parents in French and English: one item was removed, two items loaded on the ‘risk’ dimension, and the other seven loaded on the ‘lack of confidence’ dimension [ 25 ]. Another study among parents in Guatemala also revealed a two-factor solution, where the 7-item VHS was a better fit than the 10-item scale [ 26 ]. Further research is needed to refine the scale and assess its validity in different countries and contexts. Understanding vaccine hesitancy cannot occur without validated measures applicable and relevant to the samples being assessed. The current study, therefore, aims to psychometrically evaluate the Vaccine Hesitancy Scale among parents in Australia, China, Iran, and Turkey.

Study design and participants

The data used in this study are part of a broader research project on identifying the leading factors of parental vaccine hesitancy. A methodological cross-sectional research design was employed to validate the VHS based on data from four countries (i.e., Australia, China, Iran, and Turkey). A survey was distributed to parents across the four countries over eight weeks, between 8 August 2021 and 1 October 2021. The inclusion criterion for respondents’ eligibility was being a parent with at least one child aged 18 years or under. The minimum sample size for conducting the confirmatory factor analysis (CFA) was based on the criteria of (1) bias of parameter estimates < 10%; (2) 95% confidence interval coverage > 91%; and (3) statistical power > 80% [ 27 ]. A minimum sample size of 200 was found to be sufficient to achieve the required criteria. To ensure the sample would reflect normative population variance, this study collected more than 300 responses from each country. Using a convenience sampling technique, this study collected a total of 6,073 responses across the four countries: Australia (2,734), China (523), Iran (2,447), and Turkey (369). The online questionnaire was created with Google Forms and sent to participants via social platforms such as WhatsApp, Telegram, and national applications.

Sociodemographic characteristics

Data on parents’ sociodemographic characteristics, such as age, gender, education level, living area, perceived economic status, and being active on social media, were gathered using a sociodemographic form.

The vaccine hesitancy scale (VHS)

The VHS (ten items) was originally developed by the SAGE Working Group on Vaccine Hesitancy and is used to assess parental hesitancy toward vaccinating their children. Although the measure was not psychometrically evaluated by the original developers, it was later validated amongst a sample of Canadian parents [ 25 ]. The VHS has a validated two-factor structure: (1) lack of confidence (seven items; e.g., “Childhood vaccines are important for my child’s health”), and (2) risk (two items; e.g., “New vaccines carry more risks than old vaccines”). Items are rated on a 5-point Likert scale ranging from one (strongly disagree) to five (strongly agree). The current study used four versions of the VHS: English (for Australia), Chinese (for China), Persian (for Iran), and Turkish (for Turkey). The English version was adopted from the Shapiro, Tatar [ 25 ] study. The Chinese, Persian, and Turkish versions were translated from the original English version using the WHO forward-backward translation protocol. All versions were checked for cross-cultural equivalence.

Translation procedure

The cross-cultural adaptation procedure [28] was used to translate the items (sociodemographic information and VHS) from English into Chinese, Persian, and Turkish via a translation and back-translation procedure. All translators were bilingual. Two translators independently translated the questionnaires into each country's respective language. The research team then assessed the translated versions, selecting the most appropriate item translations. Following this step, two other bilingual translators, who were blinded to the original questionnaire, independently conducted the back-translation. An expert committee (consisting of research team members, two nurses, one physician in social medicine, and a methodologist) then checked the back-translated versions to ensure accuracy and equivalence with the original questionnaire. The committee also assessed the cross-cultural equivalence and appropriateness of the questionnaire for the study population, as well as the semantic equivalence of the items. No items were changed during the procedure.

Data analysis

Descriptive statistics

This study used R and RStudio to perform all statistical analyses. The skimr and psych packages were used to produce descriptive statistics, including the minimum value (Min), maximum value (Max), and average value (M), as well as skewness and kurtosis for each item; histograms were also generated for each item [29, 30, 31]. Multiple linear regression was used to predict parental vaccine hesitancy from gender, self-perceived social media activity, and perceived financial well-being.
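
A minimal R sketch of this descriptive workflow is shown below, using the skimr and psych packages named above. The data frame `df`, the item names, and the predictor variable names are placeholders assumed for illustration.

```r
library(skimr)
library(psych)

# Per-item summaries: n, min, max, mean, sd, plus inline histograms (skimr)
# and skewness/kurtosis (psych). Item names vhs1-vhs9 are placeholders.
items <- df[paste0("vhs", 1:9)]
skim(items)      # compact summary with mini-histograms
describe(items)  # adds skewness and kurtosis per item

# Multiple linear regression of total hesitancy on the three predictors
# (predictor names are assumed):
summary(lm(vhs_total ~ gender + social_media_active + financial_wellbeing,
           data = df))
```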

Confirmatory factor analysis

This study conducted a confirmatory factor analysis (CFA) using the lavaan package to assess the psychometric properties of the VHS across the four countries. The factorial structure and model fit were assessed at this stage. Model fit was evaluated using several fit indices: comparative fit index (CFI) > 0.90, normed fit index (NFI) > 0.90, Tucker-Lewis index (TLI) > 0.90, standardized root mean square residual (SRMR) < 0.09, and root mean square error of approximation (RMSEA) < 0.08 [32, 33].
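
The following R sketch shows how such a two-factor CFA could be specified and evaluated in lavaan against the thresholds above. The item names and the `df` object are assumptions; the factor structure follows the seven-plus-two-item model described earlier.

```r
library(lavaan)

# Two-factor VHS measurement model; item names are placeholders.
vhs_model <- '
  confidence =~ vhs1 + vhs2 + vhs3 + vhs4 + vhs5 + vhs6 + vhs7
  risk       =~ vhs8 + vhs9
'

fit <- cfa(vhs_model, data = df, estimator = "MLR")  # robust ML
fitMeasures(fit, c("cfi", "nfi", "tli", "srmr", "rmsea"))
standardizedSolution(fit)  # inspect loadings (all should exceed 0.5)
```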

Construct validity and reliability

To assess the VHS’s construct validity, both convergent and discriminant validity were assessed using the SemTools package. For convergent validity, the Average Variance Extracted (AVE) for each construct should be more than 0.5 [ 34 ]. Concerning discriminant validity, this study followed the Heterotrait-monotrait ratio of correlations (HTMT) approach, which denotes that all correlations between constructs in the HTMT matrix table should be less than 0.85 [ 35 ] and the correlations should have an AVE larger than the squared correlation between factors (Fornell & Larcker, 1981; Marôco, 2021). To assess the reliability of the VHS, the SemTools package was used to compute Cronbach’s alpha (α) and omega coefficients (ω), where α and ω values greater than 0.7 demonstrates an acceptable internal consistency and construct reliability [ 36 , 37 , 38 ].

Invariance assessment

To detect whether the factor structure of the VHS holds across the four countries, a set of nested models was defined and compared using the lavaan package with robust maximum likelihood estimation: a configural invariance model (no constraints), a metric invariance model (factor loadings constrained equal across the four countries), a scalar invariance model (loadings and intercepts constrained), and a structural invariance model (second-order factor loadings constrained). Invariance was assumed when absolute ΔCFI < 0.01 and absolute ΔRMSEA < 0.02 between two nested models [39, 40], as described elsewhere [27].
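
The configural, metric, and scalar steps of this procedure can be sketched in lavaan as below (the grouping variable name `country` is assumed; the structural step for second-order loadings is omitted since the sketch uses a first-order model).

```r
# Nested invariance models across the four countries.
configural <- cfa(vhs_model, data = df, group = "country",
                  estimator = "MLR")
metric     <- cfa(vhs_model, data = df, group = "country",
                  estimator = "MLR", group.equal = "loadings")
scalar     <- cfa(vhs_model, data = df, group = "country",
                  estimator = "MLR",
                  group.equal = c("loadings", "intercepts"))

lavTestLRT(configural, metric, scalar)  # chi-square difference tests
sapply(list(configural = configural, metric = metric, scalar = scalar),
       fitMeasures, fit.measures = c("cfi", "rmsea"))  # for deltas
```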

Ethical considerations

The Mazandaran University of Medical Sciences Research Ethics Committee approved this study (ethics code: IR.MAZUMS.REC.1401.064). All participants were informed of the purpose of the data collection, and questionnaires were distributed to respondents only after they provided their consent to participate in the survey. Respondents were also assured that their participation was voluntary and that the confidentiality of all collected data was guaranteed.

Participants’ demographic characteristics and mean (S.D.) of COVID-19 vaccine hesitancy

This study employed a cross-sectional, questionnaire-based research design. In total, 6,073 parents from Australia (2,734), China (523), Iran (2,447), and Turkey (369) completed the survey through an online questionnaire platform. According to Table 1, the majority of respondents were female (84.15%) and between 20 and 40 years old (54.61%).

Item distribution properties

Table 2 shows the descriptive summary of the nine items' minimum value (Min), maximum value (Max), average value (M), skewness, kurtosis, and histograms. Item 10 was dropped due to cross-loading.

A CFA was used to confirm whether the factorial structure of the VHS in the current study was consistent with results from the original validation study. The results of the CFA demonstrated a good model fit for the two-factor measurement model, as evidenced by the fit indices: CFI = 0.972, NFI = 0.971, TLI = 0.958, SRMR = 0.037, and RMSEA = 0.074 (90% CI: 0.067, 0.074). The results also showed that all factor loadings were greater than 0.5 and statistically significant. Figure 1 depicts the factor structure of the VHS in this study.

Figure 1. The results of the confirmatory factor analysis (CFA).

Construct validity assessment

The results showed that the AVE for the "lack of confidence" sub-factor was greater than 0.5 (0.735), while the AVE for the "risk" sub-factor was slightly less than 0.5 (0.494). Previous literature indicates that AVE is a conservative and strict measure of convergent validity and that convergent validity can be assessed on the basis of composite reliability (C.R.) alone. Therefore, based on the C.R. results, the VHS in this study established convergent validity across all countries. The HTMT correlation matrix showed that discriminant validity was also achieved, as the HTMT between "lack of confidence" and "risk" was 0.395, below the suggested cut-off value of 0.85. The squared correlation between the two factors was 0.153. As this value is less than the AVE for both "lack of confidence" (0.735) and "risk" (0.494), further evidence of discriminant validity was supported.
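
The Fornell-Larcker check reported here (squared factor correlation versus each factor's AVE) can be reproduced from the fitted lavaan model; a brief sketch, assuming the `fit` object from the CFA example above:

```r
# Squared latent correlation between the two factors, to be compared
# against each factor's AVE (Fornell-Larcker criterion).
phi <- lavInspect(fit, "cor.lv")  # latent correlation matrix
phi["confidence", "risk"]^2       # should fall below both AVE values
```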

Construct reliability assessment

The results showed that the lack of confidence factor displayed good internal consistency and construct reliability (α = 0.952; ω = 0.946), while the two-item risk factor fell slightly below the 0.7 threshold (α = 0.628; ω = 0.651), as is common for factors with only two items.

Country invariance assessment

Prior to the country invariance assessment, parental vaccine hesitancy scores were compared across the four countries. Mean scores were highest in Iran (M = 35.96, SD = 4.19), followed by Australia (M = 34.68, SD = 6.21), Turkey (M = 34.09, SD = 4.78), and China (M = 21.65, SD = 4.61) (p < 0.001). While China clearly has a different average level of parental vaccine hesitancy, this does not preclude similar psychometric properties (i.e., factor structure) to the other countries.

Country invariance was tested in line with standard procedures, using a set of increasingly constrained nested models (see Table 3).

First, configural invariance tests whether the basic structure of the measure is invariant, imposing no equality restrictions on parameters. Second, metric (weak) invariance was tested by constraining factor loadings to be invariant across countries. The negligible change from the configural to the metric model (ΔCFI = -0.009 and ΔRMSEA = 0.004, respectively) supports this level of invariance. The chi-square difference was significant (χ²(21) = 235.55; p < 0.001), but the chi-square test is notoriously sensitive to trivial differences in large samples and so is not considered a desirable metric.

Third, scalar invariance ("strong invariance") constrained both factor loadings and item intercepts. Strong invariance is often considered beyond what is necessary for typical applications. These constraints produced a significant chi-square difference (χ²(21) = 1044.251; p < 0.001) and modest changes in fit (ΔCFI = -0.029; ΔRMSEA = 0.021). Finally, structural invariance, which additionally constrained second-order factor loadings, produced a further modest degradation of model fit, but this level of constraint is also considered stricter than necessary. These results are sufficient to assert metric invariance.

Predictive validity

To further explore parental hesitancy, we examined whether VHS scores were related to gender, social media activity, and perceived financial well-being. As predicted, all three variables were related to VHS scores. Because these variables were measured categorically, ANOVA was employed.

Gender was significantly related to VHS (F(1, 6070) = 86.62, p < 0.001, η² = 0.014), with those identifying as female or "other" having more vaccine hesitancy (M = 34.37, SD = 6.37; M = 34.04, SD = 6.53, respectively) than those identifying as male (M = 32.22, SD = 7.08).

Social media activity was significantly related to VHS (F(1, 5547) = 69.54, p < 0.001, η² = 0.012), with those indicating higher social media activity having more vaccine hesitancy (M = 34.89, SD = 5.86) than those indicating lower social media activity (M = 33.49, SD = 6.61).

Financial well-being was also modestly related to VHS (F(1, 6070) = 42.52, p < 0.005, η² = 0.002), with those identifying as most affluent having more vaccine hesitancy (M = 34.37, SD = 6.46) than those with moderate (M = 33.94, SD = 6.49) or low affluence (M = 33.32, SD = 7.12).
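
The paper does not include its analysis code; the following R sketch shows how one of these one-way ANOVAs and its eta-squared effect size could be computed. The variable names and the use of the effectsize package are assumptions for illustration.

```r
library(effectsize)  # for eta_squared()

# One-way ANOVA of total hesitancy on gender (variable names assumed).
m_gender <- aov(vhs_total ~ gender, data = df)
summary(m_gender)
eta_squared(m_gender)  # effect size, cf. the reported eta^2 = 0.014
```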

Vaccines reduce disease mortality and severity; vaccine hesitancy therefore impacts global public health. The current study aimed to psychometrically evaluate the Vaccine Hesitancy Scale (VHS) among parents in Australia, China, Iran, and Turkey.

The current study found that a brief measure of parental vaccine hesitancy, when appropriately translated, can be used in broadly diverse sociocultural contexts. The Vaccine Hesitancy Scale showed strong and desirable psychometric properties, including the predicted factor structure, strong reliability, metric invariance across countries, validity, and expected relationships with self-reported variables such as affluence, gender, and social media engagement. These results align with the original validation study conducted in Canada [25] and another validating the scale in Guatemala [26].

The samples from the four countries and cultures were not ideal: there were far fewer fathers than mothers in three of the four samples (4.2% of respondents in Australia, 35% in China, 17.69% in Iran, and 53.9% in Turkey were fathers). However, this could be considered a strength, as in many cultures mothers have more decision-making responsibility for the health and welfare of children than fathers [41], and it was mothers who were found to have higher vaccine hesitancy. This finding aligns with the health belief model, which holds that gender plays a strong role in determining vaccine acceptance [18]. Existing qualitative research has revealed mothers' mixed feelings about vaccination (e.g., confusion arising from conflicting information) [42], and mothers in Australia expressed guilt about failing to be a good mother [43]. Studies have indicated that Chinese mothers exhibit greater vaccine hesitancy for their children than fathers, owing to their concerns about vaccine safety and effectiveness; fathers generally have a higher tendency toward risk behaviours than mothers, so they may be more willing to vaccinate their children [44].

Among the four countries, the vaccine hesitancy score was markedly lower in China: parents in China were less hesitant to vaccinate their children than parents in Iran, Turkey, and Australia. This can be attributed to several key factors. First, China has a communication strategy that focuses on transparency and providing authoritative information about vaccines, which has helped build public trust in the vaccination process. Additionally, China's rapid development and distribution of COVID-19 vaccines ensured a consistent supply of safe and effective vaccines, contributing to lower rates of vaccine hesitancy. Cultural and social factors also play a significant role: China's collectivist culture emphasizes community health and well-being, influencing parents to prioritize vaccinating their children. The Chinese government has also implemented policies such as providing free vaccines and launching public awareness campaigns to promote vaccination, reducing hesitancy rates. Moreover, China's success in controlling infectious diseases through previous vaccination programs has created a positive attitude toward vaccines, influencing parents' decisions. Overall, effective communication, safe vaccine availability, cultural influences, government initiatives, and past vaccination success have all contributed to lower levels of vaccine hesitancy among parents in China compared with the other countries [14, 45].

Ancillary analyses observed age differences in vaccine hesitancy, but only in Australia, where parents between 40 and 60 years old were more vaccine hesitant than other age groups (F = 8.10, p < 0.001), supporting past research [46] indicating that younger parents were less likely to be hesitant to vaccinate their children. One possible explanation is that younger parents have less experience with infectious diseases (such as smallpox and poliomyelitis), which may make them less hesitant to vaccinate their children against disease.

In the current study, older parents may have been more hesitant than younger parents because, at the time the study was conducted, they were more likely to have older children eligible for COVID-19 vaccination, a vaccine whose side effects, and even effectiveness, were not yet clear for this age group. When national health systems began vaccinating children against COVID-19, older children were included in the program first, and it was later extended to children aged five years and older. It has been indicated that new vaccines generate more hesitancy [47]. Further research (e.g., qualitative research) is needed to explore this in more detail.

These findings also indicated that more affluent individuals and those with more social media engagement tended to be more hesitant about their children's vaccination, which aligns with prior studies [14, 48, 49]. Some prior studies have suggested that parents who perceive more financial comfort believe that their lifestyle can protect them from disease and are therefore more hesitant to vaccinate their children [49]. The role of social media in vaccine hesitancy has been identified by previous studies: parents may be confused by misinformation and fake news in the media and on social networks [50] and consequently experience fear, stress, and a wide range of behavioural changes [51, 52]. Misinformation may make parents more cautious and lead them to hesitate over vaccines, especially new vaccines.

The current study indicated that lack of confidence in the vaccine and perceived vaccine risk contribute to parental vaccine hesitancy. According to the "3 Cs model" (confidence, complacency, and convenience) presented by the SAGE Working Group [53], lack of confidence in vaccine safety and effectiveness, as well as mistrust of the systems that recommend or provide the vaccine, can determine vaccine hesitancy. Furthermore, the model suggests that hesitancy may occur when parents do not value or perceive a need for vaccination (complacency) or when the vaccine is not accessible or available (convenience).

Study limitations

This study has several limitations. The non-probabilistic samples enrolled in the current study could restrict the generalizability of the findings: although the sample was large, convenience sampling may underrepresent certain population groups. In addition, because the data were gathered using an online survey, the findings may not generalize to those without access to electronic devices or the internet.

Findings’ implications

This study supports broad use of this scale to evaluate parental vaccine hesitancy as part of an effort to understand and counteract resistance to vaccine adoption in the general population. Applying this scale can provide valuable information for public health authorities to manage vaccine hesitancy among parents. The study indicated that women, those active on social media, and more affluent parents are more likely to resist having their children vaccinated, which can guide public health authorities in designing information campaigns to counteract these troubling trends. Healthcare providers can use this information to tailor their communication strategies to address the specific concerns of parents and increase vaccine uptake. Social media can act as a double-edged sword in parental vaccine hesitancy; consequently, health policymakers should strive to provide authentic and accurate content that presents clear information in the right way to the right audience.

Parental vaccine hesitancy is prevalent globally, is associated with several individual and contextual factors, and is expected to become a major burden on public health worldwide. Without validated instruments for specific countries and contexts, it is not possible to conduct reliable and valid research into the factors and determinants of parental vaccine hesitancy. The present study validated the Vaccine Hesitancy Scale (VHS) among parents in Australia, China, Iran, and Turkey during the COVID-19 outbreak. Acceptable psychometric evidence was found for the 9-item, two-factor VHS using data from parents in the four countries. Findings from this study have implications for future research on vaccine hesitancy and vaccine-preventable diseases, as well as for community health nurses. Further studies are needed to test the scale's validity and reliability in additional cultural contexts.

Data availability

The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Excler J-L, Saville M, Berkley S, Kim JH. Vaccine development for emerging infectious diseases. Nat Med. 2021;27(4):591–600.

Depar U. www.cdc.gov/coronavirus/vaccines. 2021.

Watson OJ, Barnsley G, Toor J, Hogan AB, Winskill P, Ghani AC. Global impact of the first year of COVID-19 vaccination: a mathematical modelling study. Lancet Infect Dis. 2022;22(9):1293–302.

Wang Q, Xiu S, Yang L, Han Y, Cui T, Shi N et al. Validation of the World Health Organization’s parental vaccine hesitancy scale in China using child vaccination data. 2022:1–7.

MacDonald NE. Vaccine hesitancy: definition, scope and determinants. Vaccine. 2015;33(34):4161–4.

Larson HJ, Jarrett C, Eckersberger E, Smith DMD, Paterson P. Understanding vaccine hesitancy around vaccines and vaccination from a global perspective: a systematic review of published literature, 2007–2012. Vaccine. 2014;32(19):2150–9.

Loomba S, de Figueiredo A, Piatek SJ, de Graaf K, Larson HJ. Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA. Nat Hum Behav. 2021;5(3):337–48.

Ahmed W, Vidal-Alaball J, Downing J, López Seguí F. COVID-19 and the 5G conspiracy theory: Social Network Analysis of Twitter Data. J Med Internet Res. 2020;22(5):e19458.

Trent M, Seale H, Chughtai AA, Salmon D, MacIntyre CR. Trust in government, intention to vaccinate and COVID-19 vaccine hesitancy: a comparative survey of five large cities in the United States, United Kingdom, and Australia. Vaccine. 2022;40(17):2498–505.

Nguyen KH, Srivastav A, Lindley MC, Fisher A, Kim D, Greby SM, et al. Parental Vaccine Hesitancy and Association with Childhood Diphtheria, Tetanus Toxoid, and Acellular Pertussis; Measles, Mumps, and Rubella; Rotavirus; and combined 7-Series vaccination. Am J Prev Med. 2022;62(3):367–76.

Mbaeyi S, Cohn A, Messonnier N. A call to action: strengthening vaccine confidence in the United States. Pediatrics. 2020;145(6).

Nguyen KH, Srivastav A, Lindley MC, Fisher A, Kim D, Greby SM et al. Parental Vaccine Hesitancy and Association with Childhood Diphtheria, Tetanus Toxoid, and Acellular Pertussis; Measles, Mumps, and Rubella; Rotavirus; and combined 7-Series vaccination. 2022;62(3):367–76.

Vrdelja M, Kraigher A, Verčič D, Kropivnik S. The growing vaccine hesitancy: exploring the influence of the internet. Eur J Pub Health. 2018;28(5):934–9.

Sharif Nia H, Allen K-A, Arslan G, Kaur H, She L, Khoshnavay Fomani F et al. The predictive role of parental attitudes toward COVID-19 vaccines and child vulnerability: a multi-country study on the relationship between parental vaccine hesitancy and financial well-being. Front Public Health. 2023;11.

Bramer CA, Kimmins LM, Swanson R, Kuo J, Vranesich P, Jacques-Carroll LA, et al. Decline in child vaccination coverage during the COVID-19 pandemic — Michigan Care Improvement Registry, May 2016-May 2020. Am J Transplant. 2020;20(7):1930–1.

Seyed Alinaghi S, Karimi A, Mojdeganlou H, Alilou S, Mirghaderi SP, Noori T, et al. Impact of COVID-19 pandemic on routine vaccination coverage of children and adolescents: a systematic review. Health Sci Rep. 2022;5(2):e00516.

Schuster M, Eskola J, Duclos P. Review of vaccine hesitancy: rationale, remit and methods. Vaccine. 2015;33(34):4157–60.

Maiman LA, Becker MH. The Health Belief Model: origins and correlates in Psychological Theory. Health Educ Monogr. 1974;2(4):336–53.

Rippetoe PA, Rogers RW. Effects of components of protection-motivation theory on adaptive and maladaptive coping with a health threat. J Personal Soc Psychol. 1987;52(3):596.

Gilkey MB, Magnus BE, Reiter PL, McRee A-L, Dempsey AF, Brewer NT. The vaccination confidence scale: a brief measure of parents’ vaccination beliefs. Vaccine. 2014;32(47):6259–65.

Gilkey MB, Reiter PL, Magnus BE, McRee A-L, Dempsey AF, Brewer NT. Validation of the vaccination confidence scale: a brief measure to identify parents at risk for refusing adolescent vaccines. Acad Pediatr. 2016;16(1):42–9.

Opel DJ, Mangione-Smith R, Taylor JA, Korfiatis C, Wiese C, Catz S, et al. Development of a survey to identify vaccine-hesitant parents. Hum Vaccines. 2011;7(4):419–25.

Shapiro GK, Holding A, Perez S, Amsel R, Rosberger Z. Validation of the vaccine conspiracy beliefs scale. Papillomavirus Res. 2016;2:167–72.

Larson HJ, Jarrett C, Schulz WS, Chaudhuri M, Zhou Y, Dube E, et al. Measuring vaccine hesitancy: the development of a survey tool. Vaccine. 2015;33(34):4165–75.

Shapiro GK, Tatar O, Dube E, Amsel R, Knauper B, Naz A, et al. The vaccine hesitancy scale: psychometric properties and validation. Vaccine. 2018;36(5):660–7.

Domek GJ, O’Leary ST, Bull S, Bronsert M, Contreras-Roldan IL, Bolaños Ventura GA, et al. Measuring vaccine hesitancy: Field testing the WHO SAGE Working Group on Vaccine Hesitancy survey tool in Guatemala. Vaccine. 2018;36(35):5273–81.

Assunção H, Lin S-W, Sit P-S, Cheung K-C, Harju-Luukkainen H, Smith T, et al. University Student Engagement Inventory (USEI): Transcultural Validity evidence across four continents. Front Psychol. 2020;10:2796.

Beaton DE, Bombardier C, Guillemin F, Ferraz MB. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine. 2000;25(24).

Finney SJ, DiStefano C. Non-normal and categorical data in structural equation modeling. Struct Equation Modeling: Second Course. 2006;10(6):269–314.

Watkins MW. Exploratory factor analysis: a guide to best practice. J Black Psychol. 2018;44(3):219–46.

Marôco J. Análise de equações estruturais: Fundamentos teóricos, software & aplicações. ReportNumber, Lda; 2010.

She L, Ma L, Khoshnavay Fomani F. The Consideration of Future Consequences Scale among Malaysian young adults: a psychometric evaluation. Front Psychol. 2021;12.

Sharif Nia H, She L, Rasiah R, Pahlevan Sharif S, Hosseini L. Psychometrics of Persian Version of the Ageism Survey among an Iranian older Adult Population during COVID-19 pandemic. Front Public Health. 2021:1689.

Fornell C, Larcker DF. Evaluating Structural equation models with unobservable variables and measurement error. J Mark Res. 1981;18(1):39–50.

Henseler J, Ringle CM, Sarstedt M. A new criterion for assessing discriminant validity in variance-based structural equation modeling. J Acad Mark Sci. 2015;43(1):115–35.

Mayers A. Introduction to statistics and SPSS in psychology. Pearson Higher Ed; 2013.

Maroco J, Maroco AL, Campos JADB. Student’s academic efficacy or inefficacy? An example on how to evaluate the psychometric properties of a measuring instrument and evaluate the effects of item wording. Open Journal of Statistics. 2014;2014.

She L, Pahlevan Sharif S, Sharif Nia H. Psychometric evaluation of the Chinese Version of the modified online compulsive buying scale among Chinese young consumers. J Asia-Pac Bus. 2021;22(2):121–33.

Cheung GW, Rensvold RB. Evaluating goodness-of-fit indexes for Testing Measurement Invariance. Struct Equation Modeling: Multidisciplinary J. 2002;9(2):233–55.

Rutkowski L, Svetina D. Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educ Psychol Meas. 2014;74(1):31–57.

Horiuchi S, Sakamoto H, Abe SK, Shinohara R, Kushima M, Otawa S, et al. Factors of parental COVID-19 vaccine hesitancy: a cross sectional study in Japan. PLoS ONE. 2021;16(12):e0261121.

Walker KK, Head KJ, Owens H, Zimet GD. A qualitative study exploring the relationship between mothers’ vaccine hesitancy and health beliefs with COVID-19 vaccination intention and prevention during the early pandemic months. Hum Vaccines Immunotherapeutics. 2021;17(10):3355–64.

Schuster L, Gurrieri L, Dootson P. Emotions of burden, intensive mothering and COVID-19 vaccine hesitancy. Crit Public Health. 2022:1–12.

Zheng M, Zhong W, Chen X, Wang N, Liu Y, Zhang Q, et al. Factors influencing parents’ willingness to vaccinate their preschool children against COVID-19: results from the mixed-method study in China. Hum Vaccines Immunother. 2022:2090776.

Huang Y, Su X, Xiao W, Wang H, Si M, Wang W, et al. COVID-19 vaccine hesitancy among different population groups in China: a national multicenter online survey. BMC Infect Dis. 2022;22(1):153.

Facciolà A, Visalli G, Orlando A, Bertuccio MP, Spataro P, Squeri R et al. Vaccine Hesitancy: An Overview on Parents’ Opinions about Vaccination and Possible Reasons of Vaccine Refusal. Journal of Public Health Research. 2019;8(1):jphr.2019.1436.

Dubé E, Laberge C, Guay M, Bramadat P, Roy R, Bettinger J. Vaccine hesitancy: an overview. Hum Vaccin Immunother. 2013;9(8):1763–73.

Simas C, Larson HJ. Overcoming vaccine hesitancy in low-income and middle-income regions. Nat Reviews Disease Primers. 2021;7(1):41.

Swaney SE, Burns S. Exploring reasons for vaccine-hesitancy among higher-SES parents in Perth, Western Australia. Health Promotion J Australia. 2019;30(2):143–52.

Ceron W, de-Lima-Santos M-F, Quiles MG. Fake news agenda in the era of COVID-19: identifying trends through fact-checking content. Online Social Networks Media. 2021;21:100116.

Nikčević AV, Spada MM. The COVID-19 anxiety syndrome scale: development and psychometric properties. Psychiatry Res. 2020;292:113322.

Sharif Nia H, She L, Kaur H, Boyle C, Khoshnavay Fomani F, Hoseinzadeh E, et al. A predictive study between anxiety and fear of COVID-19 with psychological behavior response: the mediation role of perceived stress. Front Psychiatry. 2022;13:851212.

MacDonald NE. Vaccine hesitancy: definition, scope and determinants. Vaccine. 2015;33(34):4161–4.

Author information

Authors and affiliations

Psychosomatic Research Center, Mazandaran University of Medical Sciences, Sari, Iran

Hamid Sharif-Nia

Department of Nursing, Amol School of Nursing and Midwifery, Mazandaran University of Medical Sciences, Sari, Iran

Sunway Business School, Sunway University, Sunway City, Malaysia

School of Educational Psychology and Counselling, Faculty of Education, Monash University, Clayton, Australia

Kelly-Ann Allen

William James Centre for Research ISPA – Instituto Universitário, Lisboa, Portugal

João Marôco

Business School, Taylor’s University Lakeside Campus, Subang Jaya, Malaysia

Harpaljit Kaur

Department of Psychological Counseling, Burdur Mehmet Akif Ersoy University, Burdur, Turkey

Gökmen Arslan

Department of Biostatistics and Medical Informatics, Faculty of Medicine, Kırşehir Ahi Evran University, Kırşehir, Turkey

Ozkan Gorgulu

Department of Statistics, Miami University, Oxford, OH, USA

Jason W. Osborne

School of Nursing, Alborz University of Medical Sciences, Karaj, Iran

Pardis Rahmatpour

Nursing and Midwifery Care Research Center, School of Nursing and Midwifery, Tehran University of Medical Sciences, Tehran, Iran

Fatemeh Khoshnavay Fomani

Centre for Wellbeing Science, Faculty of Education, University of Melbourne, Parkville, Australia

Contributions

Study conception and design: HS, K-AA and FK. Data collection: FK, HS, K-AA, LS, and OG. Analysis and interpretation of results: GA, JM, JO, LS, and HS. Draft manuscript preparation and or substantively revised it: FK, HK, K-AA, HS, and JO. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Fatemeh Khoshnavay Fomani .

Ethics declarations

Consent for publication

Not Applicable.

Competing interests

The authors declare no competing interests.

Conflict of interest

The authors report no conflicts of interest in this work.

Ethics statement

The protocol of this study was approved by the Mazandaran University of Medical Sciences Research Ethics Committee (IR.MAZUMS.REC.1400.13876). Informed consent was obtained from all participants, and all methods were carried out in accordance with relevant guidelines and regulations.

Acknowledgements

We thank all the participants who took part in the study.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Sharif-Nia, H., She, L., Allen, KA. et al. Parental hesitancy toward children vaccination: a multi-country psychometric and predictive study. BMC Public Health 24 , 1348 (2024). https://doi.org/10.1186/s12889-024-18806-1

Received : 23 October 2023

Accepted : 09 May 2024

Published : 18 May 2024

DOI : https://doi.org/10.1186/s12889-024-18806-1

Keywords: Vaccine hesitancy; Psychometric

Validity, reliability, and generalizability in qualitative research

Lawrence Leung

1 Department of Family Medicine, Queen's University, Kingston, Ontario, Canada

2 Centre of Studies in Primary Care, Queen's University, Kingston, Ontario, Canada

In general practice, qualitative research contributes as significantly as quantitative research, in particular regarding psycho-social aspects of patient care, health services provision, policy setting, and health administration. In contrast to quantitative research, qualitative research as a whole has been constantly critiqued, if not disparaged, for the lack of consensus on assessing its quality and robustness. This article illustrates, with five published studies, how qualitative research can impact and reshape the discipline of primary care, spiraling out from clinic-based health screening to community-based disease monitoring, evaluation of out-of-hours triage services, a provincial psychiatric care pathways model, and finally national legislation of core measures for children's healthcare insurance. Fundamental concepts of validity, reliability, and generalizability as applicable to qualitative research are then addressed, with an update on current views and controversies.

Nature of Qualitative Research versus Quantitative Research

The essence of qualitative research is to make sense of and recognize patterns among words in order to build up a meaningful picture without compromising its richness and dimensionality. Like quantitative research, qualitative research aims to seek answers to questions of "how, where, when, who, and why" with a perspective to build a theory or refute an existing one. Unlike quantitative research, which deals primarily with numerical data and their statistical interpretations under a reductionist, logical, and strictly objective paradigm, qualitative research handles nonnumerical information and its phenomenological interpretation, which inextricably tie in with human senses and subjectivity. While human emotions and perspectives from both subjects and researchers are considered undesirable biases confounding results in quantitative research, the same elements are considered essential and inevitable, if not treasurable, in qualitative research, as they invariably add extra dimensions and color to enrich the corpus of findings. However, the issue of subjectivity and contextual ramifications has fueled incessant controversy regarding yardsticks for the quality and trustworthiness of qualitative research results for healthcare.

Impact of Qualitative Research upon Primary Care

In many ways, qualitative research contributes significantly, if not more so than quantitative research, to the field of primary care at various levels. Five qualitative studies are chosen here to illustrate how various methodologies of qualitative research have helped in advancing primary healthcare, from novel monitoring of chronic obstructive pulmonary disease (COPD) via mobile-health technology [1], informed decision-making for colorectal cancer screening [2], triaging of out-of-hours GP services [3], and evaluation of care pathways for community psychiatry [4] to prioritization of healthcare initiatives for legislation purposes at the national level [5]. With the recent advances in information technology and mobile connected devices, self-monitoring and management of chronic diseases via tele-health technology may seem beneficial to both the patient and the healthcare provider. Recruiting COPD patients who were given tele-health devices that monitored lung function, Williams et al. [1] conducted phone interviews and analyzed the transcripts via a grounded theory approach, identifying themes that enabled them to conclude that such a mobile-health setup helped to engage patients, with better adherence to treatment and overall improvement in mood. Such positive findings were in contrast to previous studies, which opined that elderly patients were often challenged by operating computer tablets [6] or conversing with tele-health software [7]. To explore the content of recommendations for colorectal cancer screening given out by family physicians, Wackerbarth et al. [2] conducted semi-structured interviews with subsequent content analysis and found that most physicians delivered information to enrich patient knowledge with little regard to patients' true understanding, ideas, and preferences in the matter. These findings suggested room for improvement for family physicians to better engage their patients in recommending preventative care. Faced with various models of out-of-hours triage services for GP consultations, Egbunike et al. [3] conducted thematic analysis on semi-structured telephone interviews with patients and doctors in various urban, rural, and mixed settings. They found that the efficiency of triage services remained a prime concern for both users and providers, among issues of access to doctors and unfulfilled or mismatched expectations from users, which could arouse dissatisfaction and have legal implications. In the UK, a care pathways model for community psychiatry had been introduced but its benefits were unclear. Khandaker et al. [4] hence conducted a qualitative study using semi-structured interviews with medical staff and other stakeholders; adopting a grounded-theory approach, major themes emerged that included improved equality of access, more focused logistics, increased work throughput, and better accountability for community psychiatry provided under the care pathway model. Finally, at the US national level, Mangione-Smith et al. [5] employed a modified Delphi method to gather consensus from a panel of nominators who were recognized experts and stakeholders in their disciplines, and identified a core set of quality measures for children's healthcare under the Medicaid and Children's Health Insurance Program. These core measures were made transparent for public opinion and later passed on for full legislation, illustrating the impact of qualitative research upon social welfare and policy improvement.

Overall Criteria for Quality in Qualitative Research

Given the diverse genera and forms of qualitative research, there is no consensus on how to assess any given piece of qualitative research work. Various approaches have been suggested, the two leading schools of thought being that of Dixon-Woods et al. [8], which emphasizes methodology, and that of Lincoln et al. [9], which stresses rigor in the interpretation of results. By identifying commonalities of qualitative research, Dixon-Woods produced a checklist of questions for assessing the clarity and appropriateness of the research question; the description and appropriateness of the sampling, data collection, and data analysis; the levels of support and evidence for claims; the coherence between data, interpretation, and conclusions; and finally the level of contribution of the paper. These criteria inform the 10 questions of the Critical Appraisal Skills Programme checklist for qualitative studies [10]. However, these methodology-weighted criteria may not do justice to qualitative studies that differ in epistemological and philosophical paradigms [11, 12], one classic example being positivist versus interpretivist [13]. Equally, without a robust methodological layout, the rigorous interpretation of results advocated by Lincoln et al. [9] will not suffice either. Meyrick [14] argued from a different angle and proposed fulfillment of the dual core criteria of "transparency" and "systematicity" for good-quality qualitative research: in brief, every step of the research logistics (from theory formation, study design, sampling, and data acquisition and analysis to results and conclusions) has to be validated as sufficiently transparent or systematic. In this manner, both the research process and the results can be assured of high rigor and robustness [14]. Finally, Kitto et al. [15] epitomized six criteria for assessing the overall quality of qualitative research: (i) clarification and justification, (ii) procedural rigor, (iii) sample representativeness, (iv) interpretative rigor, (v) reflexive and evaluative rigor, and (vi) transferability/generalizability, which also double as evaluative landmarks for manuscript review at the Medical Journal of Australia. As with quantitative research, the quality of qualitative research can be assessed in terms of validity, reliability, and generalizability.

Validity

Validity in qualitative research means "appropriateness" of the tools, processes, and data: whether the research question is valid for the desired outcome, the choice of methodology is appropriate for answering the research question, the design is valid for the methodology, the sampling and data analysis are appropriate, and finally whether the results and conclusions are valid for the sample and context. In assessing the validity of qualitative research, the challenge can start from the ontology and epistemology of the issue being studied; for example, the concept of the "individual" is seen differently by humanistic and positive psychologists due to differing philosophical perspectives [16]: where humanistic psychologists believe the "individual" is a product of existential awareness and social interaction, positive psychologists think the "individual" exists side-by-side with the formation of any human being. Setting off on different pathways, qualitative research regarding an individual's wellbeing will thus conclude with varying validity. The choice of methodology must enable detection of findings or phenomena in the appropriate context for it to be valid, with due regard to cultural and contextual variables. For sampling, procedures and methods must be appropriate for the research paradigm and be distinguished as systematic [17], purposeful [18], or theoretical (adaptive) sampling [19, 20], where systematic sampling has no a priori theory, purposeful sampling often has a certain aim or framework, and theoretical sampling is molded by the ongoing process of data collection and theory in evolution. For data extraction and analysis, several methods have been adopted to enhance validity, including first-tier triangulation (of researchers) and second-tier triangulation (of resources and theories) [17, 21], a well-documented audit trail of materials and processes [22, 23, 24], multidimensional analysis as concept- or case-orientated [25, 26], and respondent verification [21, 27].

Reliability

In quantitative research, reliability refers to the exact replicability of processes and results. In qualitative research, with its diverse paradigms, such a definition of reliability is challenging and epistemologically counter-intuitive; hence, the essence of reliability for qualitative research lies in consistency [24, 28]. A margin of variability in results is tolerated in qualitative research provided the methodology and epistemological logistics consistently yield data that are ontologically similar but may differ in richness and ambience within similar dimensions. Silverman [29] proposed five approaches to enhancing the reliability of process and results: refutational analysis, constant data comparison, comprehensive data use, inclusion of the deviant case, and use of tables. As data are extracted from the original sources, researchers must verify their accuracy in terms of form and context with constant comparison [27], either alone or with peers (a form of triangulation) [30]. The scope and analysis of the data included should be as comprehensive and inclusive as possible, with reference to quantitative aspects where feasible [30]. Adopting the Popperian dictum of falsifiability as the essence of truth and science, attempts to refute the qualitative data and analyses should be made to assess reliability [31].

Generalizability

Most qualitative research studies, if not all, are meant to study a specific issue or phenomenon within a certain population or ethnic group, in a focused locality and particular context; hence, generalizability of qualitative research findings is usually not an expected attribute. However, with the rising trend of knowledge synthesis from qualitative research via meta-synthesis, meta-narrative, or meta-ethnography, evaluation of generalizability becomes pertinent. A pragmatic approach to assessing the generalizability of qualitative studies is to adopt the same criteria as for validity: that is, use of systematic sampling, triangulation and constant comparison, proper audit and documentation, and multi-dimensional theory [17]. However, some researchers espouse the approach of analytical generalization [32], where one judges the extent to which the findings in one study can be generalized to another under a similar theoretical framework, and the proximal similarity model, where the generalizability of one study to another is judged by similarities between the time, place, people, and other social contexts [33]. That said, Zimmer [34] questioned the suitability of meta-synthesis in view of the basic tenets of grounded theory [35], phenomenology [36], and ethnography [37]. He concluded that any valid meta-synthesis must retain the other two goals of theory development and higher-level abstraction while in search of generalizability, and must be executed as a third-level interpretation using Gadamer's concepts of the hermeneutic circle [38, 39], dialogic process [38], and fusion of horizons [39]. Finally, Toye et al. [40] reported the practicality of using "conceptual clarity" and "interpretative rigor" as intuitive criteria for assessing quality in meta-ethnography, which somewhat echoes Rolfe's controversial aesthetic theory of research reports [41].

Food for Thought

Despite various measures to enhance or ensure the quality of qualitative studies, some researchers opine, from a purist ontological and epistemological angle, that qualitative research is not a unified field but is ipso facto diverse [8]; hence any attempt to synthesize or appraise different studies under one system is impossible and conceptually wrong. Barbour argued from a philosophical angle that these special measures or "technical fixes" (like purposive sampling, multiple coding, triangulation, and respondent validation) can never confer the rigor as conceived [11]. In extremis, Rolfe et al., from the field of nursing research, opined that any set of formal criteria used to judge the quality of qualitative research is futile and without validity, and suggested that any qualitative report should be judged by the form in which it is written (aesthetic) and not by its contents (epistemic) [41]. Rolfe's novel view is rebutted by Porter [42], who argued via logical premises that two of Rolfe's fundamental statements were flawed: (i) "the content of research reports is determined by their forms" may not be a fact, and (ii) research appraisal being "subject to individual judgment based on insight and experience" would mean that those without sufficient experience of performing research would be unable to judge adequately, hence an elitist principle. From a realism standpoint, Porter then proposes multiple and open approaches to validity in qualitative research that incorporate parallel perspectives [43, 44] and diversification of meanings [44]. Any work of qualitative research, when read, is always a two-way interactive process, such that validity and quality have to be judged by the receiving end too, and not by the researcher end alone.

In summary, the three gold criteria of validity, reliability, and generalizability apply in principle to assessing quality for both quantitative and qualitative research; what differs is the nature and type of processes that ontologically and epistemologically distinguish the two.

Source of Support: Nil.

Conflict of Interest: None declared.

IMAGES

  1. Reliability vs. Validity in Research

    validity and reliability in research questionnaire

  2. Validity vs reliability as data research quality evaluation outline

    validity and reliability in research questionnaire

  3. [PDF] Validity and Reliability of the Research Instrument; How to Test

    validity and reliability in research questionnaire

  4. How to establish the validity and reliability of qualitative research?

    validity and reliability in research questionnaire

  5. Examples of reliability and validity in research

    validity and reliability in research questionnaire

  6. Reliability Vs. Validity in Research Methodology

    validity and reliability in research questionnaire

VIDEO

  1. Validity and Reliability in Research

  2. Validity, Reliability, and Scoring

  3. Reliability & Validity in Research Studies

  4. C11 Validity & Reliability (Part 5)

  5. Kuder-Richardson 20 (KR-20): Reliability Testing

  6. Concept of Reliability and Validity

COMMENTS

  1. Reliability vs. Validity in Research

    Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.opt. It's important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research. Failing to do so can lead to several types of research ...

  2. Reliability vs Validity in Research

    Revised on 10 October 2022. Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method, technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure. It's important to consider reliability and validity when you are ...

  3. Validity and Reliability of the Research Instrument; How to Test the

    In this paper, validity and reliability of questionnaire/survey as a significant research instrument tool were reviewed. Various types of validi ty were discussed with the goal of validity ...

  4. Designing and validating a research questionnaire

    Face validity is a subjective assessment of factors such as the relevance, formatting, readability, clarity, and appropriateness of the questionnaire for the intended audience. Face validity can be determined by nonexperts, but is an important component when a questionnaire is first being developed.

  5. Designing and validating a research questionnaire

    Research questionnaires may be self-administered (by the research participant) or researcher administered. ... In addition to validity and reliability, pilot testing provides information on the time taken to complete the questionnaire and whether any questions are confusing or misleading and need to be rephrased. Validity indicates that the ...

  6. Survey Reliability: Models, Methods, and Findings

    SQP is an automated system that predicts reliability, validity, method effects, and total quality of a question ... (1991), " Response-Time Measurement in Survey Research: A Method for CATI and a New Look at Non-Attitudes," Public Opinion Quarterly, 55, 331-346. [Google Scholar]

  7. Validity & Reliability In Research

    What Is Reliability? As with validity, reliability is an attribute of a measurement instrument - for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the "thing" it's supposed to be measuring, reliability is concerned with consistency and stability.In other words, reliability reflects the degree ...

  8. Validity and Reliability of the Research Instrument; How to Test the

    The main objective of questionnaire in research is to obtain relevant information in most reliable and valid manner. Thus the accuracy and consistency of survey/questionnaire forms a significant aspect of research methodology which are known as validity and reliability.

  9. What Are Survey Validity and Reliability?

    Statistical validity is an assessment of how well the numbers in a study support the claims being made. Suppose a survey says 25% of people believe the Earth is flat. An assessment of statistical validity asks whether that 25% is based on a sample of 12 or 12,000. There is no one way to evaluate claims of statistical validity.

  10. Validity and Reliability of the Research Instrument; How to Test the

    are known as validity and reliability. Often new researchers are confused with selection and conducting of proper validity type to test their research instrument (questionnaire/survey). This review article explores and describes the validity and reliability of a questionnaire/survey and also discusses various forms of validity and reliability ...

  11. Validating a Questionnaire

    Generally speaking the first step in validating a survey is to establish face validity. There are two important steps in this process. First is to have experts or people who understand your topic read through your questionnaire. They should evaluate whether the questions effectively capture the topic under investigation.

  12. PDF Establishing survey validity: A practical guide

    Research Methods, Survey Methods, Validity Abstract: What follows is a practical guide for establishing the validity of a survey for research purposes. The motivation for providing this guide is our observation that researchers, not necessarily being survey researchers per se, but wanting to use a survey method, lack a concise resource on ...

  13. Validity and reliability in quantitative studies

    Validity. Validity is defined as the extent to which a concept is accurately measured in a quantitative study. For example, a survey designed to explore depression but which actually measures anxiety would not be considered valid. The second measure of quality in a quantitative study is reliability, or the accuracy of an instrument.In other words, the extent to which a research instrument ...

  14. Reliability and Validity

    Reliability refers to the consistency of the measurement. Reliability shows how trustworthy is the score of the test. If the collected data shows the same results after being tested using various methods and sample groups, the information is reliable. If your method has reliability, the results will be valid. Example: If you weigh yourself on a ...

  15. Principles and Methods of Validity and Reliability Testing o ...

    FACE VALIDITY. Some authors [7 13] are of the opinion that face validity is a component of content validity while others believe it is not.[2 14 15] Face validity is established when an individual (and or researcher) who is an expert on the research subject reviewing the questionnaire (instrument) concludes that it measures the characteristic or trait of interest.

  16. A Step-by-step Guide to Questionnaire Validation Research

    A Step-by-step Guide to Questionnaire Validation Research. September 2022. DOI: 10.5281/zenodo.6801209. Publisher: Institute for Clinical Research, NIH MY. ISBN: ISBN: 9781005256180. Authors ...

  17. Guidelines for developing, translating, and validating a questionnaire

    The construct validity of a questionnaire can be evaluated by estimating its association with other variables (or measures of a construct) with which it should be correlated positively, negatively, or not at all. In practice, the questionnaire of interest, as well as the preexisting instruments that measure similar and dissimilar constructs, is ...

  18. Validity and reliability of the Persian version of food preferences

    On the other hand, an online electronic link to the questionnaires, along with an explanation of the purpose of the research, guidance on how to complete the questionnaire, and consent to participate in ...

  19. (PDF) Establishing survey validity: A practical guide

    A questionnaire or survey item (and this applies to interview items as well) has validity if the reader of the item understands the item as intended by the item's creator. As stated in the 2018 Palgrave ...

  20. Cross-Cultural Adaptation, Validity, Reliability and Clinical

    Cross-Cultural Adaptation, Validity, Reliability and Clinical Applicability of the Michigan Hand Outcomes Questionnaire, and its Brief Version, to Canadian French ... Abundant high-quality research exists for a portion of the hand therapy scope of practice; however, more evidence is needed for complex diagnoses as well as for behavioral and ...

  21. Psychometric properties and criterion related validity of the Norwegian

    Background Several studies have been conducted with the 1.0 version of the Hospital Survey on Patient Safety Culture (HSOPSC) in Norway and globally. The 2.0 version has not been translated and tested in Norwegian hospital settings. This study aims to 1) assess the psychometrics of the Norwegian version (N-HSOPSC 2.0), and 2) assess the criterion validity of the N-HSOPSC 2.0, adding two more ...

  22. Practical Guidelines to Develop and Evaluate a Questionnaire

    The process of converting the original questionnaire to the target language and then back to the original language is known as forward and backward translation. The subsequent steps, such as expert panel review, pilot testing, and reliability and validity testing, remain the same for a translated questionnaire as for constructing a new scale.

  23. (PDF) Validity and Reliability in Quantitative Research

    Abstract and Figures. The validity and reliability of the scales used in research are important factors that enable the research to yield sound results. For this reason, it is useful to ...

  24. The short Thai version of functional outcomes of sleep questionnaire

    Purpose: The study evaluates the reliability and validity of the short Thai version of the Functional Outcomes of Sleep Questionnaire (FOSQ-10T) in patients with sleep-disordered breathing (SDB). Methods: Inclusion criteria were Thai patients with SDB aged ≥ 18 years who had polysomnography results available. Exclusion criteria were patients unable to complete the questionnaire for any reason ...

  25. Sustainability

    The research instrument met the required reliability, validity, and internal consistency criteria. Each variable achieved a high effect size (f²), and the coefficient of determination exceeded the threshold value of 70%. Thus, the fitness criterion of the SEM-based measurement model was supported. ... The survey questionnaire was ...
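    For reference, in SEM/PLS reporting the effect size f² is typically derived from coefficients of determination as f² = (R²_included - R²_excluded) / (1 - R²_included). A quick calculation with invented R² values:

```python
def cohens_f2(r2_included: float, r2_excluded: float) -> float:
    """Cohen's effect size f^2 from R^2 with and without a predictor."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Hypothetical R^2 values: 0.75 with the predictor included, 0.55 without it
print(f"f2 = {cohens_f2(0.75, 0.55):.2f}")  # 0.80; f2 >= 0.35 is conventionally a large effect
```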

  26. Parental hesitancy toward children vaccination: a multi-country

    The present research is one of the first studies to evaluate vaccine hesitancy in multiple countries and to demonstrate the validity and reliability of the VHS. Findings from this study have implications for future research examining vaccine hesitancy and vaccine-preventable diseases, and for community health nurses. ... questionnaire-based research design ...

  27. Validity, reliability, and generalizability in qualitative research

    Fundamental concepts of validity, reliability, and generalizability as applicable to qualitative research are then addressed, with an update on the current views and controversies. Keywords: Controversies ... The logic of qualitative survey research and its position in the field of social research methods. Forum Qual Soc Res. 2010; 11:2 ...

  28. Turkish validity and reliability study of the adolescent asthma self

    A total of 110 asthmatic children (aged 7-15) and 129 parents (of asthmatic children aged 3-15) responded to a mail-out survey. Evidence for reliability (0.75-0.87) and validity was obtained for ...
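    Coefficients in this range are typically test-retest correlations or internal-consistency alphas. A minimal test-retest sketch using a simple Pearson correlation (the scores are invented; an intraclass correlation coefficient is often preferred in practice):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical total scores from the same eight respondents, two weeks apart
time1 = np.array([12, 18, 9, 22, 15, 11, 20, 17])
time2 = np.array([13, 17, 10, 21, 16, 10, 19, 18])

r, _ = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # closer to 1.0 means more stable scores
```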