Reliability vs Validity in Research | Differences, Types & Examples

Published on 3 May 2022 by Fiona Middleton. Revised on 10 October 2022.

Reliability and validity are concepts used to evaluate the quality of research. They indicate how well a method, technique, or test measures something. Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.

It’s important to consider reliability and validity when you are creating your research design, planning your methods, and writing up your results, especially in quantitative research.

Table of contents

  • Understanding reliability vs validity
  • How are reliability and validity assessed?
  • How to ensure validity and reliability in your research
  • Where to write about reliability and validity in a thesis

Reliability and validity are closely related, but they mean different things. A measurement can be reliable without being valid. However, if a measurement is valid, it is usually also reliable.

What is reliability?

Reliability refers to how consistently a method measures something. If the same result can be consistently achieved by using the same methods under the same circumstances, the measurement is considered reliable.

What is validity?

Validity refers to how accurately a method measures what it is intended to measure. If research has high validity, that means it produces results that correspond to real properties, characteristics, and variations in the physical or social world.

High reliability is one indicator that a measurement is valid. If a method is not reliable, it probably isn’t valid.

However, reliability on its own is not enough to ensure validity. Even if a test is reliable, it may not accurately reflect the real situation.
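
To make this distinction concrete, here is a minimal Python sketch (with invented numbers) of a miscalibrated weighing scale: its readings barely vary, so it is highly reliable, yet every reading is about two kilograms off, so it is not valid.

```python
import numpy as np

rng = np.random.default_rng(0)
true_weight = 70.0  # the real quantity we want to measure, in kg

# A miscalibrated scale: tiny random noise (consistent) but a +2 kg bias (inaccurate).
readings = true_weight + 2.0 + rng.normal(0.0, 0.05, size=10)

print(f"spread of readings: {readings.std():.3f} kg")                   # small -> reliable
print(f"mean error vs truth: {readings.mean() - true_weight:+.2f} kg")  # ~ +2 -> not valid
```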

Validity is harder to assess than reliability, but it is even more important. To obtain useful results, the methods you use to collect your data must be valid: the research must be measuring what it claims to measure. This ensures that your discussion of the data and the conclusions you draw are also valid.


Reliability can be estimated by comparing different versions of the same measurement. Validity is harder to assess, but it can be estimated by comparing the results to other relevant data or theory. Methods of estimating reliability and validity are usually split up into different types.

Types of reliability

Different types of reliability can be estimated through various statistical methods.

Types of validity

The validity of a measurement can be estimated based on three main types of evidence. Each type can be evaluated through expert judgement or statistical methods.

To assess the validity of a cause-and-effect relationship, you also need to consider internal validity (the design of the experiment) and external validity (the generalisability of the results).

The reliability and validity of your results depend on creating a strong research design, choosing appropriate methods and samples, and conducting the research carefully and consistently.

Ensuring validity

If you use scores or ratings to measure variations in something (such as psychological traits, levels of ability, or physical properties), it’s important that your results reflect the real variations as accurately as possible. Validity should be considered in the very earliest stages of your research, when you decide how you will collect your data.

  • Choose appropriate methods of measurement

Ensure that your method and measurement technique are of high quality and targeted to measure exactly what you want to know. They should be thoroughly researched and based on existing knowledge.

For example, to collect data on a personality trait, you could use a standardised questionnaire that is considered reliable and valid. If you develop your own questionnaire, it should be based on established theory or the findings of previous studies, and the questions should be carefully and precisely worded.

  • Use appropriate sampling methods to select your subjects

To produce valid generalisable results, clearly define the population you are researching (e.g., people from a specific age range, geographical location, or profession). Ensure that you have enough participants and that they are representative of the population.

Ensuring reliability

Reliability should be considered throughout the data collection process. When you use a tool or technique to collect data, it’s important that the results are precise, stable, and reproducible.

  • Apply your methods consistently

Plan your method carefully to make sure you carry out the same steps in the same way for each measurement. This is especially important if multiple researchers are involved.

For example, if you are conducting interviews or observations, clearly define how specific behaviours or responses will be counted, and make sure questions are phrased the same way each time.

  • Standardise the conditions of your research

When you collect your data, keep the circumstances as consistent as possible to reduce the influence of external factors that might create variation in the results.

For example, in an experimental setup, make sure all participants are given the same information and tested under the same conditions.

It’s appropriate to discuss reliability and validity in various sections of your thesis, dissertation, or research paper. Showing that you have taken them into account in planning your research and interpreting the results makes your work more credible and trustworthy.



The Significance of Validity and Reliability in Quantitative Research


Key Takeaways:

  • Types of validity to consider during quantitative research include internal, external, construct, and statistical
  • Types of reliability that apply to quantitative research include test-retest, inter-rater, internal consistency, and parallel forms
  • There are numerous challenges to achieving validity and reliability in quantitative research, but the right techniques can help overcome them

Quantitative research is used to investigate and analyze data to draw meaningful conclusions. Validity and reliability are two critical concepts in quantitative analysis that ensure the accuracy and consistency of the research results. Validity refers to the extent to which the research measures what it intends to measure, while reliability refers to the consistency and reproducibility of the research results over time. Ensuring validity and reliability is crucial in conducting high-quality research, as it increases confidence in the findings and conclusions drawn from the data.

This article aims to provide an in-depth analysis of the significance of validity and reliability in quantitative research. It will explore the different types of validity and reliability, their interrelationships, and the associated challenges and limitations.

In this Article:

  • The Role of Validity in Quantitative Research
  • The Role of Reliability in Quantitative Research
  • Validity and Reliability: How They Differ and Interrelate
  • Challenges and Limitations of Ensuring Validity and Reliability
  • Overcoming Challenges and Limitations to Achieve Validity and Reliability
  • Explore Trusted Quantitative Solutions


The Role of Validity in Quantitative Research

Validity is crucial in maintaining the credibility and trustworthiness of quantitative research outcomes. Therefore, it is critical to establish that the variables being measured in a study align with the research objectives and accurately reflect the phenomenon being investigated.

Several types of validity apply to various study designs; let’s take a deeper look at each one below:

Internal validity is concerned with the extent to which a study establishes a causal relationship between the independent and dependent variables. In other words, internal validity determines whether the changes observed in the dependent variable result from changes in the independent variable or from some other factor.

External validity refers to the degree to which the findings of a study can be generalized to other populations and contexts. External validity helps ensure the results of a study are not limited to the specific people or context in which the study was conducted.

Construct validity refers to the degree to which a research study accurately measures the theoretical construct it intends to measure. Construct validity helps provide alignment between the study’s measures and the theoretical concept it aims to investigate.

Finally, statistical validity refers to the accuracy of the statistical tests used to analyze the data. Establishing statistical validity provides confidence that the conclusions drawn from the data are reliable and accurate.
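
As one hedged illustration of what checking statistical validity can involve in practice, the Python sketch below verifies a t-test's assumptions before trusting its conclusion. The group names and data are simulated and hypothetical, not drawn from any study in this article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, size=40)  # hypothetical outcome scores, group A
group_b = rng.normal(45, 10, size=40)  # hypothetical outcome scores, group B

# Statistical validity rests on the test's assumptions actually holding:
# approximate normality within each group (Shapiro-Wilk) ...
print("normality p-values:", stats.shapiro(group_a).pvalue, stats.shapiro(group_b).pvalue)
# ... and roughly equal variances across groups (Levene's test).
print("equal-variance p-value:", stats.levene(group_a, group_b).pvalue)

# Only once the assumptions look tenable is the t-test's p-value meaningful.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```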

To safeguard the validity of a study, researchers must carefully design their research methodology, select appropriate measures, and control for extraneous variables that may impact the results. Validity is especially crucial in fields such as medicine, where inaccurate research findings can have severe consequences for patients and healthcare practices.

The Role of Reliability in Quantitative Research

Ensuring the consistency and reproducibility of research outcomes over time is crucial in quantitative research, and this is where the concept of reliability comes into play. Reliability is vital to building trust in the research findings and their ability to be replicated in diverse contexts.

Similar to validity, multiple types of reliability are pertinent to different research designs. Let’s take a closer look at each of these types of reliability below:

Test-retest reliability refers to the consistency of the results obtained when the same test is administered to the same group of participants at different times. This type of reliability is essential when researchers need to administer the same test multiple times to assess changes in behavior or attitudes over time.
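
Test-retest reliability is commonly estimated as the correlation between the two administrations. A minimal Python sketch, using invented scores:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same eight participants, tested two weeks apart.
time1 = np.array([12, 15, 11, 18, 14, 16, 13, 17])
time2 = np.array([13, 14, 11, 17, 15, 16, 12, 18])

# A common estimate of test-retest reliability: Pearson's r between administrations.
r, _ = stats.pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # values near 1 indicate a stable measure
```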

Inter-rater reliability refers to the consistency of results when different raters or observers assess the same behavior or phenomenon. This type of reliability is vital when researchers are required to rely on different individuals to rate or observe the same behavior or phenomenon.
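
One widely used statistic here is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A short sketch, assuming scikit-learn is available and using invented category codes:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical behaviour codes two raters assigned to the same ten observations.
rater_1 = ["agg", "pas", "agg", "neu", "pas", "agg", "neu", "neu", "pas", "agg"]
rater_2 = ["agg", "pas", "neu", "neu", "pas", "agg", "neu", "pas", "pas", "agg"]

# Cohen's kappa: 1 = perfect agreement, 0 = no better than chance.
print(f"kappa = {cohen_kappa_score(rater_1, rater_2):.2f}")
```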

Internal consistency reliability refers to the degree to which the items or questions in a test or questionnaire measure the same construct. This type of reliability is important in studies where researchers use multiple items or questions to assess a particular construct, such as knowledge or quality of life.
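
The most widely used index of internal consistency is Cronbach's alpha, which can be computed directly from an item-score matrix. A self-contained sketch with invented questionnaire data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-point responses: six respondents x four items tapping one construct.
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # ~0.8 or above is often deemed acceptable
```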

Lastly, parallel forms reliability refers to the consistency of the results obtained when two different versions of the same test are administered to the same group of participants. This type of reliability is important when researchers administer different versions of the same test to assess the consistency of the results.
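
Computationally this mirrors the test-retest case: correlate the two forms' scores for the same participants. A brief sketch with invented totals:

```python
import numpy as np
from scipy import stats

# Hypothetical total scores for the same seven participants on forms A and B.
form_a = np.array([22, 30, 25, 28, 19, 27, 24])
form_b = np.array([23, 29, 24, 29, 20, 26, 25])

r, _ = stats.pearsonr(form_a, form_b)  # parallel-forms reliability estimate
print(f"parallel-forms r = {r:.2f}")
```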

Reliability in research is like the accuracy and consistency of a medical test. Just as a reliable medical test produces consistent and accurate results that physicians can trust to make informed decisions about patient care, a highly reliable study produces consistent and precise findings that researchers can trust to draw informed conclusions about a particular phenomenon. To ensure reliability in a study, researchers must carefully select appropriate measures and establish protocols for administering the measures consistently. They must also take steps to control for extraneous variables that may impact the results.

Validity and Reliability: How They Differ and Interrelate

Validity and reliability are two critical concepts in quantitative research that significantly determine the quality of research studies. While both terms are often used interchangeably, they refer to different aspects of research. Validity is the extent to which a research study measures what it claims to measure without being affected by extraneous factors or bias. In contrast, reliability is the degree to which the research results are consistent and stable over time and across different samples, methods, and evaluators.

Designing a research study that is both valid and reliable is essential for producing high-quality and trustworthy research findings. Finding this balance requires significant expertise, skill, and attention to detail. Ultimately, the goal is to produce research findings that are valid and reliable but also impactful and influential for the organization requesting them. Achieving this level of excellence requires a deep understanding of the nuances and complexities of research methodology and a commitment to excellence and rigor in all aspects of the research process.

Challenges and Limitations of Ensuring Validity and Reliability

Ensuring validity and reliability in quantitative research is not without its challenges. Some of the factors to consider include:

1. Measuring Complex Constructs or Variables

One of the main challenges is the difficulty in accurately measuring complex constructs or variables. For instance, measuring constructs such as intelligence or personality can be complicated due to their multi-dimensional nature, and it can be challenging to capture all aspects accurately.

2. Limitations of Data Collection Instruments

In addition, the measures or instruments used to collect data can be limited in their sensitivity or specificity. This can impact the study’s validity and reliability, as measures that lack accuracy and precision can lead to incorrect conclusions and unreliable results. For example, a scale that measures depression but does not include all relevant symptoms may not accurately capture the construct being studied.

3. Sources of Error and Bias in Data Collection

The data collection process itself can introduce sources of error or bias, which can impact the validity and reliability of the study. For instance, measurement errors can occur due to the limitations of the measuring instrument or human error during data collection. In addition, response bias can arise when participants provide socially desirable answers, while sampling bias can occur when the sample is not representative of the studied population.

4. The Complexity of Achieving Meaningful and Accurate Research Findings

There are also some limitations to validity and reliability in research studies. For example, achieving internal validity by controlling for extraneous variables does not always ensure external validity, or the ability to generalize findings to other populations or settings. This can be a limitation for researchers who wish to apply their findings to a larger population or different contexts.

Additionally, while reliability is essential for producing consistent and reproducible results, it does not guarantee the accuracy or truth of the findings. This means that even if a study has reliable results, those results may still be inaccurate. These limitations remind us that research is a complex process, and achieving validity and reliability is just one part of the larger puzzle of producing accurate and meaningful research.

Overcoming Challenges and Limitations to Achieve Validity and Reliability

Researchers can adopt various measures and techniques to overcome the challenges and limitations in ensuring validity and reliability in research studies.

One such approach is to use multiple measures or instruments to assess the same construct. Comparing results across these measures can help identify commonalities and differences, thereby providing a more comprehensive understanding of the construct being studied.

Inter-rater reliability checks can also be conducted to ensure different raters or observers consistently interpret and rate the same data. This can reduce measurement errors and improve the reliability of the results. Additionally, data-cleaning techniques can be used to identify and remove any outliers or errors in the data.

Finally, researchers can use appropriate statistical methods to assess the validity and reliability of their measures. For example, factor analysis identifies the underlying factors contributing to the construct being studied, while test-retest reliability helps evaluate the consistency of results over time. By adopting these measures and techniques, researchers can increase the overall quality and usefulness of their findings.
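
As an illustration of the factor-analysis step, the sketch below simulates questionnaire items driven by two latent factors and recovers the loading structure with scikit-learn. The loadings, item count, and sample size are invented for demonstration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)

# Simulate 200 respondents whose answers to six items are driven by two latent factors.
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],   # items 1-3: factor 1
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])  # items 4-6: factor 2
observed = latent @ loadings.T + rng.normal(0.0, 0.3, size=(200, 6))

# If the fitted loadings reproduce the intended structure, that is evidence
# the items measure the constructs they were designed to measure.
fa = FactorAnalysis(n_components=2).fit(observed)
print(np.round(fa.components_, 2))
```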

The backbone of any quantitative research lies in the validity and reliability of the data collected. These factors ensure the data accurately reflects the intended research objectives and is consistent and reproducible. By carefully balancing the interrelationship between validity and reliability and using appropriate techniques to overcome challenges, researchers protect the credibility and impact of their work. This is essential in producing high-quality research that can withstand scrutiny and drive progress.

Explore Trusted Quantitative Solutions

Are you seeking a reliable and valid way to collect, analyze, and report your quantitative data? Sago’s comprehensive quantitative solutions provide you with the peace of mind to conduct research and draw meaningful conclusions.

Don’t Settle for Subpar Results

Work with a trusted quantitative research partner to deliver quantitative research you can count on. Book a consultation with our team to get started.


Reliability and validity: Importance in Medical Research

Affiliations.

  • 1 Al-Nafees Medical College, Isra University, Islamabad, Pakistan.
  • 2 Fauji Foundation Hospital, Foundation University Medical College, Islamabad, Pakistan.
  • PMID: 34974579
  • DOI: 10.47391/JPMA.06-861

Reliability and validity are among the most important and fundamental domains in the assessment of any measuring methodology for data collection in good research. Validity is about what an instrument measures and how well it does so, whereas reliability concerns the truthfulness of the data obtained and the degree to which any measuring tool controls for random error. The current narrative review was planned to discuss the importance of the reliability and validity of data-collection or measurement techniques used in research. It describes and explores comprehensively the reliability and validity of research instruments and also discusses different forms of reliability and validity with concise examples. An attempt has been made to give a brief literature review regarding the significance of reliability and validity in medical sciences.

Keywords: Validity, Reliability, Medical research, Methodology, Assessment, Research tools.


Finding and Evaluating Evidence: Systematic Reviews and Evidence-Based Practice


3 Critically Appraising the Quality and Credibility of Quantitative Research for Systematic Reviews

  • Published: September 2011

This chapter looks at how to evaluate the quality and credibility of various types of quantitative research that might be included in a systematic review. Various factors that determine the quality and believability of a study will be presented, including:

  • assessing the study’s methods in terms of internal validity;
  • examining factors associated with external validity and relevance; and
  • evaluating the credibility of the research and researcher in terms of possible biases that might influence the research design, analysis, or conclusions.

The importance of transparency is highlighted.


J Family Med Prim Care. 2015 Jul-Sep; 4(3).

Validity, reliability, and generalizability in qualitative research

Lawrence Leung

1 Department of Family Medicine, Queen's University, Kingston, Ontario, Canada

2 Centre of Studies in Primary Care, Queen's University, Kingston, Ontario, Canada

In general practice, qualitative research contributes as significantly as quantitative research, in particular regarding the psycho-social aspects of patient care, health services provision, policy setting, and health administration. In contrast to quantitative research, qualitative research as a whole has been constantly critiqued, if not disparaged, for the lack of consensus on assessing its quality and robustness. This article illustrates with five published studies how qualitative research can impact and reshape the discipline of primary care, spiraling out from clinic-based health screening to community-based disease monitoring, evaluation of out-of-hours triage services, a provincial psychiatric care pathways model and, finally, national legislation of core measures for children's healthcare insurance. Fundamental concepts of validity, reliability, and generalizability as applicable to qualitative research are then addressed, with an update on current views and controversies.

Nature of Qualitative Research versus Quantitative Research

The essence of qualitative research is to make sense of and recognize patterns among words in order to build up a meaningful picture without compromising its richness and dimensionality. Like quantitative research, qualitative research aims to seek answers to questions of "how, where, when, who and why" with a perspective to build a theory or refute an existing one. Unlike quantitative research, which deals primarily with numerical data and their statistical interpretations under a reductionist, logical and strictly objective paradigm, qualitative research handles nonnumerical information and its phenomenological interpretation, which inextricably ties in with human senses and subjectivity. While human emotions and perspectives from both subjects and researchers are considered undesirable biases confounding results in quantitative research, the same elements are considered essential and inevitable, if not treasurable, in qualitative research, as they invariably add extra dimensions and colors to enrich the corpus of findings. However, the issue of subjectivity and contextual ramifications has fueled incessant controversies regarding the yardsticks for quality and trustworthiness of qualitative research results for healthcare.

Impact of Qualitative Research upon Primary Care

In many ways, qualitative research contributes significantly, if not more so than quantitative research, to the field of primary care at various levels. Five qualitative studies are chosen to illustrate how various methodologies of qualitative research have helped in advancing primary healthcare, from novel monitoring of chronic obstructive pulmonary disease (COPD) via mobile-health technology,[1] informed decision-making for colorectal cancer screening,[2] triaging of out-of-hours GP services,[3] and evaluation of care pathways for community psychiatry[4] to the prioritization of healthcare initiatives for legislation purposes at the national level.[5]

With the recent advances in information technology and mobile connected devices, self-monitoring and management of chronic diseases via tele-health technology may seem beneficial to both the patient and the healthcare provider. Recruiting COPD patients who were given tele-health devices that monitored lung functions, Williams et al.[1] conducted phone interviews, analyzed the transcripts via a grounded-theory approach, and identified themes which enabled them to conclude that such a mobile-health setup helped to engage patients with better adherence to treatment and overall improvement in mood. Such positive findings were in contrast to previous studies, which opined that elderly patients were often challenged by operating computer tablets[6] or conversing with the tele-health software.[7]

To explore the content of recommendations for colorectal cancer screening given out by family physicians, Wackerbarth et al.[2] conducted semi-structured interviews with subsequent content analysis and found that most physicians delivered information to enrich patient knowledge with little regard to patients' true understanding, ideas, and preferences in the matter. These findings suggested room for improvement for family physicians to better engage their patients in recommending preventative care.

Faced with various models of out-of-hours triage services for GP consultations, Egbunike et al.[3] conducted thematic analysis on semi-structured telephone interviews with patients and doctors in various urban, rural and mixed settings. They found that the efficiency of triage services remained a prime concern for both users and providers, among issues of access to doctors and unfulfilled or mismatched expectations from users, which could arouse dissatisfaction and have legal implications.

In the UK, a care pathways model for community psychiatry had been introduced, but its benefits were unclear. Khandaker et al.[4] hence conducted a qualitative study using semi-structured interviews with medical staff and other stakeholders; adopting a grounded-theory approach, they identified major themes which included improved equality of access, more focused logistics, increased work throughput and better accountability for community psychiatry provided under the care pathway model.

Finally, at the US national level, Mangione-Smith et al.[5] employed a modified Delphi method to gather consensus from a panel of nominators who were recognized experts and stakeholders in their disciplines, and identified a core set of quality measures for children's healthcare under the Medicaid and Children's Health Insurance Program. These core measures were made transparent for public opinion and later passed on for full legislation, hence illustrating the impact of qualitative research upon social welfare and policy improvement.

Overall Criteria for Quality in Qualitative Research

Given the diverse genera and forms of qualitative research, there is no single consensus standard for assessing any piece of qualitative research work. Various approaches have been suggested, the two leading schools of thought being that of Dixon-Woods et al.,[8] which emphasizes methodology, and that of Lincoln et al.,[9] which stresses the rigor of interpretation of results. By identifying commonalities of qualitative research, Dixon-Woods produced a checklist of questions for assessing the clarity and appropriateness of the research question; the description and appropriateness of sampling, data collection, and data analysis; the levels of support and evidence for claims; the coherence between data, interpretation, and conclusions; and finally the level of contribution of the paper. These criteria underpin the 10 questions of the Critical Appraisal Skills Programme checklist for qualitative studies.[10]

However, these methodology-weighted criteria may not do justice to qualitative studies that differ in epistemological and philosophical paradigms,[11,12] one classic example being positivist versus interpretivist.[13] Equally, without a robust methodological layout, the rigorous interpretation of results advocated by Lincoln et al.[9] will not suffice either. Meyrick[14] argued from a different angle and proposed fulfillment of the dual core criteria of "transparency" and "systematicity" for good-quality qualitative research: in brief, every step of the research logistics (from theory formation, design of study, sampling, data acquisition and analysis to results and conclusions) has to be checked for being sufficiently transparent and systematic. In this manner, both the research process and the results can be assured of high rigor and robustness.[14]

Finally, Kitto et al.[15] epitomized six criteria for assessing the overall quality of qualitative research: (i) clarification and justification, (ii) procedural rigor, (iii) sample representativeness, (iv) interpretative rigor, (v) reflexive and evaluative rigor, and (vi) transferability/generalizability, which also double as evaluative landmarks for manuscript review at the Medical Journal of Australia. As with quantitative research, the quality of qualitative research can be assessed in terms of validity, reliability, and generalizability.

Validity in qualitative research means "appropriateness" of the tools, processes, and data: whether the research question is valid for the desired outcome, the choice of methodology is appropriate for answering the research question, the design is valid for the methodology, the sampling and data analysis are appropriate, and finally the results and conclusions are valid for the sample and context. In assessing the validity of qualitative research, the challenge can start with the ontology and epistemology of the issue being studied; for example, the concept of the "individual" is seen differently by humanistic and positive psychologists owing to differing philosophical perspectives:[16] where humanistic psychologists believe the "individual" is a product of existential awareness and social interaction, positive psychologists hold that the "individual" exists side-by-side with the formation of any human being. Setting off along different pathways, qualitative research on the individual's wellbeing will reach conclusions of varying validity. The choice of methodology must enable detection of findings or phenomena in the appropriate context for it to be valid, with due regard to cultural and contextual variability. For sampling, procedures and methods must be appropriate for the research paradigm and distinguish between systematic,[17] purposeful,[18] and theoretical (adaptive) sampling,[19,20] where systematic sampling has no a priori theory, purposeful sampling has a certain aim or framework, and theoretical sampling is molded by the ongoing process of data collection and the evolving theory. For data extraction and analysis, several methods can be adopted to enhance validity, including first-tier triangulation (of researchers) and second-tier triangulation (of resources and theories),[17,21] a well-documented audit trail of materials and processes,[22,23,24] multidimensional analysis as concept- or case-orientated,[25,26] and respondent verification.[21,27]

Reliability

In quantitative research, reliability refers to exact replicability of the processes and the results. In qualitative research, with its diverse paradigms, such a definition of reliability is challenging and epistemologically counter-intuitive. Hence, the essence of reliability in qualitative research lies in consistency.[24,28] A margin of variability in results is tolerated in qualitative research provided the methodology and epistemological logistics consistently yield data that are ontologically similar but may differ in richness and ambience within similar dimensions. Silverman[29] proposed five approaches to enhancing the reliability of process and results: refutational analysis, constant data comparison, comprehensive data use, inclusion of the deviant case, and use of tables. As data are extracted from the original sources, researchers must verify their accuracy in terms of form and context with constant comparison,[27] either alone or with peers (a form of triangulation).[30] The scope and analysis of the data included should be as comprehensive and inclusive as possible, with reference to quantitative aspects where applicable.[30] Adopting the Popperian dictum of falsifiability as the essence of truth and science, attempts to refute the qualitative data and analyses should also be made to assess reliability.[31]

Generalizability

Most qualitative research studies, if not all, are meant to study a specific issue or phenomenon within a certain population or ethnic group, in a focused locality and particular context; hence, generalizability of qualitative research findings is usually not an expected attribute. However, with the rising trend of knowledge synthesis from qualitative research via meta-synthesis, meta-narrative, or meta-ethnography, evaluation of generalizability becomes pertinent. A pragmatic approach to assessing the generalizability of qualitative studies is to adopt the same criteria as for validity: that is, use of systematic sampling, triangulation and constant comparison, proper audit and documentation, and multi-dimensional theory.[17] However, some researchers espouse the approach of analytical generalization,[32] where one judges the extent to which the findings of one study can be generalized to another under a similar theoretical frame, or the proximal similarity model, where generalizability of one study to another is judged by similarities in time, place, people, and other social contexts.[33] That said, Zimmer[34] questioned the suitability of meta-synthesis in view of the basic tenets of grounded theory,[35] phenomenology,[36] and ethnography.[37] He concluded that any valid meta-synthesis must retain the two further goals of theory development and higher-level abstraction while in search of generalizability, and must be executed as a third-level interpretation using Gadamer's concepts of the hermeneutic circle,[38,39] dialogic process,[38] and fusion of horizons.[39] Finally, Toye et al.[40] reported the practicality of using "conceptual clarity" and "interpretative rigor" as intuitive criteria for assessing quality in meta-ethnography, which somewhat echoes Rolfe's controversial aesthetic theory of research reports.[41]

Food for Thought

Despite various measures to enhance or ensure the quality of qualitative studies, some researchers have opined, from a purist ontological and epistemological angle, that qualitative research is not a unified field but an ipso facto diverse one,[8] hence any attempt to synthesize or appraise different studies under one system is impossible and conceptually wrong. Barbour argued from a philosophical angle that these special measures or "technical fixes" (like purposive sampling, multiple coding, triangulation, and respondent validation) can never confer the rigor as conceived.[11] In extremis, Rolfe et al., writing from the field of nursing research, opined that any set of formal criteria used to judge the quality of qualitative research is futile and without validity, and suggested that a qualitative report should be judged by the form in which it is written (aesthetic) and not by its contents (epistemic).[41] Rolfe's novel view was rebutted by Porter,[42] who argued via logical premises that two of Rolfe's fundamental statements were flawed: (i) that "the content of research reports is determined by their form" may not be a fact, and (ii) that research appraisal being "subject to individual judgment based on insight and experience" would mean those without sufficient experience of performing research are unable to judge adequately, hence an elitist principle. From a realist standpoint, Porter then proposed multiple and open approaches to validity in qualitative research that incorporate parallel perspectives[43,44] and diversification of meanings.[44] Any work of qualitative research, when read, is a two-way interactive process, such that validity and quality have to be judged at the receiving end too, and not by the researcher alone.

In summary, the three gold criteria of validity, reliability, and generalizability apply in principle to assessing quality in both quantitative and qualitative research; what differs is the nature and type of the processes that ontologically and epistemologically distinguish the two.

Source of Support: Nil.

Conflict of Interest: None declared.


Reliability and Validity – Definitions, Types & Examples

Published by Alvin Nicolas on August 16th, 2021. Revised on October 26, 2023.

A researcher must test the collected data before drawing any conclusions. Every research design needs to address reliability and validity to ensure the quality of the research.

What is Reliability?

Reliability refers to the consistency of a measurement: it shows how trustworthy the score of a test is. If the collected data show the same results after being tested using various methods and sample groups, the information is reliable. Note, however, that reliability alone does not guarantee that the results are valid.

Example: If you weigh yourself on a weighing scale throughout the day, you’ll get the same results. These are considered reliable results obtained through repeated measures.

Example: If a teacher administers the same maths test to her students and repeats it the next week with the same questions, and the students obtain similar scores, the reliability of the test is high.

What is Validity?

Validity refers to the accuracy of the measurement. Validity shows how suitable a specific test is for a particular situation. If the results accurately reflect the situation the researcher set out to explain or predict, the research is valid.

If the method of measuring is accurate, it will produce accurate results. However, a reliable method is not necessarily valid: consistent results can still be consistently wrong. What does hold is the converse; if a method is not reliable, it cannot be valid.

Example: Your weighing scale shows different results each time you weigh yourself within a day, even after handling it carefully and weighing before and after meals. Your weighing machine might be malfunctioning. This means your method has low reliability, so you are getting inconsistent results that cannot be valid.

Example: Suppose a questionnaire is distributed among a group of people to check the quality of a skincare product, and the same questionnaire is repeated with many groups. If you get the same responses from the various participants, the questionnaire is reliable; if the questions also accurately capture the product's quality, its validity is high as well.

Most of the time, validity is difficult to assess even when the measurement process is reliable, because it is hard to know how well the results reflect the real situation.

Example: If the weighing scale shows the same result, let's say 70 kg, each time even though your actual weight is 55 kg, the weighing scale is malfunctioning. Because it shows consistent results, it is reliable; but because those results are inaccurate, the method has low validity.

Internal vs External Validity

One of the key features of randomised designs is that they tend to have high internal and external validity.

Internal validity is the ability to draw a causal link between your treatment and the dependent variable of interest. It means the observed changes should be due to the experiment conducted, and no external factor should influence the variables.

Example: Confounding factors such as participants' age, ability level, height, or grade should not drive the observed changes.

External validity is the ability to generalise your study outcomes to the population at large. It concerns the relationship between the situation within the study and situations outside the study.


Threats to Internal Validity

Threats to External Validity

How to Assess Reliability and Validity

Reliability can be measured by comparing the consistency of the procedure and its results. There are various methods to measure validity and reliability. Reliability can be measured through various statistical methods depending on the type of reliability, as explained below:

Types of Reliability

Types of Validity

As discussed above, the reliability of a measurement alone cannot determine its validity. Validity is difficult to measure even if the method is reliable. The following types of tests are conducted to measure validity.


How to Increase Reliability?

  • Use an appropriate questionnaire to measure the competency level.
  • Ensure a consistent environment for participants.
  • Familiarise participants with the assessment criteria.
  • Train the participants appropriately.
  • Review the research items regularly to identify poorly performing ones.

How to Increase Validity?

Ensuring validity is not an easy job either. The following measures can help ensure validity:

  • Minimise reactivity as a first concern.
  • Reduce the Hawthorne effect.
  • Keep respondents motivated.
  • Keep the interval between the pre-test and post-test short.
  • Minimise dropout rates.
  • Ensure inter-rater reliability.
  • Match control and experimental groups with each other.

How to Implement Reliability and Validity in your Thesis?

According to the experts, it is helpful to address reliability and validity explicitly, especially in a thesis or dissertation, where these concepts are relied on heavily. Discuss how you ensured both when describing your methodology and again when reporting and interpreting your results.

Frequently Asked Questions

What is reliability and validity in research?

Reliability in research refers to the consistency and stability of measurements or findings. Validity relates to the accuracy and truthfulness of results, measuring what the study intends to. Both are crucial for trustworthy and credible research outcomes.

What is validity?

Validity in research refers to the extent to which a study accurately measures what it intends to measure. It ensures that the results are truly representative of the phenomena under investigation. Without validity, research findings may be irrelevant, misleading, or incorrect, limiting their applicability and credibility.

What is reliability?

Reliability in research refers to the consistency and stability of measurements over time. If a study is reliable, repeating the experiment or test under the same conditions should produce similar results. Without reliability, findings become unpredictable and lack dependability, potentially undermining the study’s credibility and generalisability.

What is reliability in psychology?

In psychology, reliability refers to the consistency of a measurement tool or test. A reliable psychological assessment produces stable and consistent results across different times, situations, or raters. It ensures that an instrument’s scores are not due to random error, making the findings dependable and reproducible in similar conditions.

What is test-retest reliability?

Test-retest reliability assesses the consistency of measurements taken by a test over time. It involves administering the same test to the same participants at two different points in time and comparing the results. A high correlation between the scores indicates that the test produces stable and consistent results over time.

How to improve reliability of an experiment?

  • Standardise procedures and instructions.
  • Use consistent and precise measurement tools.
  • Train observers or raters to reduce subjective judgments.
  • Increase sample size to reduce random errors.
  • Conduct pilot studies to refine methods.
  • Repeat measurements or use multiple methods.
  • Address potential sources of variability.

What is the difference between reliability and validity?

Reliability refers to the consistency and repeatability of measurements, ensuring results are stable over time. Validity indicates how well an instrument measures what it’s intended to measure, ensuring accuracy and relevance. While a test can be reliable without being valid, a valid test must inherently be reliable. Both are essential for credible research.

Are interviews reliable and valid?

Interviews can be both reliable and valid, but they are susceptible to biases. The reliability and validity depend on the design, structure, and execution of the interview. Structured interviews with standardised questions improve reliability. Validity is enhanced when questions accurately capture the intended construct and when interviewer biases are minimised.

Are IQ tests valid and reliable?

IQ tests are generally considered reliable, producing consistent scores over time. Their validity, however, is a subject of debate. While they effectively measure certain cognitive skills, whether they capture the entirety of “intelligence” or predict success in all life areas is contested. Cultural bias and over-reliance on tests are also concerns.

Are questionnaires reliable and valid?

Questionnaires can be both reliable and valid if well-designed. Reliability is achieved when they produce consistent results over time or across similar populations. Validity is ensured when questions accurately measure the intended construct. However, factors like poorly phrased questions, respondent bias, and lack of standardisation can compromise their reliability and validity.



6 Reliability and Validity

Chris Bailey, PhD, CSCS, RSCC

Both reliability and validity have been discussed in this book previously, but this chapter will take a much deeper look at each. Reliability and validity are both important in kinesiology, and they are often confused. In this chapter we will differentiate between the two, as well as introduce concepts such as agreement. As with previous chapters, kinesiology-related examples will be used to demonstrate how to calculate measures of reliability and validity and how to interpret them.

Chapter Learning Objectives

  • Discuss and differentiate between reliability and validity
  • Differentiate between relative and absolute measures of reliability
  • Discuss differences between reliability and agreement
  • Calculate reliability, agreement, and validity
  • Examine reliability and validity data examples in kinesiology

Reliability and Validity

Broadly, reliability refers to how repeatable the score or observations are. If we repeat our measure under very similar conditions, we should get a similar result if our data are reliable. Reliability may be referred to as consistency or stability in some circumstances. Consider an example where we are using a new minimally invasive device to measure body composition. If it gives us a similar result every time we test under similar conditions, we can likely say that it is reliable. But is it valid?

Validity refers to how truthful a score or measure is. The test is valid if it is measuring what it is supposed to measure. Consider an example where you want to evaluate knowledge of nutrition and eating for a healthy lifestyle. To do so, you test a sample of students' percent body fat at the rec center. Is percent body fat a valid measure of nutrition knowledge? We might assume that people with lower percent body fat have greater knowledge about nutrition, which leads to a fitter physique, but that isn't necessarily true. So, this test likely is not valid.

Validity is dependent on reliability and relevance. Relevance, described earlier, is the degree to which a test pertains to its objectives. Since reliability is a part of validity, you can begin to see how the two are related. Unfortunately, this also leads to some confusion about the two.

Reliability and Validity Confusion

Let's take a look at two examples to differentiate between reliability and validity. To be clear from the beginning, valid data must be reliable, but not all reliable data are valid. The first example is the classic one you will find in many statistics textbooks. It comes from archery and is shown in Figure 6.1 below. There are 3 different targets and 3 archers. The first archer fires all their arrows and none of them hit the bullseye they were aiming for, but they are pretty consistent in where they do hit the target. Since they do not hit the mark they were aiming for, their shots cannot be considered valid. But, since they are consistent, they can be considered reliable. This means that we could actually be bad at something, but be bad consistently, and we'd be considered reliable. The word reliable in itself does not necessarily mean good. On the second target, the archer does hit the bullseye once but is pretty spread out with the rest of the arrows. They likely hit the bullseye by chance alone, and since the arrows are pretty spread out, we can't say they are reliable. If they aren't reliable, they aren't valid either. On the third target, the archer hit the bullseye with every arrow. Since they did this consistently, they are reliable. Since they hit the mark they were going for, they are relevant and also valid.

Figure 6.1 Three archery targets with 3 different arrow strike patterns

Let's look at an example where someone might confuse reliability and validity. There is currently at least one cell phone service provider that states it is "the most reliable network" and markets this as a reason you should switch. But should you? What does this actually mean? Being the most reliable doesn't actually tell us anything of value, and they may be depending on potential customers' confusion in this area. From a statistical standpoint, this marketing simply states that they are the most consistent carrier. They could be consistently good or consistently bad and could still be telling the truth; we don't actually know from the information provided. What customers are generally most concerned with (other than price) is whether or not they will have coverage in their area, and this statement doesn't provide any information on that aspect.

Reliability

Concerning reliability, objectivity is a specific type of reliability. It is also known as interrater reliability, which refers to the reliability between raters or judges. [1] Whenever we see the prefix "inter," that refers to "between," and the prefix "intra" refers to "within." An easy way to remember this is the difference between intercollegiate athletics (played between universities) and intramural athletics (played within the university). As an example of objectivity, consider an exam you take for a college course. Which of the following is likely more influenced by the grader: a multiple-choice exam or an essay-based exam? The multiple-choice exam is more objective and less influenced by the grader, while the essay-based exam is more influenced, or more subjective.

We would like to have most of our data be objective so that we aren’t introducing any bias, but sometimes we may not be able to avoid that. Can you think of an example? How about the RPE scale? The rating of perceived exertion can be used in many ways and is a cheap way to quantify the level of intensity of a given exercise. Unfortunately, it is highly subjective, and this can lead to issues when interpreting the values.

Table 6.1 depicts another example of objectivity and interrater reliability. Back in 2017, Canelo Alvarez and Gennady Golovkin fought to a split-decision draw. While you generally don't want any fight to end in a draw, this one was much worse when you examine the scorecards, which have been transcribed into a table here from each of the 3 judges. If you are not familiar, both boxing and MMA use a 10/9 "must" scoring system where whoever wins the round gets 10 points and the other fighter gets 9 points. If the round was particularly lopsided, a score of 8 may be awarded to the loser instead of 9, but that is quite rare. So, you can see how each judge scored each round and then totaled these scores to determine the overall winner. What was interesting was how differently Adelaide Byrd scored this fight from both Dave Moretti and Don Trella. She scored the fight as a convincing victory for Alvarez (Alvarez 118 to Golovkin's 110), while Moretti and Trella scored the fight much closer (113 to 115 and 114 to 114). There was a lot of blowback after this fight when the scorecards were released, and Byrd was actually suspended for some time. Based on the consistency between the other two judges, we might say that Byrd is not a reliable judge, or more specifically, she does not seem to be as objective and lacks interrater reliability.

Relative and Absolute Measures of Reliability

Many of the variables we measure in exercise and sport science can be used to produce relative and absolute values. Consider oxygen consumption (VO2), strength, and power measurements. Each can be expressed relative to the subject's body mass (ml/kg/min, N/kg, or W/kg, respectively) or as an absolute measure (l/min, N, or W, respectively). Similarly, reliability can be measured relatively or absolutely. Relative measures of reliability are slightly different in their usage, as they measure reliability relative to a specific sample or population: they provide an error estimate that differentiates between the subjects in a sample. As a result, these findings should not be applied to any sample or population that differs from the one studied. For example, imagine you are using a bioelectrical impedance analysis (BIA) unit to quantify body composition for a study. Previous research has published reliability findings on the exact same device that you will be using. You could save a lot of time by simply citing the previous research demonstrating its reliability instead of completing your own analysis. Unfortunately, if relative measures of reliability were used (or those are the measures you wish to discuss), those findings are specific to the sample tested. This means that reliability should be evaluated with each new study or each new sample/population being tested if/when relative measures of reliability are used. It also means that comparisons of relative reliability findings should not be made when the samples/populations are not similar. Absolute measures of reliability, by contrast, provide an estimate of the magnitude of the measurement error and can be inferred across samples as long as they have similar characteristics. When evaluating reliability, both relative and absolute measures should be included. Please refer to the figure below, which provides options for evaluating each.

Evaluating Reliability

We might understand reliability a little better when we consider the observed, true, and error scores. As we have seen before, the observed score is the sum of the true score and error score. [2] In theory, the true score exists, but we will never be able to measure it. Anything that causes the observed score to deviate from the true score can be considered error.

Figure 6.2 Image of a bar plot that depicts 100% of an observed score. Roughly 80% is shaded as true, and the remainder is from error.

If we are essentially attempting to calculate how a measure might change from trial to trial, we could consider it from a variance perspective. Or, more specifically, a "variance unaccounted for" perspective, which we discussed in the chapter on correlation. If one trial can predict a very large amount of the variance in another trial, this information is useful in terms of reliability, as we will also know how much error variance there is from one trial to the next. Many methods of evaluating reliability are based on the correlation, and, as with the PPM correlation, coefficients can range from 0 to 1. It is generally desired that reliability coefficients be greater than 0.8.

Test-Retest Reliability

The example described above is one version of test-retest reliability, where one trial of a measure is evaluated against another. If the data are reliable, the trials should produce similar values. When a PPM correlation is used to evaluate this, it is sometimes referred to as an "interclass" correlation coefficient. The prefix "inter" is used with a PPM coefficient because it is most often used to evaluate correlations "between" 2 variables. The term "intraclass" correlation coefficient (ICC) is likely more correct here, as we are actually evaluating the same variable "within" multiple trials. As you may have noticed, this differentiation between inter- and intra- can become a little confusing and is mostly semantic and/or simply used to classify different types of tests. As such, you may see the abbreviation ICC used for both. Either way, they are both considered relative measures of reliability.

Dataset 6.1

We can use Dataset 6.1 and/or the data in Table 6.2 above to evaluate the test-retest reliability of jump heights across 2 trials with a PPM correlation. Running a simple correlation, we find an r value of 0.897. Using the coefficient of determination (r²), this means that we can predict 80.4% of the variance in trial 2 with the data from trial 1. Or we could subtract that value from 100% and say that the error variance is 19.6%.
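If you would rather verify these numbers in code, here is a minimal base R sketch. The jump heights below are hypothetical stand-ins, since Dataset 6.1 itself is not reproduced in the text; only the procedure matters.

  # hypothetical jump heights (m) for 2 trials of the same subjects
  trial1 <- c(0.41, 0.35, 0.48, 0.52, 0.38, 0.45, 0.50, 0.33)
  trial2 <- c(0.43, 0.36, 0.47, 0.54, 0.37, 0.46, 0.52, 0.35)

  r <- cor(trial1, trial2)   # PPM correlation between the trials
  r^2                        # shared variance (coefficient of determination)
  1 - r^2                    # error variance from one trial to the next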

The above example assumes the trial data were collected one right after the other in the same session. If a significant amount of time has passed between measurements, the term stability should be used instead of reliability. Stability is used to determine how well subjects hold their scores over time.

Cronbach's Alpha (ICCa)

The intraclass correlation Cronbach's Alpha coefficient, or ICCa, is one of the most commonly used intraclass correlation models. It examines variance from 3 different components: subject variance, trial variance, and subject-by-trial variance.

Most statistical software will have a function to calculate this. MS Excel does not, but as you can see from the formula, it can be calculated with other Excel functions, which we've already learned how to use. We'd just need to calculate the variance of each trial, the sum of those trial variances, and the total variance. Then we could plug that data into the formula below, where α is alpha, k is the number of trials, σ_yi² is the variance of trial i, Σσ_yi² is the sum of each trial's variance, and σ_x² is the total variance.

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{y_i}^{2}}{\sigma_{x}^{2}}\right)

While calculating alpha is possible in MS Excel, it isn't exactly quick or easy, and many use other statistical programs as a result.
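As another alternative, the alpha formula takes only a few lines of base R. This is a sketch with hypothetical 3-trial jump data (not Dataset 6.2), and it takes the total variance σ_x² as the variance of the summed scores across trials, consistent with the formula above.

  # hypothetical jump heights (m) across 3 trials
  jumps <- data.frame(
    t1 = c(0.41, 0.35, 0.48, 0.52, 0.38),
    t2 = c(0.42, 0.36, 0.47, 0.53, 0.37),
    t3 = c(0.40, 0.35, 0.49, 0.52, 0.39)
  )

  k          <- ncol(jumps)                 # number of trials
  trial_vars <- sum(apply(jumps, 2, var))   # sum of each trial's variance
  total_var  <- var(rowSums(jumps))         # variance of the summed scores
  alpha      <- (k / (k - 1)) * (1 - trial_vars / total_var)
  alpha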

The ICC is considered a relative measure of reliability, which means that it is specific to your sample. As mentioned earlier, you cannot infer the reliability found during testing from one sample to another if the ICC was the measure used.

Calculating Cronbach's Alpha (ICCa) in JASP

In order to calculate Cronbach's α in JASP, each trial will need to be included as its own column, and a minimum of 3 trials is required. As of JASP version 0.14.1, 3 trials were required, but in previous versions (and potentially future ones) JASP users have been able to calculate ICCa with 2 trials. This is important, as getting 3 maximal-effort trials for performance data isn't always realistic.

Dataset 6.2

This dataset includes 3 trials of jump height data measured in meters. Once the data have been imported, click on the Reliability module drop-down arrow and select the Classical version of the Single-Test Reliability Analysis. Then move each trial over to the Variables box and scroll down to the Single-Test Reliability menu. Deselect McDonald's ω and select Cronbach's α. The results should now appear.

A Cronbach's α value of 0.981 is quite good, as it is very close to 1. The 95% confidence intervals (CI) are also included. They indicate the 95% likelihood range in which we would find the result if we tested again with the same sample. This result could be written as ICCa = 0.981 [0.928, 0.996], indicating that the ICCa 95% CI ranges from 0.928 to 0.996.

Issues with Correlational Methods

One issue with using a PPM correlation to evaluate reliability is that it is bivariate in nature and therefore can only evaluate 2 trials. Other methods, like Cronbach's Alpha (described above) or those based on a repeated measures ANOVA, can overcome this issue.

Another issue that impacts all correlational reliability models (including ICCa above) occurs when there is a consistent difference between trials. Table 6.2 shows a dataset containing 2 trials of heart rate data that correlate perfectly (r = 1.00) but are statistically different.

Download Dataset 6.3 here.

If a PPM correlation is run on this data, a perfect 1.0 correlation is observed, and you can see this in the scatter plot below in Figure 6.3 with a perfectly straight trendline. So, is this data reliable? Nope. In fact, a paired samples t test reveals a p value of less than 0.001, indicating the trials are statistically different. Hopefully you examined the data and noticed a pattern. Each value in trial 2 is exactly 11 beats per minute higher than that of trial 1. The mean bpm of trial 1 is 69.75, while the mean of trial 2 is 80.75. Clearly, there is a difference, and this can be seen in the box plot, also in Figure 6.3. But why does our correlation not pick up on this? It's because there is a consistent difference in the values. Think about a correlation as a ranking within the group. If you rank each subject according to their HR, what will happen to their rank if we change each trial 1 value by the same amount? Their rank will be unaffected. So, the values still vary the same way, which means the correlation will not recognize the difference.
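The offset problem is easy to reproduce. Below is a rough base R sketch with made-up heart rates (not Dataset 6.3). Note that with a perfectly constant +11 offset the paired t test is degenerate, since the differences have zero variance, so a little noise is added for that step.

  # made-up heart rates (bpm)
  hr_trial1 <- c(58, 62, 66, 70, 74, 78, 82, 86)
  hr_trial2 <- hr_trial1 + 11          # every value shifted up by 11 bpm

  cor(hr_trial1, hr_trial2)            # exactly 1: the offset is invisible

  # t.test() refuses perfectly constant differences, so jitter slightly
  set.seed(1)
  hr_trial2b <- hr_trial1 + 11 + rnorm(8, sd = 0.4)
  cor(hr_trial1, hr_trial2b)                    # still ~1
  t.test(hr_trial1, hr_trial2b, paired = TRUE)  # tiny p value: means differ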

Figure 6.3 The left panel depicts the perfect correlation between 2 trials of heart rate data, while the right panel shows a box plot where each trial obviously has a different mean.

Equivalence Reliability for Exams or Questionnaires

Often with exams or questionnaires, one might want to use multiple forms of the instrument. In order to evaluate their equivalence, each subject must participate in each exam. Again, the PPM could be used to determine how well the scores on one exam correlate with the other. If the r value is high enough, the forms might be considered equivalent.

While the equivalence method may be used for exams and questionnaires, it is generally not recommended to use different forms of data collection as substitutes for one another. This is especially true for performance testing as we’ve seen in one of the issues with using correlation as a measure of reliability above. Variables or trials that change similarly may look strongly correlated or reliable, but the magnitudes may be different. So whichever tool is used initially may not be interchangeable with another one.

One issue with the equivalence method is that it may be difficult to find subjects to take an exam once, let alone on 2 separate occasions. Another alternative is to administer one exam, but split the test in half afterward to evaluate the equivalence of the 2 halves to be used later. This is referred to as split-halves reliability.

Another issue with administering questionnaires is the length of the questionnaire. You are much more likely to get subjects to complete the questionnaire if it is short. That being said, the easiest way to increase the reliability of a questionnaire is to make it longer. Ideally, your questionnaire would be long enough to be reliable, but short enough that subjects do not quit while doing it. Finding this optimal zone could end up being somewhat of a "guess and check" process. Fortunately, there is a way to predict how the reliability might change depending on how we change the length of the exam or questionnaire (increasing or decreasing), called the Spearman-Brown Prophecy Formula. [3]

[asciimath]r_"kk" = (k*r_11)/(1+r_11*(k-1))[/asciimath]

where r_kk is the predicted reliability coefficient, k is the factor by which the length of the test has changed, and r_11 is the original reliability coefficient.

Imagine an exam was given that had a length of 25 items, and its reliability was previously evaluated as 0.555. If we change the length to 50 items, we are doubling its length, so k = 2. What would the new predicted reliability be? This can be determined by plugging the values into the formula.

[asciimath]r_"kk" = (2*0.555)/(1+0.555*(2-1))[/asciimath]

[asciimath]r_"kk" = 1.11/1.555[/asciimath]

[asciimath]r_"kk" = 0.714[/asciimath]

Doubling the length of the exam increased the predicted reliability coefficient to 0.714, which is the direction desired. What do you think would happen if we instead shortened the original 25-item test to 15 items? We can find k by dividing 15 by 25, which gives a k of 0.6. Since k is now less than 1, the numerator shrinks more than the denominator, so the predicted reliability must decrease: reducing the test to 15 items shrinks r_kk to roughly 0.428, as the sketch below can confirm.
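The prophecy formula is easy to wrap in a small helper function. The sketch below is base R; sb_prophecy is just an illustrative name, not a function from any package.

  # Spearman-Brown prophecy: predicted reliability after changing test length
  sb_prophecy <- function(r11, k) {
    (k * r11) / (1 + r11 * (k - 1))
  }

  sb_prophecy(0.555, 2)        # double a 25-item test to 50 items: ~0.714
  sb_prophecy(0.555, 15 / 25)  # shorten it to 15 items: ~0.428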

Standard Error of Measurement (SEM)

The standard error of the measurement or SEM is an absolute measure of reliability, meaning that it indicates the magnitude of the measurement error and you can infer reliability results from one sample to another assuming they have similar characteristics. It essentially tells us how much the observed score varies due to measurement errors. An added benefit of this measure is that it stays in the units of the original measure. So, if we were measuring VO 2 max in ml/kg/min, the SEM value would also be in ml/kg/min. We can calculate the SEM as the standard deviation (σ) times the square root of 1 minus the ICC value. Some statistical software will have a function for this measure, but not all. So it is pretty common that you may need to calculate this one by hand.

SEM = \sigma \sqrt{1 - ICC}

Consider an example where we have a mean of 44.1 ml/kg/min, a standard deviation of 12.6 ml/kg/min, and an ICC of 0.854. What is the SEM? All the information needed is provided, so we can just plug it in, and we end up with a value of 4.81 ml/kg/min. So, is that good or bad? We can interpret the SEM by its size relative to the mean: 4.81 is roughly 11% of 44.1 (4.81/44.1 × 100 = 10.9%). We generally want this value to be pretty low, but there is no strict cutoff.
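Since the SEM often has to be computed by hand, here is the same worked example as a two-line base R sketch.

  sem <- 12.6 * sqrt(1 - 0.854)   # SEM in ml/kg/min, ~4.81
  sem / 44.1 * 100                # relative to the mean, ~10.9%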

One final note on the SEM is that it assumes your data are homoscedastic, meaning that there is no proportional bias. If the really large or really small values have a greater chance of being erroneous than the values in the middle, the data are not homoscedastic; they are heteroscedastic (proportional bias is present). This assumption should be checked prior to using the SEM. If your data violate it, you can use the coefficient of variation, which we will discuss next. The assumption of homoscedasticity can be checked with several statistical tests depending on your data type and the models used. The most common are Levene's test of homogeneity (for samples with multiple groups) and the Breusch-Pagan test (without groups).

Calculating the SEM in JASP

JASP does not have a simple check box to add in the SEM to reliability analysis, but you can simply add the mean and standard deviation to the tabular results shown earlier by checking boxes.

Now the SEM can be calculated.

SEM = 0.004 \sqrt{1 - 0.981}

This produces an SEM of 0.0006 m, which is very small. Don’t forget that the SEM will carry the same unit of measure as the original variable.

Coefficient of Variation (CV)

Unlike the SEM, the coefficient of variation, or CV, is appropriate when your data do show proportional bias (heteroscedasticity). Much of the data we see in sport and human performance are this way. [4] [5] Those who can produce a lot of force or run really fast are more prone to have errors in their measurements. This has nothing to do with the subjects themselves; rather, when we test those who produce some of the more extreme values, our equipment is more likely to produce an error. The CV is more appropriate for this type of data. It's also much easier to calculate, since we only need to know the mean and standard deviation of the sample. We simply divide the standard deviation (σ) by the mean (m) and multiply by 100.

CV = \frac{\sigma}{m} \times 100

This gives us a percentage that will tell us how much variation there is about the mean of the sample. In general, we want this value to be less than 15%. Unfortunately, most statistical software does not include functions to calculate this and that may be because it is pretty simple to do ourselves. So you will most often need to compute this one yourself.

Let's revisit our VO2 max example. With a mean of 44.1 and a standard deviation of 12.6, what is the CV? 12.6 divided by 44.1 is 0.2857. We multiply that by 100 and are left with a CV of 28.57%.

CV = \frac{12.6}{44.1} \times 100 = 28.57\%

This is above our desired 15% cutoff, so the data do show a decent amount of variation. Hopefully you are also now noticing that we had a decent ICC value of 0.854 but a poor CV value. These are both reliability measures, so do we have both good and bad reliability? Actually, that is somewhat true. Keep in mind that the ICC is a relative measure of reliability, while the CV and the SEM are absolute measures. So we could say that our VO2 max data show good relative reliability but less-than-desirable absolute reliability.
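For completeness, the same CV calculation as a base R sketch, using the example values above:

  vo2_mean <- 44.1
  vo2_sd   <- 12.6
  (vo2_sd / vo2_mean) * 100   # CV ~28.6%, above the 15% guideline,
                              # even though the ICC of 0.854 looked good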

Reliability versus Agreement

The way we defined reliability earlier is accurate, but is also somewhat broad, which leads to some confusion about the difference between reliability and agreement from a statistical perspective. This is generally just semantics, but I will try to clear it up a little bit here before we discuss quantifying agreement. Both are concerned with error, but reliability is more concerned with how that error relates to each person’s score variation and agreement is concerned with how closely the scores bunch together. [6]

If we move from the theoretical definition to the practical usage of these two, we generally see a greater divide. Agreement in kinesiology studies is almost exclusively used when working with dichotomous data such as win or lose, pass or fail, or left or right. Reliability assessments like those we discussed previously are used on other types of data. Methods of measuring agreement also differ in that they can be used with validity measures as well.

Now that we have discussed these differences, let’s take a look at how we can evaluate agreement with the proportion of agreement and the Kappa coefficient.

Proportion of Agreement (P)

Our first measurement of dichotomous agreement is the proportion of agreement. It is the ratio of scores that agree to the total number of scores. Look at the diagram in Figure 6.4 below, which you may recognize as a chi-square contingency table. Each inner quadrant is labeled n followed by 1, 2, 3, or 4. Quadrants n1 and n4 represent the number of scores that agreed between trials, days, or tests, and quadrants n2 and n3 represent the disagreements. For example, let's say we are classifying strength asymmetry of the lower body. If a subject demonstrated a left-side asymmetry in trial 1 and a left-side asymmetry in trial 2, they would be counted in the top left quadrant (n1), where we see that 21 subjects were counted. If they presented with a right-side asymmetry in trial 1 but a left-side asymmetry in trial 2, they would be counted in the top right or n2 quadrant, and this square represents a disagreement between trials. Likewise, n3 also represents a disagreement: subjects counted there present with a left-side asymmetry in trial 1 and a right-side asymmetry in trial 2. Finally, n4 represents those who present with right-side asymmetries in both trials, so it is our other agreement quadrant.

Figure depicting a chi square contingency table to outline agreement between 2 trials of asymmetry testing.

In order to calculate the proportion of agreement, we simply divide the total number of agreements (found in n 1 and n 4 ) by the total number of scores (summing all quadrants).

P = \frac{n_1 + n_4}{n_1 + n_2 + n_3 + n_4} = \frac{21 + 19}{21 + 11 + 13 + 19} = \frac{40}{64} = 0.625

This leaves us with 40 agreements out of 64 scores, or a proportion of agreement of 0.625. We can convert this to a percentage of agreement by multiplying by 100, meaning we have 62.5% agreement between trial 1 and trial 2. The proportion of agreement ranges from 0 to 1, where higher values indicate more agreement.

One issue with this method is that it does not account for chance occurrences, which we have seen can cause issues in our results and interpretations.

Kappa coefficient (K)

Cohen's Kappa coefficient, or simply the Kappa coefficient, does account for agreements coming from chance alone. This makes it a more frequently used method. Although, since you must calculate the proportion of agreement as part of Kappa, you could argue that the proportion of agreement is used just as frequently.

To calculate Kappa, you must first calculate the proportion of agreement as we did previously. We then subtract the proportion of agreement coming from chance and divide that value by one minus the proportion due to chance. We will discuss how to calculate the proportion due to chance below.

K = \frac{P - P_c}{1 - P_c}

Kappa values may range from -1 to +1, similar to correlation values, but we generally only want to see positive values here. Negative values indicate that we have more observed agreements coming due to chance occurrences. Similar to correlations, values closer to 1 indicate greater agreement. You can see the interpretation scale created by Landis and Koch (1977) below. [7]

Kappa is the preferred method for evaluating agreement since it does account for chance; however, due to the way proportion due to chance is calculated, smaller sample studies are not ideal candidates for this method.

Moving forward with calculating the Kappa coefficient, it makes sense to format the data into a true table like the one in Table 6.7. This is the same data from before, but now the sums of the columns and rows have been added (Total).

In order to calculate the proportion of agreement due to chance (P_c), we must add in marginal data, which are the sums of the rows and columns. The row and column marginals are multiplied and then divided by the total n². This must be done for row 1/column 1 and for row 2/column 2, so we end up with 2 values for P_c, which are then added to give the total P_c. In this case, row 1 has a marginal value of 32 and column 1 has a marginal value of 34. Row 2 has a marginal value of 32 and column 2 has a marginal value of 30. The total n should be the same as the sample size, 64.

P_{c1} = \frac{32 \times 34}{64^2} = 0.27

P_{c2} = \frac{32 \times 30}{64^2} = 0.23

These can be added together to produce the total P_c needed to calculate Kappa.

P_c = 0.27 + 0.23 = 0.50

This value can now be used to calculate Kappa using the formula previously shown.

K = \frac{0.625 - 0.50}{1 - 0.50} = 0.25

What do you notice? The Kappa value is quite a bit lower than the proportion of agreement. Why do you think it decreased? It decreased because it factored out the proportion of agreement that was coming from chance occurrence.
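The whole computation fits in a few lines of base R. This sketch rebuilds the 2 × 2 table from the asymmetry example above; the numbers match the worked results (P = 0.625, P_c = 0.50, K = 0.25).

  # agreement table: rows = trial 1 (left/right), columns = trial 2 (left/right)
  tab <- matrix(c(21, 11,
                  13, 19), nrow = 2, byrow = TRUE)

  n  <- sum(tab)
  P  <- sum(diag(tab)) / n                       # proportion of agreement: 0.625
  Pc <- sum(rowSums(tab) * colSums(tab)) / n^2   # chance agreement: 0.50
  K  <- (P - Pc) / (1 - Pc)                      # Kappa: 0.25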

There is nothing special about completing this analysis in MS Excel. Assuming the table is set up correctly, the summation column and rows can be added simply along with the other calculations. If one does this sort of analysis frequently, it might be worthwhile to set up a template file so that the calculations can be completed as soon as the new values are typed in. As of JASP 0.14.1, there is not a current solution for calculating Kappa, but this could change with future versions.

Validity

As mentioned previously, validity refers to how truthful a measure is. That is a somewhat broad definition, as validity can take on many forms. There are 3 main forms: content-related validity, criterion-related validity, and construct-related validity.

Content-Related Validity

Content-related validity is also known as "face validity" or logical validity. It answers the question: does the measure clearly involve the performance to be evaluated? Generally, no statistical evidence is required to back this up. For example, if it rains and I don't have an umbrella, I will get wet walking outside. If it is raining on a Monday and I forgot my umbrella, what will happen? I will get wet; the conclusion holds on its face.

Evaluations of content validity are often necessary in survey-based research. Surveys must be validated before they can be used in research. This is accomplished by sending a draft of the survey to a content expert who will evaluate whether the survey will answer the questions the experimenters intend it to.

Criterion-Related Validity

Criterion-related validity attempts to evaluate the presence of a relationship between the standard measure and another measure. There are two types of criterion-related validity.

  • Concurrent validity evaluates how a new or alternative test measures up to the criterion or “gold standard” method of measurement.
  • Predictive validity evaluates how a measurement may predict the change or response in a different variable.

Concurrent validity is pretty common in technology related kinesiology studies. Think about a new smartwatch that counts repetitions, measures heart rate, heart rate variability, and tracks sleep. A single study could seek to validate one or more of these measures, but this example will focus on heart rate and heart rate variability. What would the gold standard method of measuring these likely be? Probably an EKG. So a researcher could concurrently (or simultaneously) measure those variables with the new smartwatch and an EKG in 50 subjects. The validity of the smartwatch would be dependent on how well it measures up to the gold standard measure, the EKG. If the values are very different, we’d likely conclude that it isn’t valid. As a side note, it is also common to evaluate reliability during these types of studies.

Click here to see an example of one of these types of studies that was presented as a poster.

Predictive validity seeks to determine how variation in specific variables might predict outcomes of other variables. A good example of this is how diet, physical activity, hypertension, and family history can be predictive of heart disease. Regression has been discussed in this book before, and hopefully you recall that the prediction equation will predict values or outcomes based upon specific variable inputs. Those prediction values are usually not perfect, and the discrepancy between the predicted values and the true values is known as the residual. The larger the residual values are, the less predictive validity we observe.
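To make the residual idea concrete, here is a hedged base R sketch with simulated data; the variable names and the generating model are invented for illustration and are not taken from the chapter's datasets.

  set.seed(2)
  activity <- runif(50, 0, 10)                         # weekly activity (hours)
  risk     <- 20 - 1.5 * activity + rnorm(50, sd = 2)  # simulated risk score

  fit <- lm(risk ~ activity)    # prediction equation
  summary(fit)$r.squared        # variance explained by the predictor
  head(resid(fit))              # residuals: the smaller they run,
                                # the greater the predictive validity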

A key aspect of criterion-related validity is determining what the criterion measure should be. Keep in mind that predictive validity uses past events and data to create prediction equations. If the goal is to predict some sort of performance, it is a good idea to validate the predictions against the actual performance whenever it happens (though this may not always be possible). For concurrent validity, it is always best to use the gold standard test as the criterion. When that is not possible, another previously validated test may be used instead.

Construct-Related Validity

Construct-related validity evaluates how much a measure can evaluate a hypothetical construct. This often happens when one attempts to relate behaviors to test scores or group membership. We often see this with political parties and polling results. Members of one party often vote a specific way on certain topics, while the other party would usually vote the opposite way.

These constructs are easy to imagine, but often difficult to observe. Consider a hypothetical example involving ethics in sport. How can you determine which athletes are cheaters and which play fair? This isn’t easy because they don’t wear name tags describing as much. But, you probably have some thoughts about specific athletes in your favorite sport. You might really like some and really dislike others based on past events observed.

Studying this type of research question is quite difficult too. You might come up with scenarios that put subjects into situations where they are forced to make several choices and depending on the test results, you classify them as a “good sport” or a “bad sport” based on the number of times they chose to break the rules. Once they are classified with a grouping variable, many of the means comparison test methods like a t test or an ANOVA could then be applied.

Evaluating Validity

By now, you should be recognizing just how useful the correlation can be. It will be the primary method for validation as well. If you are keeping track, it can be used to evaluate test-retest reliability, objectivity, equivalence reliability, and both concurrent and predictive validity. That's in addition to everything discussed in the chapter on correlation. At this point, if you had to guess which statistical test should be run for a given example, the correlation most likely has the highest probability of being correct. As correlation has already been demonstrated in both MS Excel and JASP, it will not be shown again. Please refer to Chapter 3 if you need to review.

A Similar Issue with Correlation (Valid Does Not Mean Interchangeable)

While we will nearly always select the correlation for our statistical evaluation of validity, there is one caveat. If a device is highly correlated with the gold standard or criterion measure, we will likely conclude that it is valid. But, that does not mean that we can use the two devices interchangeably. If you read the research poster example above, you’ve seen why. In that study multiple vertical jump assessment devices were validated concurrently. Most were highly correlated with the criterion measure (a force plate), but t tests also revealed that they were statistically different. Stating that devices can be highly related but different may sound counterintuitive, but this comes back to the concept we discussed earlier in this chapter on consistent change. If one of the devices consistently gives an inflated value, but gives everyone that same inflated value, the correlation will still be very strong. A comparison of the means will likely show the difference though. So, the moral is that even though both devices may be valid, you should stick with the same one throughout your study because they can still be different. Please refer back to Figure 6.3 for a visual representation of this issue.

Bland-Altman Assessment of Agreement

Building on the issue of consistent differences just discussed, a different technique that evaluates agreement was created by Bland and Altman in 1986. If you recall from the reliability portion of this chapter, many also use agreement when evaluating reliability. [8] This method will be described along with a practical example.

In Table 6.7 and Dataset 6.4 (click here to download Dataset 6.4), we have a new smartphone app that uses the camera's LED flash to measure heart rate. This will be validated against an EKG as the criterion measure. The new smartphone app correlates pretty well with it, with an r value of 0.761. It isn't perfect, but that is pretty strong, as it accounts for roughly 58% of the variance (r² = 0.580).

One issue that Bland and Altman brought up is that the measure magnitude may influence the accuracy of the measurement in some data. This has actually already been discussed earlier in this chapter as proportional bias. If our larger or smaller values from the smartphone deviate more than the middle values from the gold standard, we would likely have some proportional bias or heteroscedasticity. We can examine this with a plot by adding in the measure differences and plotting those against the averages.

Figure 6.5 Scatter plot depicting data with proportional bias

Notice in Figure 6.5 that several of the higher heart rate values deviate further from the horizontal 0 line. This demonstrates a trend where the higher the heart rate value, the larger the error magnitude compared to the EKG. In order to create this plot, we will need to compute a couple of new columns for our dataset.

We have now added one column depicting the difference between the smartphone and EKG heart rate measurements for each row, and another that averages the EKG and smartphone values. These data were used in Figure 6.5, which shows the device differences on the y axis and the averages on the x axis. A funnel-type shape is indicative of heteroscedasticity no matter which direction it points. Given what is seen here, whenever higher heart rates are collected, there is more difference from the EKG than when lower heart rates are collected. This should be a cause for concern if one wishes to use this app for measurement during exercise, when higher heart rates should be expected. The validity will suffer in this case.
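Since Figure 6.5 was created with R, here is a sketch of how it could be reproduced, assuming dataset 6.4 has been exported to a CSV file (the filename below is hypothetical) with columns named EKG and SmartPhone, the names used in the JASP section later. The dashed bias and limits-of-agreement lines are a common Bland-Altman addition rather than part of Figure 6.5 itself.

hr <- read.csv("dataset_6_4.csv")              # hypothetical export of dataset 6.4

hr$Difference <- hr$SmartPhone - hr$EKG        # device differences (y axis)
hr$Mean       <- (hr$EKG + hr$SmartPhone) / 2  # row-by-row averages (x axis)

plot(hr$Mean, hr$Difference,
     xlab = "Mean heart rate (bpm)", ylab = "SmartPhone - EKG (bpm)")
abline(h = 0)                                  # the horizontal zero line
abline(h = mean(hr$Difference), lty = 2)       # mean difference (bias)
abline(h = mean(hr$Difference) + c(-1.96, 1.96) * sd(hr$Difference), lty = 3)  # 95% limits of agreement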

Creating a Bland-Altman Plot in MS Excel

Beginning with Dataset 6.4 , the final two columns appearing in Table 6.8 need to be added before the Bland-Altman plot can be created. The mean column can be created with the built-in =AVERAGE() Excel function, including both heart rate values for a given row. Copying and pasting the formula down the column will produce a value for each row. [9] The differences column subtracts the EKG value from the smartphone value, which can be calculated in Excel as =B2-A2 in Dataset 6.4. Again, copying and pasting the formula all the way down will produce a value for each row.

Now highlight the two new columns and select Insert from the ribbon. Selecting the basic scatter plot should produce a plot that looks very similar to the one shown below in Figure 6.6. It will not be identical because Figure 6.5 was created with R and its x axis has been adjusted to zoom in on the data. The x axis can be formatted similarly in Excel. Adding chart elements such as axis titles is also very helpful for viewers.

Figure 6.6 Bland-Altman plot created in MS Excel

Creating a Bland-Altman Plot in JASP

Similar to the solution in Excel, the final two columns need to be created. But first, you must make sure that the imported data have the correct variable types selected. More than likely, the SmartPhone data were imported as ordinal. This will be an issue if it is not changed to scale. If you recall from Chapter 2 of this book, only scale data can be mathematically manipulated with useful results. If we try to compute the new column while the variable is still listed as ordinal, the column will remain blank.

The mean of the EKG and smartphone measures can be created by adding the two variables together and dividing by 2. [10] The difference column can be created by subtracting the EKG value from the smartphone value. Both of these are computed variables, so they must be added by clicking the plus sign to add a new variable. As you may recall, variables can be computed using JASP’s drag-and-drop formula builder or using R syntax. Use whichever you are comfortable with, but don’t be afraid of R here, as it is very straightforward in this example. Using R syntax, the mean column would be “(EKG+SmartPhone)/2” and the difference column would be “SmartPhone-EKG”. Then click compute, and the new columns should be populated with data.

The Bland-Altman plot can be created in the Descriptives module of JASP. After clicking the Descriptives icon, move the two newly created variables into the variables box. Now move down to the Plots menu and check the Scatter Plots option. Your plot will now be created, but you’ll need to make a few adjustments so that it looks similar to Figure 6.5. By default, density plots are included above and to the right of the scatter plot; check None to remove both of those. Then uncheck the regression line, and the plot should look very similar to Figure 6.7 below. If it looks like the same plot but rotated 90°, you likely have the variables listed out of order. The mean (or the gold standard) should be listed before the difference variable. To fix this, simply remove the variable listed first in the Variables box and add it again so that it appears afterward.

Figure 6.7 A Bland-Altman plot produced in JASP

Closing remarks

As the chapter name indicates, reliability and validity are both very important. That said, if we also understand objectivity, relevance, and agreement, we will have a better understanding of reliability and validity. Although many incorrectly do so, the results of reliability and validity analyses are difficult to generalize. This is because the studies that produced them often utilized sample characteristics different from those they (or you) want to generalize to. Even when you have a similar population, you may not be able to standardize many other variables, so you probably should not expect outcomes as favorable as those published from a highly standardized laboratory study. But if you follow the testing protocol as closely as possible, your results will be as reliable and valid as they can be.

  • Morrow, J., Mood, D., Disch, J., and Kang, M. (2016). Measurement and Evaluation in Human Performance. Champaign, IL: Human Kinetics.
  • Vincent, W., and Weir, J. (2012). Statistics in Kinesiology. 4th ed. Champaign, IL: Human Kinetics.
  • Morrow, J., Mood, D., Disch, J., and Kang, M. (2016). Measurement and Evaluation in Human Performance. Champaign, IL: Human Kinetics.
  • Bailey, C. (2019). Longitudinal monitoring of athletes: statistical issues and best practices. J Sci Sport Exerc, 1:217–227.
  • Bailey, CA, McInnis, TC, and Batcher, JJ. (2016). Bat swing mechanical analysis with an inertial measurement unit: reliability and implications for athlete monitoring. Trainology, 5(2):42.
  • de Vet, HC, Terwee, CB, Knol, DL, and Bouter, LM. (2006). When to use agreement versus reliability measures. J Clin Epidemiol, 59(10):1033–1039.
  • Landis, JR, and Koch, GG. (1977). The measurement of observer agreement for categorical data. Biometrics, 33:159–174.
  • Bland, JM, and Altman, DG. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476):307–310.
  • Note that some may wish to use the gold standard value instead of the mean here. If using a Bland-Altman plot for reliability purposes (trial-to-trial agreement), the mean is necessary. If using it for validity purposes, it may not be, but the choice needs to be described and justified no matter which method is used.

Reliability: refers to the consistency of data. Often includes various types: test-retest (across time), between raters (interrater), within rater (intrarater), or internal consistency (across items).

Validity: how well scores represent the variable they are supposed to; or how well the measurement measures what it is supposed to.

Relative measures of reliability provide an error estimate that differentiates between the subjects in a sample.

Absolute measures of reliability provide an estimate of the magnitude of the measurement error.

Quantitative Analysis in Exercise and Sport Science by Chris Bailey, PhD, CSCS, RSCC is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.


Reliability, Validity and Ethics


Lindy Woodrow


This chapter is about writing about the procedure of the research. This includes a discussion of reliability, validity and the ethics of research and writing. The level of detail about these issues varies across texts, but the reliability and validity of the study must feature in the text. Sometimes these issues are evident from the research instruments and analysis and sometimes they are referred to explicitly. This chapter includes the following sections:

Technical information

Reliability of a measure

Internal validity

External validity

Research ethics

Reporting on reliability

Writing about validity

Reporting on ethics

Writing about research procedure




About this chapter

Woodrow, L. (2014). Reliability, Validity and Ethics. In: Writing about Quantitative Research in Applied Linguistics. Palgrave Macmillan, London. https://doi.org/10.1057/9780230369955_3



Maximizing Data Accuracy: Validation and Reliability of Surveys

When we need to gather data on a specific and timely research topic, ad-hoc surveys are the most appropriate choice because they are not part of ongoing studies. So how can we approach survey validation and improve survey reliability?

   

Importance of Survey Validation within Market Research  

Survey validity and reliability are achieved by monitoring these four processes:  

  • Ensuring that the questions are correctly formulated and make sense  
  • Ensuring that the sample is representative of the target population  
  • Eliminating biases in survey design and execution  
  • Ensuring that the collected data is as accurate as possible  


Only by approaching survey validation in this way can we ensure that we are measuring the right variables and that the results are valid.  

Key Steps for Online Survey Validation  

We know the process to achieve survey validation, but how can we complement it with an online survey that provides reliable results? Let's see:  

  • Defining the survey objective and asking the right questions transparently to obtain the necessary information and avoid biased results  
  • Analyzing the necessary sample size to obtain representative and significant results based on participant selection criteria with the help of sampling services, which we will discuss later  
  • Measuring the validity and reliability of the survey using statistical techniques such as Cronbach's alpha coefficient and exploratory factor analysis (a brief sketch of this step follows the list)
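As a minimal sketch of that last step, assuming R with the psych package and using simulated Likert-type items as a stand-in for real survey data:

# install.packages("psych")  # if not already installed
library(psych)

set.seed(1)
latent <- rnorm(200)                             # simulated trait the items share
items <- data.frame(q1 = latent + rnorm(200, sd = 0.8),
                    q2 = latent + rnorm(200, sd = 0.8),
                    q3 = latent + rnorm(200, sd = 0.8),
                    q4 = latent + rnorm(200, sd = 0.8))

psych::alpha(items)        # Cronbach's alpha (internal consistency)
psych::fa.parallel(items)  # parallel analysis: how many factors to extract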

By following these steps, it is easier to understand how to calculate the reliability of a survey, which in turn allows you to obtain precise and relevant information for the benefit of companies.

Reliability Analysis: Ensuring Data Consistency  

To analyze survey validation, it is essential to carry out a series of considerations to evaluate the consistency and accuracy of the obtained data:  

1. Checking that the survey is well-designed and that its structure is coherent and clear. The questionnaire must be worded precisely and without biases that may influence respondents' answers.

2. Calculating the sample size needed to obtain reliable results. A sample that is too small may not be representative, while a sample that is too large may be costly and impractical.

3. Evaluating the consistency of respondents' answers by including repeated or control questions in the questionnaire.

Since a poorly validated ad-hoc survey can lead organizations to draw incorrect conclusions, it is important to conduct a rigorous reliability analysis as many times as necessary. Only then can truthful answers be reasonably assured.

How is Validation and Reliability Applied in Ad-hoc Surveys?  

To ensure the reliability of data collected in ad-hoc surveys, at Netquest we implement measures such as control questions to identify any inconsistent responses. This way, we verify the quality and consistency of responses throughout the survey.  

Once we have finished collecting data, we conduct an analysis to estimate the reliability of the information.  

How Our Data Panel Elevates the Quality of Ad-hoc Surveys  

Our sampling service improves the reliability of ad-hoc surveys through an online panel, composed of real users who participate by answering questions in exchange for Korus, points that can be exchanged for a wide variety of gifts online.  

Such motivation makes participants act genuinely, offering better quality standards.  

The Key to High-Quality Surveys  

The best strategy to ensure that surveys are of high quality is to collect data based on validity and reliability. This is achieved by asking clear and relevant questions to representative samples of the target population with the help of sampling services.  

Finally, it is crucial to validate the survey through pilot tests; consistent data collection methods, such as standardized questionnaires and random sampling techniques; internal consistency analysis of all responses; and reproducibility studies.

At Netquest, the quality of our data is fundamental to providing reliable and accurate results in every research project. We hold international certifications such as ISO 20252 and comply with GDPR standards, which makes our services stand out for their reliability in selecting real panelists, ensuring sample representativeness, and delivering valid results. With a focus on excellence and transparency, we are committed to providing high-quality data that drives informed decisions and effective strategies for our clients.

Do not hesitate to contact us if you want to know more about our services or need help with your next research project.  

Alejandra Ortiz

Bachelor's degree in Communication Sciences from UNAM with studies in advertising, digital marketing, and market research. Coffee and animal lover with a keen interest in data analysis and content creation. Always seeking new challenges and learning opportunities.


ORIGINAL RESEARCH article

Reliability and Validity of a Novel Attention Assessment Scale (Broken Ring enVision Search Test) in the Chinese Population

Yue Shi

  • Department of Rehabilitation Medicine, Third Affiliated Hospital of Soochow University, Changzhou, China

Background: The correct assessment of attentional function is the key to cognitive research. A new attention assessment scale, the Broken Ring enVision Search Test (BReViS), has not been validated in China. The purpose of this study was to assess the reliability and validity of the BReViS in the Chinese population.

Methods: From July to October 2023, 100 healthy residents of Changzhou were selected and subjected to the BReViS, Digit Cancellation Test (D-CAT), Symbol Digit Modalities Test (SDMT), and Digit Span Test (DST). Thirty individuals were randomly chosen to undergo the BReViS twice for test–retest reliability assessment. Correlation analysis was conducted between age, education level, gender, and the BReViS sub-tests: Selective Attention (SA), Orientation of Attention (OA), Focal Attention (FA), and Total Errors (Err). Intergroup comparisons and multiple linear regression analyses were performed. Additionally, correlations between the BReViS sub-tests and with other attention tests were also analyzed.

Results: The correlation coefficients of the BReViS sub-tests (except for FA) between the two tests were greater than 0.600 ( p < 0.001), indicating good test–retest reliability. The Cronbach’s alpha coefficient was 0.874, suggesting high internal consistency reliability. SA showed a significant negative correlation with the net score of D-CAT ( r = −0.405, p < 0.001), and a significant positive correlation with the error rate of D-CAT ( r = 0.401, p < 0.001), demonstrating good criterion-related validity. The correlation analysis among the results of each sub-test showed that the correlation coefficient between SA and Err was 0.532 ( p < 0.001), and between OA and Err was −0.229 ( p < 0.05), whereas there was no significant correlation between SA, OA, and FA, which indicated that the scale had good informational content validity and structural validity. Both SA and Err were significantly correlated with age and years of education, while gender was significantly correlated with OA and Err. Multiple linear regression suggested that Err was mainly affected by age and gender. There were significant differences in the above indexes among different age, education level, and gender groups. Correlation analysis with other attention tests revealed that SA negatively correlated with DST forward and backward scores and SDMT scores. Err positively correlated with D-CAT net scores and negatively with D-CAT error rate, DST forward and backward scores, and SDMT scores. OA and FA showed no significant correlation with other attention tests.

Conclusion: The BReViS test demonstrates good reliability and validity, assessing not only selective attention but also gauging capacities in immediate memory, information processing speed, visual scanning, and hand-eye coordination. The results are susceptible to demographic variables such as age, gender, and education level.

1 Introduction

Attention is the foundation of all cognitive functions, the prerequisite for continuous information processing, and a gateway for the flow of information to enter the brain and undergo selection ( Petersen and Posner, 2012 ). Precise and accurate assessment of attentional functions is key in cognitive research and a precondition for the rehabilitation of cognitive disorders. In clinical neuropsychology, visual search tasks (VSTs) are frequently used to evaluate selective visual attention deficits in patients with neurological conditions ( Eglin et al., 1989 ; Luck et al., 1989 ; Utz et al., 2013 ). These typically include paper-and-pencil target cancellation tasks such as the Attention Matrix ( Della Sala et al., 1992 ), Ruff 2&7 Selective Attention Test ( Marioni et al., 2012 ), Letter Cancellation Test ( Uttl and Pilkenton-Taylor, 2001 ), and the Visual Spatial Attention subtest in the Oxford Cognitive Screen ( Demeyere et al., 2015 ), which are effective tools for detecting attention deficits post-stroke. However, existing VSTs do not take into account the potential impact of stimulus layout and crowding on the test results of participants. Facchin et al. developed a novel attention assessment scale—the Broken Ring enVision Search Test (BReViS) to evaluate attentional functions ( Facchin et al., 2023 ). It assesses different components of attention including selective attention, the visual–spatial orientation of attention, and focal attention involving crowding phenomena, and is a novel open-ended paper-and-pencil assessment tool.

While studies have shown the effectiveness and applicability of the BReViS test in the Italian population and provided specific Italian normative data, its suitability for the Mainland Chinese population has yet to be established. Therefore, this study aims to examine the reliability and validity of the BReViS test in the healthy Chinese population and to analyze the characteristics of its preliminary application, in the hope of providing a simple and feasible clinical tool for assessing attention deficits in neuropsychological patients and a basis for the assessment and rehabilitation of attentional disorders.

2 Sample and methods

2.1 Study procedure

General Information: From July to October 2023, a total of 100 healthy residents, including staff and accompanying personnel from the First People’s Hospital of Changzhou and residents of the Tianning and Xinbei districts of Changzhou, were selected. The cohort comprised 47 males and 53 females; ages ranged from 19 to 84 years, with an average age of (52.35 ± 22.01) years; years of education ranged from 2 to 20 years, with an average of (12.39 ± 3.86) years. Only one participant had as few as 2 years of education.

Inclusion criteria: Age 19–84 years; Right-handed; Normal or corrected-to-normal vision.

Exclusion criteria: Auditory, visual, or speech impairments; Past history of neurological or psychiatric diseases (including brain injury, stroke, clinically diagnosed dementia, depression, etc.); History of addiction to tobacco, alcohol, or addictive drugs.

Grouping method: To allow between-group comparisons across age, education level, and gender, subjects were divided into 4 age groups in the statistical analyses: those aged 18–34 years were classified as the youth group, those aged 35–49 years as the young-adult group, those aged 50–65 years as the middle-aged group, and those older than 65 years as the senior group. Similarly, they were divided into four groups by education level: those with 1–6 years of education formed the elementary group, those with 7–9 years the middle school group, those with 10–12 years the high school/vocational group, and those with more than 12 years the college/university and above group. They were also divided into male and female groups by gender. Demographic characteristics of the groups are reported in Table 1 . Thirty subjects were randomly selected as the retesting group, and the BReViS test was administered to them again after 2 weeks. Of these 30 subjects, 14 were male and 16 female; ages ranged from 19 to 72 years, with a mean of (44.07 ± 15.67) years; and years of education ranged from 6 to 19 years, with a mean of (13.86 ± 2.81) years.

Table 1. Demographic characteristics of the patients’ sample.

2.2 Measurements and applied questionnaires

2.2.1 The BReViS test

It was developed by Facchin et al. (2023) . We have obtained authorization from the original authors to use it. The test consists of four cancellation test cards, each comprising five rows of circles with notches in different orientations, arranged in different layouts and degrees of crowding, with 25 targets per card and randomly defined target locations. Subjects were asked to identify and cross out all the targets on each card that had the same notch orientation as the circles shown at the top of the card. The execution time, number of omissions, self-corrections, and erroneous crossings were recorded for each of the 4 test cards. The performance time for each test card was then calculated from its execution time and omissions.

By combining the performance times of the four test cards, the following four indices are calculated: Selective Attention (SA), Orientation of Attention (OA), Focal Attention (FA), and Total Errors (Err).

SA represents the capacity to suppress irrelevant stimuli (distractors) and solely select relevant stimuli (targets) under the simplest conditions. It directly corresponds to the performance time of the first card (linear layout, low crowding), which is less affected by random arrays and crowded displays. SA = Performance time for the first card. Higher SA index values suggest lower efficiency of selective attention.

OA refers to the strategic direction of visual attention, which is the capacity to guide selective visual attention with effective endogenous strategies throughout the visual scene ( Connor et al., 2004 ), one of the two components of visual–spatial attention measured by BReViS. High OA index values indicate an inability to follow effective endogenous strategies during the visual search process, necessitating exogenous cues to perform the task correctly. It is calculated from the performance times of the four cards.

FA can be interpreted as the ability to adjust the focus of attention based on the position of stimuli within the array, another component of visual–spatial attention ( Castiello and Umilta, 1990 ). It corresponds to the comparison between the two levels of crowding, high and low. High FA index values suggest a higher sensitivity to crowding. It is likewise calculated from the performance times of the four cards.

The Err index represents the overall errors made across all sub-tests. Err = Total number of errors across all four test cards.
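The article's formula images are not reproduced in this copy. Under the assumption that the four cards cross the two layouts (linear, random) with the two crowding levels (low, high), and writing T1 through T4 for the performance times of cards 1 (linear/low), 2 (random/low), 3 (linear/high), and 4 (random/high), definitions consistent with the descriptions above would be, in LaTeX (the omission correction and the exact card pairings are assumptions, not taken from the source):

% Hedged reconstruction -- the source article's formula images are missing here.
% Assumed card coding: T_1 linear/low, T_2 random/low, T_3 linear/high, T_4 random/high.
\[
\text{performance time} = \text{execution time} \times \frac{25}{25 - \text{omissions}}
\qquad \text{(a common omission correction; assumed)}
\]
\[
\mathrm{SA} = T_1, \qquad
\mathrm{OA} = \frac{(T_2 - T_1) + (T_4 - T_3)}{2}, \qquad
\mathrm{FA} = \frac{(T_3 - T_1) + (T_4 - T_2)}{2}
\]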

2.2.2 Other attention tests

The Digit Cancellation Test (D-CAT) is used to measure selective attention ( Hatta et al., 2004 ). Participants were required to locate and strike through the number preceding the number 3 in a random sequence of the numbers 1–9, with the time taken to complete the test recorded. Net scores and error rates are calculated based on the number of correct cancellations, omissions, and mistakes. Higher net scores and lower error rates indicate better selective attention.

The Symbol Digit Modalities Test (SDMT) was published by Aaron Smith in 1973 and revised in 1982 to assess speed of information processing, visual scanning ability, and hand-eye coordination ( Strober et al., 2019 ). This test involves an encoding key of 9 different abstract symbols, each associated with a number. Participants must write the number corresponding to each symbol as quickly as possible within 90 s. Scoring is based on the number of correct symbols and reversed symbols. Higher scores indicate better speed of information processing, visual scanning ability, and hand-eye coordination.

The Digit Span Test (DST) is a commonly used psychological assessment tool that measures short-term memory and attention span ( Park and Lee, 2019 ). In its traditional form, the Digit Span Test consists of two parts: forward digit span and backward digit span. This test evaluates the participant’s ability to recall a sequence of numbers in the correct order both forwards and backwards after the tester reads them out. Participants repeat a series of random numbers at a rate of one number per second, starting with a sequence of 3 numbers and increasing in length up to 12 numbers or until two consecutive errors are made. One point is scored for each correctly recalled sequence. The higher the scores on forward and backward digit span, the greater the capacity of immediate memory.

2.2.3 Sample size calculation

This study mainly used correlation analysis and multiple linear regression analysis, so sample size was calculated with G*Power 3.1 ( Faul et al., 2009 ). For the correlation analysis, a target effect size of 0.3, a type I error of 5% (α = 0.05), and a power of 80% (β = 0.20) yielded a required sample of 82 participants. For the multiple linear regression analysis, 3 predictors (u = 3), an effect size of f² = 0.15, a type I error of 5% (α = 0.05), and a power of 80% (β = 0.20) yielded a required sample of 77 participants. The final sample size was 100 participants, allowing for a 20% dropout rate.
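For readers working in R rather than G*Power, the pwr package reproduces both calculations as a rough cross-check; the function calls below are standard pwr usage, and small discrepancies from G*Power (which reports 82 for the correlation) come from the different approximations the two tools use.

# install.packages("pwr")  # if not already installed
library(pwr)

# Correlation: target r = 0.3, alpha = 0.05, power = 0.80 (two-sided)
pwr.r.test(r = 0.3, sig.level = 0.05, power = 0.80)   # n ~ 84-85 here; G*Power reports 82

# Multiple regression: u = 3 predictors, f^2 = 0.15, alpha = 0.05, power = 0.80
reg <- pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
ceiling(reg$v) + 3 + 1   # total n = v + u + 1, about 77, matching the article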

2.2.4 Experimental procedure

Participants filled out informed consent forms and were then administered the BReViS test and the other attention tests. Among them, 30 were randomly selected to retake the BReViS test after two weeks. All tests were administered by the same physician.

2.3 Statistical analysis

SPSS 17.0 software was used for statistical analysis. Spearman’s correlation analysis was employed to assess the correlation between the BReViS test and other attention tests, as well as the correlation between each sub-test of the BReViS and age, education level, and gender. The Kruskal-Wallis test was used to compare differences in the BReViS sub-test scores among different age and education level groups, while the Mann–Whitney U test was used to compare differences between gender groups. Multiple linear regression analysis was conducted to investigate the influence of demographic characteristics on scale results, with statistical significance set at p < 0.05. The Pearson correlation coefficient was employed to analyze the test–retest reliability of the BReViS; Cronbach’s α coefficient was used to indicate internal consistency, with a coefficient above 0.80 considered excellent, between 0.70 and 0.80 acceptable, and below 0.70 indicating poor reliability. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity were employed to analyze the appropriateness of factor analysis and thereby examine the structural validity of the BReViS. Finally, correlations between the results of the BReViS sub-tests were analyzed using Spearman’s correlation to test the content and structural validity of the scale.
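A hedged R sketch of the reliability and validity computations named above, using simulated stand-ins for the four BReViS sub-test scores (none of these numbers are the study's data):

library(psych)

set.seed(7)
n <- 100
g <- rnorm(n)                                   # shared latent component (simulated)
brevis <- data.frame(SA  = 60 + 10 * g + rnorm(n, sd = 5),
                     OA  =  5 +  2 * g + rnorm(n, sd = 2),
                     FA  =           g + rnorm(n, sd = 2),
                     Err =  3 +      g + rnorm(n, sd = 1))
brevis_retest <- as.data.frame(lapply(brevis, function(x) x + rnorm(n, sd = 2)))

# Test-retest reliability: Pearson correlation per sub-test
mapply(cor, brevis, brevis_retest)

# Internal consistency: Cronbach's alpha across the sub-tests
psych::alpha(brevis)

# Suitability for factor analysis: KMO measure and Bartlett's test of sphericity
R <- cor(brevis)
psych::KMO(R)
psych::cortest.bartlett(R, n = n)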

3.1 Descriptive results

The descriptive mean results for the four BReViS sub-test scores are reported in Tables 2–4.

Table 2. Mean performance time (and SD) for each sub-test, divided by age group.

Table 3. Mean performance time (and SD) for each sub-test, divided by education level group.

Table 4. Mean performance time (and SD) for each sub-test, divided by gender group.

3.2 Correlation analysis of age with the BReViS sub-tests

Age showed a positive correlation with both SA ( r  = 0.776, p  < 0.001) and Err ( r  = 0.607, p  < 0.001), with no significant correlation with the other sub-tests.

3.3 Comparison of different age groups

As shown in Table 5 , analyses of multiple between-group comparisons across age groups showed significant differences in sub-test scores for SA and Err ( p < 0.001). Detailed two-by-two intergroup comparisons highlighted significant differences in SA scores between the youth and middle-aged groups (adjusted p = 0.006), as well as between the youth and senior groups (adjusted p = 0.000). Similarly, Err scores differed significantly between the youth and middle-aged groups (adjusted p = 0.005) and between the youth and senior groups (adjusted p = 0.000). Additionally, a significant difference was observed in SA scores between the young-adult and senior groups (adjusted p = 0.017), as shown in Table 6 .

Table 5. Analysis of variance between different age groups (mean rank).

Table 6. Two-by-two comparison of SA and Err between different age groups.

3.4 Correlation analysis of education level with the BReViS sub-tests

Years of education were negatively correlated with both SA ( r  = −0.715, p  < 0.001) and Err ( r  = −0.502, p  < 0.001), with no significant correlation with the remaining sub-tests.

3.5 Comparison of different education level groups

As shown in Table 7 , analyses of multiple between-group comparisons across education level groups revealed significant disparities in the scores for sub-tests SA and Err ( p < 0.001), while OA and FA did not exhibit such differences. Detailed two-by-two intergroup comparisons highlighted significant differences in SA scores: the college/university and above group demonstrated significant disparities when compared with the elementary, middle school, and high school/vocational groups (adjusted p = 0.000 for all comparisons). Similarly, Err scores significantly differed between the college/university and above group and the elementary group (adjusted p = 0.000), as well as between the college/university and above group and both the middle school (adjusted p = 0.027) and high school/vocational groups (adjusted p = 0.006), as detailed in Table 8 .

Table 7. Analysis of variance between different education level groups (mean rank).

Table 8. Two-by-two comparison of SA and Err between different education level groups.

3.6 Correlation analysis of gender with the BReViS sub-tests

Gender showed a negative correlation with OA ( r  = −0.251, p  = 0.012) and a positive correlation with Err ( r  = 0.215, p  = 0.032), with no significant correlation with SA and FA.

3.7 Comparison of the two gender groups

The comparison results between the two gender groups showed a significant difference in OA and Err ( p  < 0.05), while no significant difference was observed in SA and FA, as detailed in Table 9 . Combining the results from Table 4 , it was evident that males scored higher in the OA test and lower in the Err test compared to females.

Table 9. Comparison of the two gender groups (mean rank).

3.8 Impact of demographic variables

Multiple linear regression analysis suggested that when the demographic variables age, education level, and gender were introduced into the linear regression models of SA and Err, SA was affected by education level and age, while Err was influenced by age and gender ( Table 10 ).

Table 10. Impact of demographic variables.

3.9 Relevance to other attention tests

SA was negatively correlated with the net score of D-CAT and positively correlated with the error rate of D-CAT. It was also negatively correlated with DST forward and backward scores and SDMT scores. Err showed a positive correlation with the net score of D-CAT and a negative correlation with the error rate of D-CAT, DST forward and backward scores, and SDMT scores. OA and FA did not show significant correlation with other attention tests ( Table 11 ).

Table 11. Relevance to other attention tests.

3.10 Reliability testing

3.10.1 Re-testability of the BReViS test: Results showed that the correlation coefficients for SA, OA, and Err were all greater than 0.600, p  < 0.001. Only the correlation coefficient for FA was below 0.6, p  > 0.05, which was not statistically significant ( Table 12 ).

Table 12. Re-testability of the BReViS test.

3.10.2 Internal Consistency Reliability: Cronbach’s alpha coefficient was 0.874, indicating high internal consistency reliability for the BReViS test.

3.11 Validity testing

3.11.1 Construct Validity: The Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test of sphericity results were 0.763 and 252.601 ( p < 0.001), respectively, indicating the scale was not very suitable for factor analysis.

3.11.2 Criterion Validity: In this study, the D-CAT was used as a criterion, and Spearman’s correlation analysis was used to calculate the correlation between BReViS’s SA and the net scores and error rates of D-CAT to evaluate the degree of criterion-related validity. The results showed that SA was significantly negatively correlated with the net score of D-CAT ( r  = −0.405, p < 0.001) and significantly positively correlated with the error rate of D-CAT ( r  = 0.401, p < 0.001), indicating the questionnaire has good criterion-related validity, as seen in Table 11 .

3.12 Correlation between sub-tests

The correlation analysis of the results among the various sub-tests of the BReViS test indicated that the correlation coefficient between SA and Err was 0.532, and between OA and Err was −0.229, with p < 0.05, suggesting a certain degree of consistency between them, which contributes to ensuring the reliability of the scale. Meanwhile, the correlation between SA, OA, and FA was not high, indicating that the scale has excellent information content and structural validity, as seen in Table 13 .

Table 13. Correlation between sub-tests.

4 Discussion

Attention is a fundamental psychological concept, deeply embedded in cognitive processing, defined by the deliberate focusing on particular stimuli ( van Es et al., 2018 ). This focusing elevates the level of awareness about these stimuli, epitomizing attention’s selective nature. Solso, MacLin M.K., and MacLin O.H. (2005) highlight that “the essence of attention lies in the concentration and focus of consciousness,” underlining attention’s critical role in selecting an item from an array of simultaneous stimuli or thought sequences ( Baddeley, 1988 ). Selective attention, therefore, is the capacity to direct an individual’s finite processing resources toward a particular environmental aspect. This complex concept encompasses a range of processes, including spatial attention with its directional and focal elements ( Carrasco, 2011 ). Such capability allows for the filtration of extensive information from the surroundings, facilitating the efficient usage of scarce cognitive resources.

Historically, attention has been a central theme in psychological studies, resulting in a plethora of theoretical frameworks and experimental methodologies. One of the most significant paradigms for investigating selective visual attention’s traits is visual search ( Bacon and Egeth, 1997 ; Verghese, 2001 ; Wolfe, 2003 ). Everyday life is replete with visual search scenarios, whether it’s choosing products on supermarket shelves, animals searching for food amidst leaves, locating a friend in a large gathering, or playing visual search games ( Wolfe, 2020 ). Clinical neuropsychology frequently employs visual search tasks (VST) to evaluate selective visual attention deficits in patients with neurological conditions ( Senger et al., 2017 ). Standard VST protocols involve participants identifying a target among numerous stimuli, like figures or letters, assessing performance based on response accuracy and time ( Wolfe et al., 2002 ).

Studies suggest that visual task outcomes are not just influenced by attention toward the target’s location (the spatial component) but also by adjusting the attention window according to the task requirements (the focal component) ( Albonico et al., 2016 ), with each component operating independently ( Castiello and Umilta, 1990 ; Carrasco and Yeshurun, 2009 ). Traditional VSTs, however, tend to neglect the influence of distractor arrangement and density on performance, thus failing to adequately capture the nuances of spatial attention ( Weintraub and Mesulam, 1988 ; Mesulam, 2000 ). The BReViS assessment offers a refreshing alternative to conventional paper-and-pencil visual search tests by modifying the stimulus arrangement within the visual field, allowing for a comprehensive evaluation of selective visual attention and its distinct facets. Though previously utilized within the Italian demographic without undergoing thorough reliability and validity verification, this study introduces the BReViS test to the Mainland Chinese audience, undertaking a comprehensive examination of its reliability and validity among individuals aged 19 to 84.

4.1 Reliability testing

When a test has good reliability, it will yield almost the same scores for the same group of people at different times. The quality of reliability is also a prerequisite for validity testing. In this study, the test–retest reliability of the BReViS showed high correlation coefficients for three of the four sub-tests—SA, OA, and Err—on reassessment after two weeks. The test–retest results indicate that the BReViS test has good retest reliability, suggesting good temporal stability. The lack of statistical significance for FA in the correlation analysis may be due to the longer duration of this test, which may lead to fatigue in older participants resulting in unstable scores. Additionally, a higher Cronbach’s alpha coefficient indicates stronger internal consistency of the scale. It is generally considered that a Cronbach’s alpha coefficient greater than 0.7 indicates good consistency among items ( Tavakol and Dennick, 2011 ). The results of this study show a total Cronbach’s alpha coefficient of 0.874 for the BReViS test, indicating high internal consistency reliability. It’s interesting to note that the average score for FA increased from −1.57 in the first test to 0.67 in the second, indicating a higher sensitivity to crowding in the latter. Research has shown that sensitivity to visual crowding is influenced by various factors that can affect an individual’s ability to distinguish objects in cluttered environments. These factors include contrast, eccentricity, visual acuity and age, spatial frequency, attention and perceptual learning, as well as stimulus similarity ( Coates et al., 2013 ; Veríssimo et al., 2022 ). Therefore, factors such as the brightness of the room, the depth of color of the test figures, the position of the test paper in the field of vision, whether the participant is focused, has undergone perceptual learning, and the objects surrounding the test paper can all affect sensitivity to crowding. The variability in the results of the two tests in this study reminds us that these influences need to be more tightly controlled in future studies.

4.2 Validity testing

The Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test suggested that the structure of the BReViS test might not be well suited for factor analysis, but that there was some correlation between the BReViS measures. The correlation analysis among the results of each sub-test of the BReViS showed a correlation coefficient of 0.532 between SA and Err, and −0.229 between OA and Err, with p < 0.05, indicating a certain level of consistency between them, which contributes to ensuring the reliability of the scale. However, the correlations among SA, OA, and FA were not high, suggesting that the scale has excellent information content and structural validity. Given that BReViS was developed to assess SA, this study employed the D-CAT as a criterion measure and found a significant correlation between SA and the D-CAT results, indicating good criterion-related validity.

4.3 The influence of age on BReViS

This study showed that age was significantly positively correlated with the sub-tests SA and Err. Multiple linear regression analysis suggested that SA is greatly influenced by age and education level, while Err is more influenced by age and gender. Age is therefore a major factor influencing BReViS test results, which is consistent with the findings of the scale developers in the Italian population and with previous research. The rank-sum test analysis across different age groups reveals that young adults significantly outperform both the middle-aged and senior groups in selective attention tasks, making fewer errors. Additionally, the young-adult group demonstrates superior selective attention capabilities compared to the senior group. This pattern supports the notion that selective attention abilities undergo pronounced growth during adolescence, followed by a discernible decline as individuals age ( Moore and Zirnsak, 2017 ). Neurophysiological alterations, observable through changes in the amplitude and latency of event-related potential (ERP) components, accompany this evolution in attention processing ( Madden et al., 2007 ). Complementing these findings, functional MRI studies have identified diminished activation in critical regions associated with visual attention control, namely the bilateral fusiform gyrus, the right lingual gyrus, and the right precuneus, in elderly individuals when compared to their younger counterparts ( Lyketsos et al., 1999 ; Lee et al., 2003 ).

4.4 The influence of education level on BReViS

This study found that years of education were negatively correlated with both SA and Err, and significant differences in SA and Err scores were also observed across education level groups. Rank-sum tests across educational attainment groups indicate that individuals with tertiary education (the college/university and above group) perform significantly better in selective attention tasks than those in the elementary ( Mueller et al., 2008 ; Yehezkel et al., 2015 ), middle school, and high school/vocational groups. They also made fewer errors, suggesting a correlation between higher education levels and improved selective attention abilities. Studies have shown that individuals with higher levels of education often perform better on various cognitive tests ( Lindenberger and Baltes, 1997 ; Hultsch et al., 1999 ), likely due to the enhanced cognitive strategies, problem-solving skills, and knowledge base provided by formal education. Additionally, higher education may mitigate the impact of aging on cognitive performance ( Lee et al., 2003 ; Jones et al., 2006 ; Tun and Lachman, 2008 ; Marioni et al., 2012 ). Research by Stern et al. (2005) and others indicates that higher educational attainment can moderate the decline in reaction and attention abilities due to aging and lower the risk of dementia ( Bell et al., 2006 ), partly because the accumulation of cognitive reserve improves brain network efficiency ( Rubia et al., 2010 ). These findings highlight the importance of considering educational background when interpreting cognitive assessment results.

4.5 The influence of gender on BReViS

In this study, the SA index was influenced by age and educational level, but no significant gender differences were observed. Gender was positively correlated with the Err index and negatively correlated with the OA index, with significant differences between genders, indicating that females committed more total errors than males. Males had higher OA scores than females, suggesting that males in the visual search process rely on exogenous cues to perform tasks correctly and are less likely to follow effective endogenous strategies. This is consistent with the observations made by the authors in a normal Italian population. The differences in OA scores between males and females may be related to the activation of different brain regions during the execution of spatial selective attention tasks. Males show increased activation in the left hemisphere’s inferior parietal lobule, while females show significant activation in the right hemisphere’s inferior frontal gyrus, insula, caudate, and temporal areas ( de Fockert et al., 2001 ; Boi et al., 2011 ), which may be related to the modulation by estrogen and testosterone ( Oberauer, 2019 ). Additionally, FA was not observed to be affected by gender, age and years of education in this study, which is in line with the results of the most recent application of the scale, i.e., crowding did not worsen with age ( Pegoraro et al., 2024 ), and these findings are consistent with previous studies ( Malavita et al., 2017 ; Shamsi et al., 2022 ).

4.6 The correlation between BReViS and other attention scales

SA was significantly positively correlated with the cancellation time and error rate in the D-CAT and significantly negatively correlated with the net score of cancellation. Err was negatively correlated with the net score of cancellation and positively correlated with the cancellation error rate. These results indicate that BReViS’s SA and Err have good consistency with the D-CAT in assessing selective attention in the normal population.

Research demonstrates that enhancing selective attention significantly improves test outcomes in immediate memory capabilities ( Plebanek and Sloutsky, 2019 ). For instance, within the context of the DST, superior selective attention enables individuals to recall and reproduce digit sequences with greater accuracy, thus exhibiting an increased memory capacity. This study reveals a negative correlation between SA and Err with the scores of forward and backward span in the DST, offering a crucial insight: higher scores of SA and Err indicate weaker selective attention, an increased error rate, and a noticeable decline in the subjects’ immediate memory capacity. This finding highlights the close interrelation among immediate memory, selective attention, and cognitive efficiency, suggesting that individuals with a larger immediate memory capacity can more effectively resist distractions, thereby reducing error rates ( Posner and Petersen, 1990 ; Rayner, 1998 ; Ku, 2018 ). In clinical practice, this correlation is important to identify and assess deficits in attention, working memory, or other cognitive functions.

The negative correlation between SA and Err with scores on the SDMT unveils a significant cognitive phenomenon: there is a direct correlation between elevated selective attention and increased efficiency of visual scanning, speed of information processing, and hand-eye coordination. Selective attention, a critical dimension of attention management, involves filtering task-relevant information from the environment while disregarding irrelevant distractions ( De la Torre et al., 2015 ). The efficacy of selective attention depends to a large extent on the efficiency of visual scanning, a crucial aspect because it requires the individual to quickly localize and identify key targets among numerous visual stimuli ( Reigal et al., 2019 ). Furthermore, the acceleration of information processing speed is a key factor in enhancing the efficiency of selective attention, allowing individuals to recognize important information within shorter durations and respond accordingly ( Posner, 1980 ). In tasks requiring rapid identification of visual information followed by corresponding physical actions, exceptional hand-eye coordination markedly improves the precision and efficiency of task execution ( Castiello and Umilta, 1990 ). Thus, the effective concentration of selective attention on specific stimuli or tasks is supported by an individual’s performance in terms of a combination of speed of information processing, visual scanning ability, and hand-eye coordination. The improvement of these cognitive abilities not only further enhances the performance of selective attention but also, reciprocally, enhances the operational efficacy of these cognitive functions, thereby creating a positive feedback loop. This phenomenon offers profound insights into how individuals process information efficiently in complex environments within the domain of cognitive science.

The allocation of attentional resources in space involves two distinct processes: orienting and focusing. The orienting process selectively concentrates on specific aspects of the environment while ignoring others; the OA index reflects this orienting ability, which is influenced by factors like stimulus salience, personal interests or goals, and the presence of attention-directing cues ( Chun et al., 2011 ). The focusing process narrows attention to a specific area or object, acting like a magnifying glass and allowing selective concentration on a limited spatial area ( Turatto et al., 2000 ; Chun et al., 2011 ); the FA index reflects this focusing ability. Some studies suggest that focusing and orienting may vary based on visual conditions ( Turatto et al., 2000 ). This research found no significant correlation between OA and FA and the DST and SDMT, suggesting that orienting and focusing abilities might not be affected by immediate memory capacity, information processing speed, visual scanning ability, or hand-eye coordination skills.

5 Conclusion

The BReViS test, demonstrating good reliability and validity, is suitable for use across a broad age range (19 to 84 years) within the general population, assessing not only selective attention but also gauging capacities in immediate memory, information processing speed, visual scanning, and hand-eye coordination. The influence of demographic variables such as age, gender, and education level on test outcomes underscores the necessity for nuanced interpretation of results in research and clinical settings.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Committee of the Third Affiliated Hospital of Soochow University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

YS: Writing – original draft. YZ: Writing – review & editing.

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Albonico, A., Malaspina, M., Bricolo, E., Martelli, M., and Daini, R. (2016). Temporal dissociation between the focal and orientation components of spatial attention in central and peripheral vision. Acta Psychologica 171, 85–92. doi: 10.1016/j.actpsy.2016.10.003


Bacon, W. J., and Egeth, H. E. (1997). Goal-directed guidance of attention: evidence from conjunctive visual search. J. Exp. Psychol. Hum. Percept. Perform. 23, 948–961. doi: 10.1037/0096-1523.23.4.948


Baddeley, A. (1988). Cognitive psychology and human memory. Trends Neurosci. 11, 176–181. doi: 10.1016/0166-2236(88)90145-2

Bell, E. C., Willson, M. C., Wilman, A. H., Dave, S., and Silverstone, P. H. (2006). Males and females differ in brain activation during cognitive tasks. NeuroImage 30, 529–538. doi: 10.1016/j.neuroimage.2005.09.049

Boi, M., Vergeer, M., Ogmen, H., and Herzog, M. H. (2011). Nonretinotopic exogenous attention. Curr. Biol. 21, 1732–1737. doi: 10.1016/j.cub.2011.08.059

Carrasco, M. (2011). Visual attention: the past 25 years. Vis. Res. 51, 1484–1525. doi: 10.1016/j.visres.2011.04.012

Carrasco, M., and Yeshurun, Y. (2009). Covert attention effects on spatial resolution. Prog. Brain Res. 176, 65–86. doi: 10.1016/S0079-6123(09)17605-7

Castiello, U., and Umiltà, C. (1990). Size of the attentional focus and efficiency of processing. Acta Psychol. 73, 195–209. doi: 10.1016/0001-6918(90)90022-8

Chun, M. M., Golomb, J. D., and Turk-Browne, N. B. (2011). A taxonomy of external and internal attention. Annu. Rev. Psychol. 62, 73–101. doi: 10.1146/annurev.psych.093008.100427

Coates, D. R., Chin, J. M., and Chung, S. T. (2013). Factors affecting crowded acuity: eccentricity and contrast. Optom. Vis. Sci. 90, 628–638. doi: 10.1097/OPX.0b013e31829908a4

Connor, C. E., Egeth, H. E., and Yantis, S. (2004). Visual attention: bottom-up versus top-down. Curr. Biol. 14, R850–R852. doi: 10.1016/j.cub.2004.09.041

de Fockert, J. W., Rees, G., Frith, C. D., and Lavie, N. (2001). The role of working memory in visual selective attention. Science 291, 1803–1806.

De la Torre, G. G., Barroso, J. M., León-Carrión, J., Mestre, J. M., and Bozal, R. G. (2015). Reaction time and attention: toward a new standard in the assessment of ADHD? A pilot study. J. Atten. Disord. 19, 1074–1082. doi: 10.1177/1087054712466440

Della Sala, S., Laiacona, M., Spinnler, H., and Ubezio, C. (1992). A cancellation test: its reliability in assessing attentional deficits in Alzheimer's disease. Psychol. Med. 22, 885–901. doi: 10.1017/S0033291700038460

Demeyere, N., Riddoch, M. J., Slavkova, E. D., Bickerton, W. L., and Humphreys, G. W. (2015). The Oxford cognitive screen (OCS): validation of a stroke-specific short cognitive screening tool. Psychol. Assess. 27, 883–894. doi: 10.1037/pas0000082

Eglin, M., Robertson, L. C., and Knight, R. T. (1989). Visual search performance in the neglect syndrome. J. Cogn. Neurosci. 1, 372–385. doi: 10.1162/jocn.1989.1.4.372

Facchin, A., Simioni, M., Maffioletti, S., and Daini, R. (2023). Broken ring enVision search (BReViS): a new clinical test of attention to assess the effect of layout and crowding on visual search. Brain Sci. 13:494. doi: 10.3390/brainsci13030494

Faul, F., Erdfelder, E., Buchner, A., and Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav. Res. Methods 41, 1149–1160. doi: 10.3758/BRM.41.4.1149

Hatta, T., Masui, T., Ito, Y., Ito, E., Hasegawa, Y., and Matsuyama, Y. (2004). Relation between the prefrontal cortex and cerebro-cerebellar functions: evidence from the results of stabilometrical indexes. Appl. Neuropsychol. 11, 153–160. doi: 10.1207/s15324826an1103_3

Hultsch, D., Hertzog, C., Small, B. J., and Dixon, R. A. (1999). Use it or lose it: engaged lifestyle as a buffer of cognitive decline in aging? Psychol. Aging 14, 245–263. doi: 10.1037/0882-7974.14.2.245

Jones, R. N., Yang, F. M., Zhang, Y., Kiely, D. K., Marcantonio, E. R., and Inouye, S. K. (2006). Does educational attainment contribute to risk for delirium? A potential role for cognitive reserve. J. Gerontol. A Biol. Sci. Med. Sci. 61, 1307–1311. doi: 10.1093/gerona/61.12.1307

Ku, Y. (2018). Selective attention on representations in working memory: cognitive and neural mechanisms. PeerJ 6:e4585. doi: 10.7717/peerj.4585

Lee, S., Kawachi, I., Berkman, L. F., and Grodstein, F. (2003). Education, other socioeconomic indicators, and cognitive function. Am. J. Epidemiol. 157, 712–720. doi: 10.1093/aje/kwg042

Lindenberger, U., and Baltes, P. B. (1997). Intellectual functioning in old and very old age: cross-sectional results from the Berlin aging study. Psychol. Aging 12, 410–432. doi: 10.1037/0882-7974.12.3.410

Luck, S. J., Hillyard, S. A., Mangun, G. R., and Gazzaniga, M. S. (1989). Independent hemispheric attentional systems mediate visual search in split-brain patients. Nature 342, 543–545. doi: 10.1038/342543a0

Lyketsos, C. G., Chen, L., and Anthony, J. C. (1999). Cognitive decline in adulthood: an 11.5 year follow-up of the Baltimore epidemiological catchment area study. Am. J. Psychiatry 156, 58–65. doi: 10.1176/ajp.156.1.58

Madden, D. J., Spaniol, J., Whiting, W. L., Bucur, B., Provenzale, J. M., Cabeza, R., et al. (2007). Adult age differences in the functional neuroanatomy of visual attention: a combined fMRI and DTI study. Neurobiol. Aging 28, 459–476. doi: 10.1016/j.neurobiolaging.2006.01.005

Malavita, M. S., Vidyasagar, T. R., and McKendrick, A. M. (2017). The effect of aging and attention on visual crowding and surround suppression of perceived contrast threshold. Invest. Ophthalmol. Vis. Sci. 58, 860–867. doi: 10.1167/iovs.16-20632

Marioni, R. E., van den Hout, A., Valenzuela, M. J., Brayne, C., Matthews, F. E., MRC Cognitive Function and Ageing Study, et al. (2012). Active cognitive lifestyle associates with cognitive recovery and a reduced risk of cognitive decline. J. Alzheimers Dis. 28, 223–230. doi: 10.3233/JAD-2011-110377

Mesulam, M.-M. (2000). Principles of behavioral and cognitive neurology . Oxford, UK: Oxford University Press.

Moore, T., and Zirnsak, M. (2017). Neural mechanisms of selective visual attention. Annu. Rev. Psychol. 68, 47–72. doi: 10.1146/annurev-psych-122414-033400

Mueller, V., Brehmer, Y., von Oertzen, T., Li, S. C., and Lindenberger, U. (2008). Electrophysiological correlates of selective attention: a lifespan comparison. BMC Neurosci. 9:18. doi: 10.1186/1471-2202-9-18

Oberauer, K. (2019). Working memory and attention - a conceptual analysis and review. J. Cogn. 2:36. doi: 10.5334/joc.58

Park, M. O., and Lee, S. H. (2019). Effect of a dual-task program with different cognitive tasks applied to stroke patients: a pilot randomized controlled trial. NeuroRehabilitation 44, 239–249. doi: 10.3233/NRE-182563

Pegoraro, S., Facchin, A., Luchesa, F., Rolandi, E., Guaita, A., Arduino, L. S., et al. (2024). The complexity of reading revealed by a study with healthy older adults. Brain Sci. 14:230. doi: 10.3390/brainsci14030230

Petersen, S. E., and Posner, M. I. (2012). The attention system of the human brain: 20 years after. Annu. Rev. Neurosci. 35, 73–89. doi: 10.1146/annurev-neuro-062111-150525

Plebanek, D. J., and Sloutsky, V. M. (2019). Selective attention, filtering, and the development of working memory. Dev. Sci. 22:e12727. doi: 10.1111/desc.12727

Posner, M. I. (1980). Orienting of attention. Q. J. Exp. Psychol. 32, 3–25. doi: 10.1080/00335558008248231

Posner, M. I., and Petersen, S. E. (1990). The attention system of the human brain. Annu. Rev. Neurosci. 13, 25–42. doi: 10.1146/annurev.ne.13.030190.000325

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychol. Bull. 124, 372–422. doi: 10.1037/0033-2909.124.3.372

Reigal, R. E., Barrero, S., Martín, I., Morales-Sánchez, V., Juárez-Ruiz de Mier, R., and Hernández-Mendo, A. (2019). Relationships between reaction time, selective attention, physical activity, and physical fitness in children. Front. Psychol. 10:2278. doi: 10.3389/fpsyg.2019.02278

Rubia, K., Hyde, Z., Halari, R., Giampietro, V., and Smith, A. (2010). Effects of age and sex on developmental neural networks of visual-spatial attention allocation. NeuroImage 51, 817–827. doi: 10.1016/j.neuroimage.2010.02.058

Senger, C., Margarido, M. R. R. A., De Moraes, C. G., De Fendi, L. I., Messias, A., and Paula, J. S. (2017). Visual search performance in patients with vision impairment: a systematic review. Curr. Eye Res. 42, 1561–1571. doi: 10.1080/02713683.2017.1338348

Shamsi, F., Liu, R., and Kwon, M. (2022). Foveal crowding appears to be robust to normal aging and glaucoma unlike parafoveal and peripheral crowding. J. Vis. 22:10. doi: 10.1167/jov.22.8.10

Stern, Y., Habeck, C., Moeller, J., Scarmeas, N., Anderson, K. E., Hilton, H. J., et al. (2005). Brain networks associated with cognitive reserve in healthy young and old adults. Cereb. Cortex 15, 394–402. doi: 10.1093/cercor/bhh142

Strober, L., DeLuca, J., Benedict, R. H., Jacobs, A., Cohen, J. A., Chiaravalloti, N., et al. (2019). Symbol digit modalities test: a valid clinical trial endpoint for measuring cognition in multiple sclerosis. Mult. Scler. 25, 1781–1790. doi: 10.1177/1352458518808204

Tavakol, M., and Dennick, R. (2011). Making sense of Cronbach's alpha. Int. J. Med. Educ. 2, 53–55. doi: 10.5116/ijme.4dfb.8dfd

Tun, P. A., and Lachman, M. E. (2008). Age differences in reaction time and attention in a national telephone sample of adults: education, sex, and task complexity matter. Dev. Psychol. 44, 1421–1429. doi: 10.1037/a0012845

Turatto, M., Benso, F., Facoetti, A., Galfano, G., Mascetti, G. G., and Umiltà, C. (2000). Automatic and voluntary focusing of attention. Percept. Psychophys. 62, 935–952. doi: 10.3758/BF03212079

Uttl, B., and Pilkenton-Taylor, C. (2001). Letter cancellation performance across the adult life span. Clin. Neuropsychol. 15, 521–530. doi: 10.1076/clin.15.4.521.1881

Utz, K. S., Hankeln, T. M., Jung, L., Lammer, A., Waschbisch, A., Lee, D. H., et al. (2013). Visual search as a tool for a quick and reliable assessment of cognitive functions in patients with multiple sclerosis. PLoS One 8:e81531. doi: 10.1371/journal.pone.0081531

van Es, D. M., Theeuwes, J., and Knapen, T. (2018). Spatial sampling in human visual cortex is modulated by both spatial and feature-based attention. eLife 7:e36928. doi: 10.7554/eLife.36928

Verghese, P. (2001). Visual search and attention: a signal detection theory approach. Neuron 31, 523–535. doi: 10.1016/S0896-6273(01)00392-0

Veríssimo, J., Verhaeghen, P., Goldman, N., Weinstein, M., and Ullman, M. T. (2022). Evidence that ageing yields improvements as well as declines across attention and executive functions. Nat. Hum. Behav. 6, 97–110. doi: 10.1038/s41562-021-01169-7

Weintraub, S., and Mesulam, M. M. (1988). Visual Hemispatial inattention: stimulus parameters and exploratory strategies. J. Neurol. Neurosurg. Psychiatry 51, 1481–1488. doi: 10.1136/jnnp.51.12.1481

Wolfe, J. M. (2003). Moving towards solutions to some enduring controversies in visual search. Trends Cogn. Sci. 7, 70–76. doi: 10.1016/S1364-6613(02)00024-4

Wolfe, J. M. (2020). Visual search: how do we find what we are looking for? Annu. Rev. Vis. Sci. 6, 539–562. doi: 10.1146/annurev-vision-091718-015048

Wolfe, J. M., Oliva, A., Horowitz, T. S., Butcher, S. J., and Bompas, A. (2002). Segmentation of objects from backgrounds in visual search tasks. Vis. Res. 42, 2985–3004. doi: 10.1016/S0042-6989(02)00388-7

Yehezkel, O., Sterkin, A., Lev, M., and Polat, U. (2015). Crowding is proportional to visual acuity in young and aging eyes. J. Vis. 15:23. doi: 10.1167/15.8.23

Keywords: attention, attention assessment, broken ring enVision search test, reliability, validity, age, education level, gender

Citation: Shi Y and Zhang Y (2024) Reliability and validity of a novel attention assessment scale (broken ring enVision search test) in the Chinese population. Front. Psychol. 15:1375326. doi: 10.3389/fpsyg.2024.1375326

Received: 23 January 2024; Accepted: 25 April 2024; Published: 09 May 2024.

Copyright © 2024 Shi and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yi Zhang, [email protected]

A Quantitative Assessment of Visual Function for Young and Medically Complex Children with Cerebral Visual Impairment: Development and Inter-Rater Reliability (medRxiv preprint)

Background Cerebral Visual Impairment (CVI) is the most common cause of low vision in children. Standardized, quantifiable measures of visual function are needed.

Objective This study developed and evaluated a new method for quantifying visual function in young and medically complex children with CVI using remote videoconferencing.

Methods Children diagnosed with CVI who had been unable to complete clinic-based recognition acuity tests were recruited from a low-vision rehabilitation clinic (n = 22). A video-based Visual Function Assessment (VFA) was implemented using videoconference technology. Three low-vision rehabilitation clinicians independently scored recordings of each child's VFA. Inter-rater reliability was analyzed using intraclass correlations (ICC), and correlations were estimated between the video-based VFA scores and both clinically obtained acuity measures and children's cognitive age equivalence.

Results ICCs showed good agreement on VFA scores across raters (ICC = 0.835, 95% CI 0.701–0.916), comparable to previous, similar studies. VFA scores correlated strongly with clinically obtained acuity measures (r = -0.706, p = 0.002) and moderately with cognitive age equivalence (r = 0.518, p = 0.005), with notable variation in VFA scores for participants below a ten-month cognitive age equivalence. This variability among children with the lowest cognitive age equivalence may have been an artifact of the study's scoring method, or may represent real variability in visual function for these children.
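The ICC analysis reported here can be reproduced with standard tools. Below is a minimal sketch using the Python pingouin library, assuming ratings in long format with one row per child-rater pair; the column names and values are hypothetical, and the two-way random-effects form (ICC2k) shown is an assumption, since the abstract does not state which ICC model was used.

import pandas as pd
import pingouin as pg

# Hypothetical long-format ratings: each child scored by raters A-C.
data = pd.DataFrame({
    "child": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "rater": ["A", "B", "C"] * 3,
    "score": [52, 55, 50, 31, 33, 30, 47, 44, 46],
})

icc = pg.intraclass_corr(data=data, targets="child",
                         raters="rater", ratings="score")
# ICC2k (average of k raters, two-way random effects) is a common
# choice when every child is scored by the same set of raters.
print(icc[["Type", "ICC", "CI95%"]])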

Conclusions Our new VFA is a reliable, quantitative measure of visual function for young and medically complex children with CVI. Future study of the VFA's intra-rater reliability and validity is warranted.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the EyeSight Foundation of Alabama, Alie B. Gorrie Low Vision Research Fund and Research to Prevent Blindness. Additional support came from the National Institutes of Health [UL1 TR003096 to R.O.] and Grant T32 HS013852.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The IRB of the University of Alabama at Birmingham gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

Heard Libraries’ research engagement collaborations bolster faculty scholarship

When Vanderbilt Law School professors Kevin Stack and Ganesh Sitaraman were researching their 2023 article advocating for the professional licensure of election administrators, Heard Libraries' Keri Stophel was among those they turned to for help.

Stophel, a research services law librarian at the Alyne Queener Massey Law Library, routinely performs in-depth state and federal regulatory research in support of faculty members' work. In her collaboration with Stack and Sitaraman, she developed methodologies to search for relevant statutes, regulations, policy documents and guidance about election administration, as well as the scope of training requirements across the country. Stophel also organized and surveyed other materials, such as election audit reports and processes, and found additional secondary sources on various regulatory topics.

Her contributions were integral to the writing of “Election Administration as a Licensed Profession,” published in Wisconsin Law Review by Stack, the Lee S. and Charles A. Speir Professor of Law, and Sitaraman, the New York Alumni Chancellor's Professor of Law. In their article, they argue that election administrators should be subject to professional licensure, much the same way doctors and lawyers are, which would expand the requirements for election officials' training, enhance their professional identification, and reinforce norms of reliability and impartiality. “Such a reform meets our moment,” write the authors in their introduction, citing how licensure could greatly improve public confidence in the integrity of U.S. elections.

Stophel was indispensable to the process, Stack said. “Keri is able to carefully assemble hard-to-find information about state processes and laws. It is hard to imagine attempting to make claims about state practices without her help.”  

Stophel's research on the topic also helped to inform the fourth edition of The Regulatory State, a textbook co-authored by Stack, Lisa Bressman, the David Daniels Allen Distinguished Professor of Law, and Edward Rubin, University Distinguished Professor of Law and Political Science; an article in the Iowa Law Review, “Representative Rulemaking,” by Stack and Jim Rossi, the Judge D.L. Lansden Professor of Law; and a forthcoming article concerning state election regulation powers.

Translating research into knowledge  

This collaboration highlights an important facet of the academic librarian’s role: leveraging research expertise to find and organize information and evaluate its quality, accuracy and validity to bolster faculty scholarship. In essence, librarians are information specialists who help to translate research into knowledge that has practical and far-reaching applications.  

Embedding librarians at critical points in the research lifecycle at Vanderbilt is a guiding mission of the Heard Libraries and the focus of the libraries’ Research Engagement Committee.  

“Our goal is, essentially, to systematize active research engagement between faculty and librarians,” said Joshua Borycz, chair of the committee and librarian for STEM research at the Sarah Shannon Stevenson Science and Engineering Library. “We provide support and training opportunities to librarians, and we are developing a website and information on services for faculty. Ideally, we want to make it so that active outreach and engagement with faculty about their research are just as integrated into the work life of librarians as instruction.”

Improving patient care  

Having the most up-to-date research is vital in evidence-based nursing practice. Rachel Lane Walden works closely with the Vanderbilt University Medical Center Evidence-Based Practice and Nursing Research Committee to facilitate research activities with the aim of improving patient care.   

As a committee member, Walden, assistant director for research and education services at the Annette and Irwin Eskind Family Biomedical Library, provides training and consultations on how to craft research questions, search the literature, and use citation managers and other software tools. This year, the committee is also designing a repository for research-based practice nursing projects and is currently creating a proposal for that initiative.

“Rachel's commitment to supporting evidence-based nursing practices and research is demonstrated time and again with her involvement on the committee, her outside-the-nursing-box thinking, and her willingness to educate, train and present when needed,” said Stacy Stark, co-chair of the committee and principal quality and patient safety advisor for Vanderbilt Behavioral Health and Quality, Safety and Risk Prevention. “Rachel is an invaluable resource for the committee and for nurses at VUMC, and her depth of knowledge and expertise cannot be overstated.”

Fostering connections  

Staff in the Heard Libraries’ Digital Lab are using their unique expertise in harnessing data to organize, visualize—and, ultimately, to strengthen—professional and scholarly networks.  

Working in partnership with Professor of Otolaryngology Alex Gelbard, Librarian for Scholarly Communications Shenmeng Xu and Data Science and Data Curation Specialist Steven Baskauf embarked on a project to map the large collaborative network of otolaryngologists practicing in the U.S. The project spanned two years and involved input from additional Vanderbilt faculty, medical school trainees, students and staff.

“Libraries have always connected surgeons,” said Gelbard, the Robert H. Ossoff Endowed Director of Research at VUMC’s Center for Complex Airway Reconstruction. “Historically, they were essential to the flow of ideas and experimental results between authors and readers. Central to this experience were the professional librarians who identified, collected and organized this information.  

“However, with the rise of new digital tools to connect each other directly, newer structures linking academic surgeons have emerged,” he said. “At the individual level, a small set of influencers with high measures of social connectivity can disproportionately influence large populations in different contexts. At the collective level, network ties may play an important role in fostering the global spread of information.”  

The team used peer-reviewed manuscript co-authorships to define network connections, then Xu and Baskauf utilized emerging tools in network analysis to explore, visualize and quantify the relationships between faculty physicians in accredited otolaryngology programs. The results provide a comprehensive new methodology to examine the interrelationships, influence and impact of individual surgeons, grouped networks and sub-specialties in academic otolaryngology. Their work was presented at a meeting of The Triological Society earlier this year, and Gelbard and his team have applied for a Digital Lab Seed Grant to develop the project further.  

“Shenmeng and Steve leveraged both their technologic expertise in data acquisition and network analysis, along with their professional commitment to understanding the movement of ideas across academics, to make this project possible,” Gelbard said. “With this work, the team showed that quantitative measures of social connectedness in physician networks correlate with academic impact, and collaborative interactions within the academic community are strongly shaped by sub-specialty affiliation and academic institution.”
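The co-authorship approach described above can be prototyped compactly. Below is a minimal sketch in Python using networkx, with hypothetical author lists standing in for the project's actual publication data.

from itertools import combinations
import networkx as nx

# Hypothetical papers, each a list of co-authors; placeholders only.
papers = [
    ["Gelbard", "Smith", "Lee"],
    ["Smith", "Patel"],
    ["Gelbard", "Patel", "Nguyen"],
]

G = nx.Graph()
for authors in papers:
    # Every pair of co-authors on a paper becomes (or reinforces) an edge.
    for a, b in combinations(authors, 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Simple measures of social connectivity of the kind used to spot
# highly connected individuals in a collaboration network.
print(nx.degree_centrality(G))
print(nx.betweenness_centrality(G))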

For more news and information about the Jean and Alexander Heard Libraries, visit library.vanderbilt.edu.
