Variables in Research – Definition, Types and Examples

Variables in Research

Definition:

In research, variables are characteristics or attributes that can be measured, manipulated, or controlled. They are the factors that researchers observe or manipulate to understand how those factors relate to one another and to the outcomes of interest.

Types of Variables in Research

Types of Variables in Research are as follows:

Independent Variable

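This is the variable that the researcher manipulates or, in non-experimental studies, selects as the presumed cause. It is also known as the predictor variable, as it is used to predict changes in the dependent variable. Examples include dosage and treatment type, which can be manipulated directly, and age and gender, which can only be observed or used to group participants.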

Dependent Variable

This is the variable that is measured or observed to determine the effects of the independent variable. It is also known as the outcome variable, as it is the variable that is affected by the independent variable. Examples of dependent variables include blood pressure, test scores, and reaction time.

Confounding Variable

This is a variable that is related to both the independent variable and the dependent variable and can therefore distort the apparent relationship between them. It is not the focus of the study but could affect its results. For example, in a study on the effects of a new drug on a disease, the patient’s age could be a confounding variable if older patients have more severe symptoms and are also more or less likely to receive the drug.

Mediating Variable

This is a variable that explains the relationship between the independent variable and the dependent variable. It is a variable that comes in between the independent and dependent variables and is affected by the independent variable, which then affects the dependent variable. For example, in a study on the relationship between exercise and weight loss, the mediating variable could be metabolism, as exercise can increase metabolism, which can then lead to weight loss.
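The exercise, metabolism and weight loss example can be sketched with simulated data. This is only an illustration: the effect sizes (0.5, 0.4, 0.1), the sample size and the helper names `slope` and `residuals` are assumptions made for the sketch, and the product-of-coefficients approach shown here is one common way to quantify mediation, not the only one.

```python
import random
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


def residuals(x, y):
    """Part of y that is left after removing its linear dependence on x."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


# Simulated data; all effect sizes below are arbitrary assumptions.
rng = random.Random(0)
n = 2000
exercise = [rng.gauss(0, 1) for _ in range(n)]
metabolism = [0.5 * x + rng.gauss(0, 1) for x in exercise]
weight_loss = [0.4 * m + 0.1 * x + rng.gauss(0, 1)
               for x, m in zip(exercise, metabolism)]

# Path from the independent variable to the mediator.
a = slope(exercise, metabolism)
# Path from the mediator to the outcome, with exercise held fixed.
b = slope(residuals(exercise, metabolism), weight_loss)
# Effect of exercise on the outcome that does not run through metabolism.
direct = slope(residuals(metabolism, exercise), weight_loss)
# Overall effect of exercise on the outcome.
total = slope(exercise, weight_loss)
# Effect that runs through the mediator.
indirect = a * b

print(f"total={total:.2f} direct={direct:.2f} indirect={indirect:.2f}")
```

In a linear model fitted by least squares, the total effect equals the direct effect plus the indirect effect, so a large indirect share suggests that the mediator carries much of the relationship.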

Moderator Variable

This is a variable that affects the strength or direction of the relationship between the independent variable and the dependent variable. It is a variable that influences the effect of the independent variable on the dependent variable. For example, in a study on the effects of caffeine on cognitive performance, the moderator variable could be age, as older adults may be more sensitive to the effects of caffeine than younger adults.
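A moderator shows up as a difference in the strength of the relationship across its levels. The sketch below uses made-up caffeine and score values for two age groups (all numbers and the helper name `slope` are assumptions for illustration) and compares the slope of performance on dose in each group.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


# Made-up data: caffeine dose (mg) and a cognitive score, by age group.
dose = [0, 100, 200, 300]
score_younger = [50, 52, 54, 56]  # rises 2 points per 100 mg
score_older = [50, 55, 60, 65]    # rises 5 points per 100 mg

slope_younger = slope(dose, score_younger)
slope_older = slope(dose, score_older)

# Different slopes mean that age changes how strongly dose affects the score.
print(f"younger: {slope_younger:.3f} points per mg")
print(f"older:   {slope_older:.3f} points per mg")
```

In a real analysis the same idea is usually tested with an interaction term (dose × age) in a regression model rather than by comparing separate slopes by eye.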

Control Variable

This is a variable that is held constant or controlled by the researcher to ensure that it does not affect the relationship between the independent variable and the dependent variable. Control variables are important to ensure that any observed effects are due to the independent variable and not to other factors. For example, in a study on the effects of a new teaching method on student performance, the control variables could include class size, teacher experience, and student demographics.

Continuous Variable

This is a variable that can take on any value within a certain range. Continuous variables can be measured on a scale and are often used in statistical analyses. Examples of continuous variables include height, weight, and temperature.

Categorical Variable

This is a variable that can take on a limited number of values or categories. Categorical variables can be nominal or ordinal. Nominal variables have no inherent order, while ordinal variables have a natural order. Examples of categorical variables include gender, race, and educational level.

Discrete Variable

This is a variable that can only take on distinct, separate values, typically whole-number counts. Discrete variables are often used in counting or frequency analyses. Examples of discrete variables include the number of siblings a person has, the number of times a person exercises in a week, and the number of students in a classroom.

Dummy Variable

This is a variable that takes on only two values, typically 0 and 1, and is used to represent categorical variables in statistical analyses. Dummy variables are often used when a categorical variable cannot be used directly in an analysis. For example, in a study on the effects of gender on income, a dummy variable could be created, with 0 representing female and 1 representing male.
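Dummy coding can be sketched in a few lines of plain Python. The function name `dummy_code` and the sample data are hypothetical. The sketch also shows the usual extension to a categorical variable with more than two categories: one 0/1 column per category, except for a reference category that is represented by all zeros.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, skipping the reference category."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


# Hypothetical sample: education level for five respondents.
education = ["high school", "college", "graduate", "college", "high school"]
dummies = dummy_code(education, reference="high school")

for name, column in dummies.items():
    print(name, column)
```

Leaving out the reference category avoids redundant columns, since a respondent with zeros in every dummy column must belong to the reference group.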

Extraneous Variable

This is a variable that is not of interest in the study but can still affect the dependent variable and therefore the outcome of the study. Extraneous variables can lead to erroneous conclusions and can be controlled through random assignment or statistical techniques.
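Random assignment as a way of controlling an extraneous variable can be illustrated with a small simulation. The ages, group sizes and seed below are assumptions chosen for the sketch. Age plays the role of the extraneous variable, and the code compares how unbalanced it is between groups under a naive split and under a random split.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical participant ages, listed youngest first (for example because
# of how people were recruited), so a naive split would separate them by age.
ages = sorted(rng.randint(18, 65) for _ in range(2000))

# Naive assignment: first half to treatment, second half to control.
naive_gap = abs(mean(ages[:1000]) - mean(ages[1000:]))

# Random assignment: shuffle first, then split.
shuffled = ages[:]
rng.shuffle(shuffled)
random_gap = abs(mean(shuffled[:1000]) - mean(shuffled[1000:]))

print(f"age gap between groups, naive split:  {naive_gap:.1f} years")
print(f"age gap between groups, random split: {random_gap:.1f} years")
```

With random assignment the two groups end up with nearly the same average age, so age is unlikely to explain any difference in outcomes between them.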

Latent Variable

This is a variable that cannot be directly observed or measured, but is inferred from other variables. Latent variables are often used in psychological or social research to represent constructs such as personality traits, attitudes, or beliefs.

Moderator-mediator Variable

This is a variable that acts both as a moderator and a mediator. It can moderate the relationship between the independent and dependent variables and also mediate the relationship between the independent and dependent variables. Moderator-mediator variables are often used in complex statistical analyses.

Variables Analysis Methods

There are different methods to analyze variables in research, including:

  • Descriptive statistics: This involves analyzing and summarizing data using measures such as mean, median, mode, range, standard deviation, and frequency distribution. Descriptive statistics are useful for understanding the basic characteristics of a data set.
  • Inferential statistics: This involves making inferences about a population based on sample data. Inferential statistics use techniques such as hypothesis testing, confidence intervals, and regression analysis to draw conclusions from data.
  • Correlation analysis: This involves examining the relationship between two or more variables. Correlation analysis can determine the strength and direction of the relationship between variables, and can be used to make predictions about future outcomes.
  • Regression analysis: This involves modelling the relationship between one or more independent variables and a dependent variable. Regression analysis can be used to predict the value of the dependent variable from the values of the independent variables, and can also test whether the relationship is statistically significant.
  • Factor analysis: This involves identifying patterns and relationships among a large number of variables. Factor analysis can be used to reduce the complexity of a data set and identify underlying factors or dimensions.
  • Cluster analysis: This involves grouping observations into clusters based on how similar they are across variables. Cluster analysis can be used to identify patterns or segments within a data set, and can be useful for market segmentation or customer profiling.
  • Multivariate analysis: This involves analyzing multiple variables simultaneously. Multivariate analysis can be used to understand complex relationships between variables, and can be useful in fields such as social science, finance, and marketing.
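The first few methods in this list can be sketched with Python's standard library. The data below (hours studied and test scores for five students) are invented for illustration, and the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, median, stdev

# Made-up data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

# Descriptive statistics: summarize the dependent variable.
summary = {"mean": mean(scores), "median": median(scores), "stdev": stdev(scores)}

# Correlation analysis: strength and direction of the linear relationship.
mx, my = mean(hours), mean(scores)
sxy = sum((x - mx) * (y - my) for x, y in zip(hours, scores))
sxx = sum((x - mx) ** 2 for x in hours)
syy = sum((y - my) ** 2 for y in scores)
r = sxy / sqrt(sxx * syy)

# Regression analysis: predict the score from hours studied.
slope = sxy / sxx
intercept = my - slope * mx

print(summary)
print(f"r = {r:.3f}")
print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
```

Inferential steps such as significance tests or confidence intervals would build on these quantities and are left out of the sketch.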

Examples of Variables

  • Age: This is a continuous variable that represents the age of an individual in years.
  • Gender: This is a categorical variable that represents the biological sex of an individual and can take on values such as male and female.
  • Education level: This is a categorical variable that represents the level of education completed by an individual and can take on values such as high school, college, and graduate school.
  • Income: This is a continuous variable that represents the amount of money earned by an individual in a year.
  • Weight: This is a continuous variable that represents the weight of an individual in kilograms or pounds.
  • Ethnicity: This is a categorical variable that represents the ethnic background of an individual and can take on values such as Hispanic, African American, and Asian.
  • Time spent on social media: This is a continuous variable that represents the amount of time an individual spends on social media in minutes or hours per day.
  • Marital status: This is a categorical variable that represents the marital status of an individual and can take on values such as married, divorced, and single.
  • Blood pressure: This is a continuous variable that represents the force of blood against the walls of arteries in millimeters of mercury.
  • Job satisfaction: This is a variable that represents an individual’s level of satisfaction with their job. When measured using a Likert scale it is strictly ordinal, although it is often treated as continuous in analysis.

Applications of Variables

Variables are used in many different applications across various fields. Here are some examples:

  • Scientific research: Variables are used in scientific research to understand the relationships between different factors and to make predictions about future outcomes. For example, scientists may study the effects of different variables on plant growth or the impact of environmental factors on animal behavior.
  • Business and marketing: Variables are used in business and marketing to understand customer behavior and to make decisions about product development and marketing strategies. For example, businesses may study variables such as consumer preferences, spending habits, and market trends to identify opportunities for growth.
  • Healthcare: Variables are used in healthcare to monitor patient health and to make treatment decisions. For example, doctors may use variables such as blood pressure, heart rate, and cholesterol levels to diagnose and treat cardiovascular disease.
  • Education: Variables are used in education to measure student performance and to evaluate the effectiveness of teaching strategies. For example, teachers may use variables such as test scores, attendance, and class participation to assess student learning.
  • Social sciences: Variables are used in social sciences to study human behavior and to understand the factors that influence social interactions. For example, sociologists may study variables such as income, education level, and family structure to examine patterns of social inequality.

Purpose of Variables

Variables serve several purposes in research, including:

  • To provide a way of measuring and quantifying concepts: Variables help researchers measure and quantify abstract concepts such as attitudes, behaviors, and perceptions. By assigning numerical values to these concepts, researchers can analyze and compare data to draw meaningful conclusions.
  • To help explain relationships between different factors: Variables help researchers identify and explain relationships between different factors. By analyzing how changes in one variable affect another variable, researchers can gain insight into the complex interplay between different factors.
  • To make predictions about future outcomes: Variables help researchers make predictions about future outcomes based on past observations. By analyzing patterns and relationships between different variables, researchers can make informed predictions about how different factors may affect future outcomes.
  • To test hypotheses: Variables help researchers test hypotheses and theories. By collecting and analyzing data on different variables, researchers can test whether their predictions are accurate and whether their hypotheses are supported by the evidence.

Characteristics of Variables

Characteristics of Variables are as follows:

  • Measurement: Variables can be measured using different scales, such as nominal, ordinal, interval, or ratio scales. The scale used to measure a variable can affect the type of statistical analysis that can be applied.
  • Range: Variables have a range of values that they can take on. The set of possible values can be countable, such as the number of students in a class, or uncountable, such as the possible values of a continuous variable like temperature.
  • Variability: Variables can have different levels of variability, which refers to the degree to which the values of the variable differ from each other. Variables with high variability have widely spread values, while variables with low variability have values that are more similar to each other.
  • Validity and reliability: Variables should be both valid and reliable to ensure accurate and consistent measurement. Validity refers to the extent to which a variable measures what it is intended to measure, while reliability refers to the consistency of the measurement across repeated measurements, for example over time.
  • Directionality: Some relationships between variables have directionality, meaning that the relationship is not symmetrical. For example, in a study of the relationship between smoking and lung cancer, smoking is the independent variable and lung cancer is the dependent variable.
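The point that the measurement scale limits the applicable statistics can be made concrete with a small sketch. The function name `central_tendency`, its parameters and the sample data are assumptions for illustration. It follows the common convention of using the mode for nominal data, the median for ordinal data, and the mean for interval or ratio data.

```python
from statistics import mean, median_low, mode


def central_tendency(values, scale, order=None):
    """Pick a measure of central tendency that suits the measurement scale."""
    if scale == "nominal":
        # Categories have no order, so only the most frequent value is meaningful.
        return mode(values)
    if scale == "ordinal":
        # Categories are ordered, so the middle category is meaningful.
        ranks = sorted(order.index(v) for v in values)
        return order[median_low(ranks)]
    if scale in ("interval", "ratio"):
        # Differences between values are meaningful, so the mean can be used.
        return mean(values)
    raise ValueError(f"unknown scale: {scale}")


marital_status = ["married", "single", "married", "divorced"]
education = ["college", "high school", "graduate", "college", "high school"]
education_order = ["high school", "college", "graduate"]
temperature_c = [20, 22, 24]

print(central_tendency(marital_status, "nominal"))
print(central_tendency(education, "ordinal", order=education_order))
print(central_tendency(temperature_c, "interval"))
```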

Advantages of Variables

Here are some of the advantages of using variables in research:

  • Control: Variables allow researchers to control the effects of external factors that could influence the outcome of the study. By manipulating and controlling variables, researchers can isolate the effects of specific factors and measure their impact on the outcome.
  • Replicability: Variables make it possible for other researchers to replicate the study and test its findings. By defining and measuring variables consistently, other researchers can conduct similar studies to validate the original findings.
  • Accuracy: Variables make it possible to measure phenomena accurately and objectively. By defining and measuring variables precisely, researchers can reduce bias and increase the accuracy of their findings.
  • Generalizability: Variables allow researchers to generalize their findings to larger populations. By measuring well-defined variables in a sample that is representative of the population, researchers can draw conclusions that are applicable to a broader range of individuals.
  • Clarity: Variables help researchers to communicate their findings more clearly and effectively. By defining and categorizing variables, researchers can organize and present their findings in a way that is easily understandable to others.

Disadvantages of Variables

Here are some of the main disadvantages of using variables in research:

  • Simplification: Variables may oversimplify the complexity of real-world phenomena. By breaking down a phenomenon into variables, researchers may lose important information and context, which can affect the accuracy and generalizability of their findings.
  • Measurement error: Variables rely on accurate and precise measurement, and measurement error can affect the reliability and validity of research findings. The use of subjective or poorly defined variables can also introduce measurement error into the study.
  • Confounding variables: Confounding variables are factors outside the focus of the study that are related to the variables of interest and affect the relationship between them. If confounding variables are not accounted for, they can distort or obscure the relationship between the variables of interest.
  • Limited scope: Variables are defined by the researcher, and the scope of the study is therefore limited by the researcher’s choice of variables. This can lead to a narrow focus that overlooks important aspects of the phenomenon being studied.
  • Ethical concerns: The selection and measurement of variables may raise ethical concerns, especially in studies involving human subjects. For example, using variables that are related to sensitive topics, such as race or sexuality, may raise concerns about privacy and discrimination.

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer

What are Examples of Variables in Research?

Introduction

When writing your thesis, one of the first terms you encounter is the word variable. If you do not understand what variables mean and how they are used in your study, you will struggle to do excellent research. What are variables, and how do you use them in your research?

You may find it challenging to understand just what variables are in research, especially in research that involves quantitative data analysis. This initial difficulty grows when you encounter the phrases “dependent variable” and “independent variable” as you go deeper into this vital concept of research and statistics.

Therefore, you must thoroughly grasp the meaning of variables and the ways to measure them. Yes, variables should be measurable so that you can use your data for statistical analysis.

Definition of Variable

Variables are those simplified portions of the complex phenomena that you intend to study. The word variable is derived from the root word “vary,” meaning, changing in amount, volume, number, form, nature, or type. These variables should be measurable, i.e., they can be counted or subjected to a scale.

Examples of Variables in Research: 6 Phenomena

  • Phenomenon 1: Climate change
  • Phenomenon 2: Crime and violence in the streets
  • Phenomenon 3: Poor performance of students in college entrance exams
  • Phenomenon 4: Fish kill
  • Phenomenon 5: Poor crop growth
  • Phenomenon 6: How content goes viral

Notice in the above variable examples that all the factors listed under the phenomena can be counted or measured using an ordinal, ratio, or interval scale, except for the last one. The factors that influence how content goes viral are essentially subjective.

The expected values derived from these variables will be in terms of numbers, amount, category, or type. Quantified variables allow statistical analysis. Variable descriptions, correlations, or differences are then determined.

Difference Between Independent and Dependent Variables

Independent Variables

For example, in the second phenomenon, i.e., crime and violence in the streets, the independent variable is the number of law enforcers. If there are more law enforcers, it is expected that the following will be reduced:

Dependent Variables

For example, in the first phenomenon on climate change, temperature as the independent variable influences sea level rise, the dependent variable. Increased temperature will cause the expansion of water in the sea. Thus, sea-level rise on a global scale will occur.

Finding the relationship between variables

Finding the relationship between variables requires a thorough review of the literature. Through a review of the relevant and reliable literature, you will find out which variables influence the other variable. You do not just guess relationships between variables. The entire process is the essence of research.

At this point, I believe that the concept of the variable is now clear to you. Share this information with your peers, who may have difficulty in understanding what the variables are in research.

About the author

Patrick Regoniel

Research Variables 101

Independent variables, dependent variables, control variables and more

By: Derek Jansen (MBA) | Expert Reviewed By: Kerryn Warren (PhD) | January 2023

If you’re new to the world of research, especially scientific research, you’re bound to run into the concept of variables sooner or later. If you’re feeling a little confused, don’t worry – you’re not the only one! Independent variables, dependent variables, confounding variables – it’s a lot of jargon. In this post, we’ll unpack the terminology surrounding research variables using straightforward language and loads of examples.

Overview: Variables In Research

1. What (exactly) is a variable?
2. Independent variables
3. Dependent variables
4. Control variables
5. Moderating variables
6. Mediating variables
7. Confounding variables
8. Latent variables

What (exactly) is a variable?

The simplest way to understand a variable is as any characteristic or attribute that can experience change or vary over time or context – hence the name “variable”. For example, the dosage of a particular medicine could be classified as a variable, as the amount can vary (i.e., a higher dose or a lower dose). Similarly, gender, age or ethnicity could be considered demographic variables, because each person varies in these respects.

Within research, especially scientific research, variables form the foundation of studies, as researchers are often interested in how one variable impacts another, and the relationships between different variables. For example:

  • How someone’s age impacts their sleep quality
  • How different teaching methods impact learning outcomes
  • How diet impacts weight (gain or loss)

As you can see, variables are often used to explain relationships between different elements and phenomena. In scientific studies, especially experimental studies, the objective is often to understand the causal relationships between variables. In other words, the role of cause and effect between variables. This is achieved by manipulating certain variables while controlling others – and then observing the outcome. But, we’ll get into that a little later…

The “Big 3” Variables

Variables can be a little intimidating for new researchers because there are a wide variety of variables, and oftentimes, there are multiple labels for the same thing. To lay a firm foundation, we’ll first look at the three main types of variables, namely:

  • Independent variables (IV)
  • Dependent variables (DV)
  • Control variables

What is an independent variable?

Simply put, the independent variable is the “cause” in the relationship between two (or more) variables. In other words, when the independent variable changes, it has an impact on another variable.

For example:

  • Increasing the dosage of a medication (Variable A) could result in better (or worse) health outcomes for a patient (Variable B)
  • Changing a teaching method (Variable A) could impact the test scores that students earn in a standardised test (Variable B)
  • Varying one’s diet (Variable A) could result in weight loss or gain (Variable B).

It’s useful to know that independent variables can go by a few different names, including explanatory variables (because they explain an event or outcome) and predictor variables (because they predict the value of another variable). Terminology aside though, the most important takeaway is that independent variables are assumed to be the “cause” in any cause-effect relationship. As you can imagine, these types of variables are of major interest to researchers, as many studies seek to understand the causal factors behind a phenomenon.

What is a dependent variable?

While the independent variable is the “cause”, the dependent variable is the “effect” – or rather, the affected variable. In other words, the dependent variable is the variable that is assumed to change as a result of a change in the independent variable.

Keeping with the previous example, let’s look at some dependent variables in action:

  • Health outcomes (DV) could be impacted by dosage changes of a medication (IV)
  • Students’ scores (DV) could be impacted by teaching methods (IV)
  • Weight gain or loss (DV) could be impacted by diet (IV)

In scientific studies, researchers will typically pay very close attention to the dependent variable (or variables), carefully measuring any changes in response to hypothesised independent variables. This can be tricky in practice, as it’s not always easy to reliably measure specific phenomena or outcomes – or to be certain that the actual cause of the change is in fact the independent variable.

As the adage goes, correlation is not causation. In other words, just because two variables have a relationship doesn’t mean that it’s a causal relationship – they may just happen to vary together. For example, you could find a correlation between the number of people who own a certain brand of car and the number of people who have a certain type of job. That correlation doesn’t mean that owning that brand of car causes someone to have that type of job, or vice versa. The correlation could, for example, be caused by another factor such as income level or age group, which would affect both car ownership and job type.
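The car and job example can be mimicked with simulated data in which a third factor drives both variables. Everything in this sketch is an assumption made for illustration: income is standardized, `car_score` and `job_score` are hypothetical numeric stand-ins for car ownership and job type, and neither has any effect on the other.

```python
import random
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def residuals(x, y):
    """Part of y that is left after removing its linear dependence on x."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


rng = random.Random(1)
n = 2000
income = [rng.gauss(0, 1) for _ in range(n)]
car_score = [z + rng.gauss(0, 1) for z in income]  # driven by income only
job_score = [z + rng.gauss(0, 1) for z in income]  # driven by income only

raw_r = pearson(car_score, job_score)
# Correlation that remains once income is removed from both variables.
partial_r = pearson(residuals(income, car_score), residuals(income, job_score))

print(f"raw correlation:       {raw_r:.2f}")
print(f"controlling for income: {partial_r:.2f}")
```

The raw correlation is clearly positive, yet it shrinks to roughly zero once income is accounted for, which is the pattern expected when a shared cause, not a causal link, produces the relationship.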

To confidently establish a causal relationship between an independent variable and a dependent variable (i.e., X causes Y), you’ll typically need an experimental design, where you have complete control over the environment and the variables of interest. But even so, this doesn’t always translate into the “real world”. Simply put, what happens in the lab sometimes stays in the lab!

As an alternative to pure experimental research, correlational research (where the researcher measures variables without manipulating them) or “quasi-experimental” research (where participants cannot be randomly assigned to conditions) can be done on a much larger scale more easily, allowing one to understand specific relationships in the real world. These types of studies also assume some causality between independent and dependent variables, but it’s not always clear. So, if you go this route, you need to be cautious in terms of how you describe the impact and causality between variables and be sure to acknowledge any limitations in your own research.

What is a control variable?

In an experimental design, a control variable (or controlled variable) is a variable that is intentionally held constant to ensure it doesn’t have an influence on any other variables. As a result, this variable remains unchanged throughout the course of the study. In other words, it’s a variable that’s not allowed to vary – tough life 🙂

As we mentioned earlier, one of the major challenges in identifying and measuring causal relationships is that it’s difficult to isolate the impact of the independent variable from that of other variables. Simply put, there’s always a risk that there are factors beyond the ones you’re specifically looking at that might be impacting the results of your study. So, to minimise the risk of this, researchers will attempt (as best possible) to hold other variables constant. These factors are then considered control variables.

Some examples of variables that you may need to control include:

  • Temperature
  • Time of day
  • Noise or distractions

Which specific variables need to be controlled for will vary tremendously depending on the research project at hand, so there’s no generic list of control variables to consult. As a researcher, you’ll need to think carefully about all the factors that could vary within your research context and then consider how you’ll go about controlling them. A good starting point is to look at previous studies similar to yours and pay close attention to which variables they controlled for.

Of course, you won’t always be able to control every possible variable, and so, in many cases, you’ll just have to acknowledge their potential impact and account for them in the conclusions you draw. Every study has its limitations , so don’t get fixated or discouraged by troublesome variables. Nevertheless, always think carefully about the factors beyond what you’re focusing on – don’t make assumptions!

Other types of variables

As we mentioned, independent, dependent and control variables are the most common variables you’ll come across in your research, but they’re certainly not the only ones you need to be aware of. Next, we’ll look at a few “secondary” variables that you need to keep in mind as you design your research.

  • Moderating variables
  • Mediating variables
  • Confounding variables
  • Latent variables

Let’s jump into it…

What is a moderating variable?

A moderating variable is a variable that influences the strength or direction of the relationship between an independent variable and a dependent variable. In other words, moderating variables affect how much (or how little) the IV affects the DV, or whether the IV has a positive or negative relationship with the DV (i.e., moves in the same or opposite direction).

For example, in a study about the effects of sleep deprivation on academic performance, gender could be used as a moderating variable to see if there are any differences in how men and women respond to a lack of sleep. In such a case, one may find that gender has an influence on how much students’ scores suffer when they’re deprived of sleep.

It’s important to note that while moderators can have an influence on outcomes, they don’t necessarily cause them; rather they modify or “moderate” existing relationships between other variables. This means that it’s possible for two different groups with similar characteristics, but different levels of moderation, to experience very different results from the same experiment or study design.

What is a mediating variable?

Mediating variables are often used to explain the relationship between the independent and dependent variable(s). For example, if you were researching the effects of age on job satisfaction, then education level could be considered a mediating variable, as it may explain why older people have higher job satisfaction than younger people – they may have more experience or better qualifications, which lead to greater job satisfaction.

Mediating variables also help researchers understand the mechanism through which one factor influences another. For instance, if you wanted to study the effect of stress on academic performance, then coping strategies might act as a mediating factor: stress levels may shape which coping strategies students adopt, and those strategies in turn affect academic performance.

In addition, mediating variables can provide insight into causal relationships between two variables by helping researchers determine whether changes in one factor directly cause changes in another – or whether there is an indirect relationship between them mediated by some third factor(s). For instance, if you wanted to investigate the impact of parental involvement on student achievement, you would need to consider family dynamics as a potential mediator, since parental involvement could change family dynamics, which could in turn influence student achievement.

What is a confounding variable?

A confounding variable (also known as a third variable or lurking variable) is an extraneous factor that can influence the relationship between two variables being studied. Specifically, for a variable to be considered a confounding variable, it needs to meet two criteria:

  • It must be correlated with the independent variable (this can be causal or not)
  • It must have a causal impact on the dependent variable (i.e., influence the DV)

Some common examples of confounding variables include demographic factors such as gender, ethnicity, socioeconomic status, age, education level, and health status. In addition to these, there are also environmental factors to consider. For example, air pollution could confound the impact of the variables of interest in a study investigating health outcomes.

Naturally, it’s important to identify as many confounding variables as possible when conducting your research, as they can heavily distort the results and lead you to draw incorrect conclusions. So, always think carefully about what factors may have a confounding effect on your variables of interest and try to manage these as best you can.
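Here is a small sketch of how a confounder that meets both criteria can distort results. The counts are invented: coffee drinking is the independent variable, heart disease the dependent variable, and smoking the confounder (it is more common among coffee drinkers and it raises disease risk). All names and numbers are assumptions made for illustration.

```python
# (smoker, coffee_drinker) -> (number of people, number with heart disease)
# Hypothetical counts, chosen so coffee has no effect within either stratum.
counts = {
    (True, True): (80, 24),
    (True, False): (20, 6),
    (False, True): (20, 2),
    (False, False): (80, 8),
}

def risk(cells):
    """Proportion of people with the disease across the given cells."""
    people = sum(n for n, _ in cells)
    cases = sum(d for _, d in cells)
    return cases / people

def risk_difference(smoker=None):
    """Risk among coffee drinkers minus risk among non-drinkers.

    With smoker=None the strata are pooled (the naive comparison);
    otherwise the comparison is made within one smoking stratum.
    """
    strata = [True, False] if smoker is None else [smoker]
    exposed = [counts[(s, True)] for s in strata]
    unexposed = [counts[(s, False)] for s in strata]
    return risk(exposed) - risk(unexposed)

naive = risk_difference()                    # 0.26 - 0.14: coffee looks harmful
within_smokers = risk_difference(True)       # 0.30 - 0.30: no difference
within_nonsmokers = risk_difference(False)   # 0.10 - 0.10: no difference
print(naive, within_smokers, within_nonsmokers)
```

Comparing like with like (stratifying on the confounder) makes the apparent effect disappear in this made-up data set, which is exactly the kind of distortion the two criteria describe.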

What is a latent variable?

Latent variables are unobservable factors that can influence the behaviour of individuals and explain certain outcomes within a study. They’re also known as hidden or underlying variables, and what makes them rather tricky is that they can’t be directly observed or measured. Instead, latent variables must be inferred from other observable data points such as responses to surveys or experiments.

For example, in a study of mental health, the variable “resilience” could be considered a latent variable. It can’t be directly measured, but it can be inferred from measures of mental health symptoms, stress, and coping mechanisms. The same applies to a lot of concepts we encounter every day – for example:

  • Emotional intelligence
  • Quality of life
  • Business confidence
  • Ease of use

One way to overcome the challenge of measuring the immeasurable is to use latent variable models (LVMs). An LVM is a type of statistical model that describes the relationship between observed variables and one or more unobserved (latent) variables. These models allow researchers to uncover patterns in their data that might otherwise stay hidden because of the data's complexity and the interrelatedness of the variables. Those patterns can then inform hypotheses about previously unknown cause-and-effect relationships among the variables. Powerful stuff, we say!

Let’s recap

In the world of scientific research, there’s no shortage of variable types, some of which have multiple names and some of which overlap with each other. In this post, we’ve covered some of the popular ones, but remember that this is not an exhaustive list.

To recap, we’ve explored:

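  • Independent variables (the “cause”)
  • Dependent variables (the “effect”)
  • Control variables (the variable that’s not allowed to vary)
  • Moderating variables (variables that modify the relationship between other variables)
  • Mediating variables (variables that explain the relationship between the independent and dependent variables)
  • Confounding variables (extraneous factors that distort the relationship being studied)
  • Latent variables (variables that can’t be directly observed or measured)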

Types of Variables in Research | Definitions & Examples

Published on 19 September 2022 by Rebecca Bevans. Revised on 28 November 2022.

In statistical research, a variable is defined as an attribute of an object of study. Choosing which variables to measure is central to good experimental design.

You need to know which types of variables you are working with in order to choose appropriate statistical tests and interpret the results of your study.

You can usually identify the type of variable by asking two questions:

  • What type of data does the variable contain?
  • What part of the experiment does the variable represent?

Table of contents

  • Types of data: quantitative vs categorical variables
  • Parts of the experiment: independent vs dependent variables
  • Other common types of variables
  • Frequently asked questions about variables

Data is a specific measurement of a variable – it is the value you record in your data sheet. Data is generally divided into two categories:

  • Quantitative data represents amounts.
  • Categorical data represents groupings.

A variable that contains quantitative data is a quantitative variable; a variable that contains categorical data is a categorical variable. Each of these types of variable can be broken down into further types.

Quantitative variables

When you collect quantitative data, the numbers you record represent real amounts that can be added, subtracted, divided, etc. There are two types of quantitative variables: discrete and continuous.

Discrete vs continuous variables

| Type of variable | What does the data represent? |
| --- | --- |
| Discrete variables (aka integer variables) | Counts of individual items or values. |
| Continuous variables (aka ratio variables) | Measurements of continuous or non-finite values. |

Categorical variables

Categorical variables represent groupings of some kind. They are sometimes recorded as numbers, but the numbers represent categories rather than actual amounts of things.

There are three types of categorical variables: binary, nominal, and ordinal variables.

Binary vs nominal vs ordinal variables

| Type of variable | What does the data represent? |
| --- | --- |
| Binary variables (aka dichotomous variables) | Yes/no outcomes. |
| Nominal variables | Groups with no rank or order between them. |
| Ordinal variables | Groups that are ranked in a specific order. |

*Note that sometimes a variable can work as more than one type! An ordinal variable can also be used as a quantitative variable if the scale is numeric and doesn’t need to be kept as discrete integers. For example, star ratings on product reviews are ordinal (1 to 5 stars), but the average star rating is quantitative.
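The star-rating case can be sketched in a few lines of Python. The ratings below are made up; the point is only that the same values support both an ordinal summary (counts per category and the median) and a quantitative one (the mean).

```python
from collections import Counter
from statistics import mean, median

# Hypothetical product reviews, 1 to 5 stars each.
ratings = [5, 4, 4, 3, 5, 1, 4]

# Ordinal view: how many reviews fall in each ranked category, and the middle category.
per_star = Counter(ratings)
middle_rating = median(ratings)

# Quantitative view: the average rating, which need not be a whole number of stars.
average_rating = mean(ratings)

print(sorted(per_star.items()), middle_rating, round(average_rating, 2))
```

Whether the mean is meaningful depends on treating the gaps between stars as equal, which is a judgment call rather than a property of the data.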

Example data sheet

To keep track of your salt-tolerance experiment, you make a data sheet where you record information about the variables in the experiment, like salt addition and plant health.

To gather information about plant responses over time, you can fill out the same data sheet every few days until the end of the experiment. This example sheet is colour-coded according to the type of variable: nominal, continuous, ordinal, and binary.

Example data sheet showing types of variables in a plant salt tolerance experiment
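One way to keep variable types explicit is to encode them in the structure of the data sheet itself. The sketch below is an assumption about what such a sheet might contain: the field names, the wilting scale, and the sample rows are hypothetical and are not taken from the original figure.

```python
from dataclasses import dataclass
from enum import IntEnum

class Wilting(IntEnum):
    """Ordinal variable: categories with a meaningful rank."""
    NONE = 0
    MILD = 1
    SEVERE = 2

@dataclass
class Observation:
    species: str         # nominal: a label with no order
    salt_added_g: float  # continuous: a measured amount
    height_cm: float     # continuous: a measured amount
    wilting: Wilting     # ordinal: a ranked category
    survived: bool       # binary: a yes/no outcome

# Hypothetical rows from one measurement day.
sheet = [
    Observation("species A", 0.0, 12.5, Wilting.NONE, True),
    Observation("species A", 5.0, 9.8, Wilting.MILD, True),
    Observation("species B", 10.0, 6.1, Wilting.SEVERE, False),
]

# Ordinal values can be ranked, so asking for the worst wilting level makes sense.
worst_wilting = max(obs.wilting for obs in sheet)
survival_rate = sum(obs.survived for obs in sheet) / len(sheet)
print(worst_wilting.name, round(survival_rate, 2))
```

Using an ordered enumeration for the ordinal column prevents accidental arithmetic that only makes sense for quantitative data, while still allowing comparisons.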

Experiments are usually designed to find out what effect one variable has on another – in our example, the effect of salt addition on plant growth.

You manipulate the independent variable (the one you think might be the cause) and then measure the dependent variable (the one you think might be the effect) to find out what this effect might be.

You will probably also have variables that you hold constant (control variables) in order to focus on your experimental treatment.

Independent vs dependent vs control variables

| Type of variable | Definition | Example (salt tolerance experiment) |
| --- | --- | --- |
| Independent variables (aka treatment variables) | Variables you manipulate in order to affect the outcome of an experiment. | The amount of salt added to each plant’s water. |
| Dependent variables (aka response variables) | Variables that represent the outcome of the experiment. | Any measurement of plant health and growth: in this case, plant height and wilting. |
| Control variables | Variables that are held constant throughout the experiment. | The temperature and light in the room the plants are kept in, and the volume of water given to each plant. |

In this experiment, we have one independent and three dependent variables.

The other variables in the sheet can’t be classified as independent or dependent, but they do contain data that you will need in order to interpret your dependent and independent variables.

Example of a data sheet showing dependent and independent variables for a plant salt tolerance experiment.

What about correlational research?

When you do correlational research, the terms ‘dependent’ and ‘independent’ don’t apply, because you are not trying to establish a cause-and-effect relationship.

However, there might be cases where one variable clearly precedes the other (for example, rainfall leads to mud, rather than the other way around). In these cases, you may call the preceding variable (i.e., the rainfall) the predictor variable and the following variable (i.e., the mud) the outcome variable.

Once you have defined your independent and dependent variables and determined whether they are categorical or quantitative, you will be able to choose the correct statistical test.
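As a rough illustration of how variable types narrow down the choice of test, the sketch below maps the type of the independent and dependent variable to a commonly used family of tests. The function name `suggest_test` and the lookup table are assumptions made for this example; the mapping is a rule of thumb and ignores things like sample size, distribution, and the number of groups.

```python
def suggest_test(independent: str, dependent: str) -> str:
    """Return a common family of tests for the given variable types.

    Both arguments must be either "categorical" or "quantitative".
    """
    rules = {
        ("categorical", "quantitative"): "comparison of means (t-test or ANOVA)",
        ("quantitative", "quantitative"): "correlation or linear regression",
        ("categorical", "categorical"): "chi-square test of independence",
        ("quantitative", "categorical"): "logistic regression",
    }
    key = (independent, dependent)
    if key not in rules:
        raise ValueError(f"unknown variable types: {key}")
    return rules[key]

# Salt added (quantitative) and plant height (quantitative):
salt_vs_height = suggest_test("quantitative", "quantitative")
print(salt_vs_height)
```

Treat the output as a starting point for reading about a test, not as a decision.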

But there are many other ways of describing variables that help with interpreting your results. Some useful types of variable are listed below.

| Type of variable | Definition | Example (salt tolerance experiment) |
| --- | --- | --- |
| Confounding variables | A variable that hides the true effect of another variable in your experiment. This can happen when another variable is closely related to a variable you are interested in, but you haven’t controlled it in your experiment. | Pot size and soil type might affect plant survival as much as or more than salt additions. In an experiment, you would control these potential confounders by holding them constant. |
| Latent variables | A variable that can’t be directly measured, but that you represent via a proxy. | Salt tolerance in plants cannot be measured directly, but can be inferred from measurements of plant health in our salt-addition experiment. |
| Composite variables | A variable that is made by combining multiple variables in an experiment. These variables are created when you analyse data, not when you measure it. | The three plant-health variables could be combined into a single plant-health score to make it easier to present your findings. |
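A composite variable can be built at analysis time by putting each component on a common scale and averaging. The sketch below uses invented measurements for three plants; the variable names (`height_cm`, `leaf_count`, `wilting`) and the choice of z-scores with equal weights are assumptions, and other weighting schemes are just as defensible.

```python
from statistics import mean, pstdev

def zscores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical measurements for three plants.
height_cm = [10.0, 12.0, 14.0]
leaf_count = [4, 6, 8]
wilting = [2, 1, 0]  # higher means worse health, so its sign is flipped below

components = [
    zscores(height_cm),
    zscores(leaf_count),
    [-z for z in zscores(wilting)],
]

# One composite plant-health score per plant: the mean of its standardised components.
health_score = [mean(plant) for plant in zip(*components)]
print([round(s, 2) for s in health_score])
```

Standardising first stops a component measured in large units (such as height) from dominating the score.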

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause, while the dependent variable is the supposed effect. A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g., the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g., water volume or weight).

You can think of independent and dependent variables in terms of cause and effect: an independent variable is the variable you think is the cause, while a dependent variable is the effect.

In an experiment, you manipulate the independent variable and measure the outcome in the dependent variable. For example, in an experiment about the effect of nutrients on crop growth:

  • The independent variable is the amount of nutrients added to the crop field.
  • The dependent variable is the biomass of the crops at harvest time.

Defining your variables, and deciding how you will manipulate and measure them, is an important part of experimental design.

Bevans, R. (2022, November 28). Types of Variables in Research | Definitions & Examples. Scribbr. Retrieved 23 September 2024, from https://www.scribbr.co.uk/research-methods/variables-types/

Organizing Your Social Sciences Research Paper

Independent and Dependent Variables
Definitions

Dependent Variable: The variable that depends on other factors that are measured. These variables are expected to change as a result of an experimental manipulation of the independent variable or variables. It is the presumed effect.

Independent Variable: The variable that is stable and unaffected by the other variables you are trying to measure. It refers to the condition of an experiment that is systematically manipulated by the investigator. It is the presumed cause.

Cramer, Duncan and Dennis Howitt. The SAGE Dictionary of Statistics. London: SAGE, 2004; Penslar, Robin Levin and Joan P. Porter. Institutional Review Board Guidebook: Introduction. Washington, DC: United States Department of Health and Human Services, 2010; "What are Dependent and Independent Variables?" Graphic Tutorial.

Identifying Dependent and Independent Variables

Don't feel bad if you are confused about which is the dependent variable and which is the independent variable in social and behavioral sciences research. However, it's important that you learn the difference because framing a study using these variables is a common approach to organizing the elements of a social sciences research study in order to discover relevant and meaningful results. Specifically, it is important for these two reasons:

  • You need to understand and be able to evaluate their application in other people's research.
  • You need to apply them correctly in your own research.

A variable in research simply refers to a person, place, thing, or phenomenon that you are trying to measure in some way. The best way to understand the difference between a dependent and independent variable is that the meaning of each is implied by what the words tell us about the variable you are using. You can do this with a simple exercise from the website, Graphic Tutorial. Take the sentence, "The [independent variable] causes a change in [dependent variable] and it is not possible that [dependent variable] could cause a change in [independent variable]." Insert the names of variables you are using in the sentence in the way that makes the most sense. This will help you identify each type of variable. If you're still not sure, consult with your professor before you begin to write.

Fan, Shihe. "Independent Variable." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 592-594; "What are Dependent and Independent Variables?" Graphic Tutorial; Salkind, Neil J. "Dependent Variable." In Encyclopedia of Research Design, Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 348-349.

Structure and Writing Style

The process of examining a research problem in the social and behavioral sciences is often framed around methods of analysis that compare, contrast, correlate, average, or integrate relationships between or among variables. Techniques include associations, sampling, random selection, and blind selection. Designation of the dependent and independent variable involves unpacking the research problem in a way that identifies a general cause and effect and classifying these variables as either independent or dependent.

The variables should be outlined in the introduction of your paper and explained in more detail in the methods section. There are no rules about the structure and style for writing about independent or dependent variables but, as with any academic writing, clarity and succinctness are most important.

After you have described the research problem and its significance in relation to prior research, explain why you have chosen to examine the problem using a method of analysis that investigates the relationships between or among independent and dependent variables. State what it is about the research problem that lends itself to this type of analysis. For example, if you are investigating the relationship between corporate environmental sustainability efforts [the independent variable] and dependent variables associated with measuring employee satisfaction at work using a survey instrument, you would first identify each variable and then provide background information about the variables. What is meant by "environmental sustainability"? Are you looking at a particular company [e.g., General Motors] or are you investigating an industry [e.g., the meat packing industry]? Why is employee satisfaction in the workplace important? How does a company make its employees aware of sustainability efforts and why would a company even care that its employees know about these efforts?

Identify each variable for the reader and define each. In the introduction, this information can be presented in a paragraph or two when you describe how you are going to study the research problem. In the methods section, you build on the literature review of prior studies about the research problem to describe in detail background about each variable, breaking each down for measurement and analysis. For example, what activities do you examine that reflect a company's commitment to environmental sustainability? Levels of employee satisfaction can be measured by a survey that asks about things like volunteerism or a desire to stay at the company for a long time.

The structure and writing style of describing the variables and their application to analyzing the research problem should be stated and unpacked in such a way that the reader obtains a clear understanding of the relationships between the variables and why they are important. This is also important so that the study can be replicated in the future using the same variables but applied in a different way.

Fan, Shihe. "Independent Variable." In Encyclopedia of Research Design. Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 592-594; "What are Dependent and Independent Variables?" Graphic Tutorial; “Case Example for Independent and Dependent Variables.” ORI Curriculum Examples. U.S. Department of Health and Human Services, Office of Research Integrity; Salkind, Neil J. "Dependent Variable." In Encyclopedia of Research Design, Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE, 2010), pp. 348-349; “Independent Variables and Dependent Variables.” Karl L. Wuensch, Department of Psychology, East Carolina University [posted email exchange]; “Variables.” Elements of Research. Dr. Camille Nebeker, San Diego State University.

  • Last Updated: Sep 17, 2024 10:59 AM
  • URL: https://libguides.usc.edu/writingguide

Variables in Research | Types, Definition & Examples

  • Introduction
  • What is a variable?
  • What are the 5 types of variables in research?
  • Other variables in research

Variables are fundamental components of research that allow for the measurement and analysis of data. They can be defined as characteristics or properties that can take on different values. In research design, understanding the types of variables and their roles is crucial for developing hypotheses, designing methods, and interpreting results.

This article outlines the types of variables in research, including their definitions and examples, to provide a clear understanding of their use and significance in research studies. By categorizing variables into distinct groups based on their roles in research, their types of data, and their relationships with other variables, researchers can more effectively structure their studies and achieve more accurate conclusions.

A variable represents any characteristic, number, or quantity that can be measured or quantified. The term encompasses anything that can vary or change, ranging from simple concepts like age and height to more complex ones like satisfaction levels or economic status. Variables are essential in research as they are the foundational elements that researchers manipulate, measure, or control to gain insights into relationships, causes, and effects within their studies. They enable the framing of research questions, the formulation of hypotheses, and the interpretation of results.

Variables can be categorized based on their role in the study (such as independent and dependent variables), the type of data they represent (quantitative or categorical), and their relationship to other variables (like confounding or control variables). Understanding what constitutes a variable and the various variable types available is a critical step in designing robust and meaningful research.

Variables are crucial components in research, serving as the foundation for data collection, analysis, and interpretation. They are attributes or characteristics that can vary among subjects or over time, and understanding their types is essential for any study. Variables can be broadly classified into five main types, each with its distinct characteristics and roles within research.

This classification helps researchers in designing their studies, choosing appropriate measurement techniques, and analyzing their results accurately. The five types of variables include independent variables, dependent variables, categorical variables, continuous variables, and confounding variables. These categories not only facilitate a clearer understanding of the data but also guide the formulation of hypotheses and research methodologies.

Independent variables

Independent variables are foundational to the structure of research, serving as the factors or conditions that researchers manipulate or vary to observe their effects on dependent variables. These variables are considered "independent" because their variation does not depend on other variables within the study. Instead, they are the cause or stimulus that directly influences the outcomes being measured. For example, in an experiment to assess the effectiveness of a new teaching method on student performance, the teaching method applied (traditional vs. innovative) would be the independent variable.

The selection of an independent variable is a critical step in research design, as it directly correlates with the study's objective to determine causality or association. Researchers must clearly define and control these variables to ensure that observed changes in the dependent variable can be attributed to variations in the independent variable, thereby affirming the reliability of the results. In experimental research, the independent variable is what differentiates the control group from the experimental group, thereby setting the stage for meaningful comparison and analysis.

Dependent variables

Dependent variables are the outcomes or effects that researchers aim to explore and understand in their studies. These variables are called "dependent" because their values depend on the changes or variations of the independent variables.

Essentially, they are the responses or results that are measured to assess the impact of the independent variable's manipulation. For instance, in a study investigating the effect of exercise on weight loss, the amount of weight lost would be considered the dependent variable, as it depends on the exercise regimen (the independent variable).

The identification and measurement of the dependent variable are crucial for testing the hypothesis and drawing conclusions from the research. It allows researchers to quantify the effect of the independent variable, providing evidence for causal relationships or associations. In experimental settings, the dependent variable is what is being tested and measured across different groups or conditions, enabling researchers to assess the efficacy or impact of the independent variable's variation.

To ensure accuracy and reliability, the dependent variable must be defined clearly and measured consistently across all participants or observations. This consistency helps in reducing measurement errors and increases the validity of the research findings. By carefully analyzing the dependent variables, researchers can derive meaningful insights from their studies, contributing to the broader knowledge in their field.

Categorical variables

Categorical variables, also known as qualitative variables, represent types or categories that are used to group observations. These variables divide data into distinct groups or categories that lack a numerical value but hold significant meaning in research. Examples of categorical variables include gender (male, female, other), type of vehicle (car, truck, motorcycle), or marital status (single, married, divorced). These categories help researchers organize data into groups for comparison and analysis.

Categorical variables can be further classified into two subtypes: nominal and ordinal. Nominal variables are categories without any inherent order or ranking among them, such as blood type or ethnicity. Ordinal variables, on the other hand, imply a sort of ranking or order among the categories, like levels of satisfaction (high, medium, low) or education level (high school, bachelor's, master's, doctorate).

Understanding and identifying categorical variables is crucial in research as it influences the choice of statistical analysis methods. Since these variables represent categories without numerical significance, researchers employ specific statistical tests designed for a nominal or ordinal variable to draw meaningful conclusions. Properly classifying and analyzing categorical variables allow for the exploration of relationships between different groups within the study, shedding light on patterns and trends that might not be evident with numerical data alone.

Continuous variables

Continuous variables are quantitative variables that can take an infinite number of values within a given range. These variables are measured along a continuum and can represent very precise measurements. Examples of continuous variables include height, weight, temperature, and time. Because they can assume any value within a range, continuous variables allow for detailed analysis and a high degree of accuracy in research findings.

The ability to measure continuous variables at very fine scales makes them invaluable for many types of research, particularly in the natural and social sciences. For instance, in a study examining the effect of temperature on plant growth, temperature would be considered a continuous variable since it can vary across a wide spectrum and be measured to several decimal places.

When dealing with continuous variables, researchers often use statistical methods that can accommodate a wide range of data points and the potential for infinite divisibility. These include various forms of regression analysis, correlation, and other techniques suited for modeling and analyzing nuanced relationships between variables. The precision of continuous variables enhances the researcher's ability to detect patterns, trends, and causal relationships within the data, contributing to more robust and detailed conclusions.

Confounding variables

Confounding variables are those that can cause a false association between the independent and dependent variables, potentially leading to incorrect conclusions about the relationship being studied. These are extraneous variables that were not considered in the study design but can influence both the supposed cause and effect, creating a misleading correlation.

Identifying and controlling for a confounding variable is crucial in research to ensure the validity of the findings. This can be achieved through various methods, including randomization, stratification, and statistical control. Randomization helps to evenly distribute confounding variables across study groups, reducing their potential impact. Stratification involves analyzing the data within strata or layers that share common characteristics of the confounder. Statistical control allows researchers to adjust for the effects of confounders in the analysis phase.

Properly addressing confounding variables strengthens the credibility of research outcomes by clarifying the direct relationship between the dependent and independent variables, thus providing more accurate and reliable results.
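Randomization, the first of the methods described above, can be sketched with a short simulation. The participant ages are generated at random purely for illustration, with age standing in for any potential confounder; the group sizes and the seed are arbitrary assumptions.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical participants, described only by a potential confounder: age.
ages = [random.randint(18, 80) for _ in range(1000)]

# Random assignment: shuffle the participants, then split them in half.
shuffled = ages[:]
random.shuffle(shuffled)
treatment_group = shuffled[:500]
control_group = shuffled[500:]

# With random assignment, the confounder should be spread roughly evenly,
# so the two groups end up with similar average ages.
age_gap = abs(mean(treatment_group) - mean(control_group))
print(round(mean(treatment_group), 1), round(mean(control_group), 1), round(age_gap, 2))
```

Randomization balances confounders only on average, so small studies can still end up with uneven groups; that is one reason stratification and statistical control are used alongside it.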

Beyond the primary categories of variables commonly discussed in research methodology, a range of other variables play significant roles in the design and analysis of studies. Below is an overview of some of these variables, with their definitions and roles within research studies:

  • Discrete variables: A discrete variable is a quantitative variable that represents countable data, such as the number of children in a family or the number of cars in a parking lot. Discrete variables can only take on specific values.
  • Categorical variables: A categorical variable sorts subjects or items into groups rather than measuring an amount. Categorical data includes nominal variables, which have no natural order (like country of origin), and ordinal variables, whose categories are ranked (such as education level).
  • Predictor variables: Often used in statistical models, a predictor variable is used to forecast or predict the outcomes of other variables, not necessarily with a causal implication.
  • Outcome variables: These variables represent the results or outcomes that researchers aim to explain or predict through their studies. An outcome variable is central to understanding the effects of predictor variables.
  • Latent variables: Not directly observable, latent variables are inferred from other, directly measured variables. Examples include constructs such as intelligence or socioeconomic status.
  • Composite variables: Created by combining multiple variables, composite variables can measure a concept more reliably or simplify the analysis. An example would be a composite happiness index derived from several survey questions.
  • Preceding variables: These variables come before other variables in time or sequence, potentially influencing subsequent outcomes. A preceding variable is crucial in longitudinal studies to determine causality or sequences of events.
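
To make the idea of a composite variable concrete, here is a minimal sketch that averages several survey items into one index. The item names, the 1 to 5 response scale, and the reverse-coding step are assumptions made for illustration; a real index would follow the scoring rules of its instrument.

```python
from statistics import mean


def composite_score(responses, reverse_items=(), scale_min=1, scale_max=5):
    """Average survey items into one composite score.

    `responses` maps hypothetical item names to answers on a
    scale_min..scale_max scale. Items listed in `reverse_items` are
    negatively worded, so they are flipped before averaging.
    """
    adjusted = []
    for item, value in responses.items():
        if not scale_min <= value <= scale_max:
            raise ValueError(f"{item}: {value} is outside the response scale")
        if item in reverse_items:
            value = scale_min + scale_max - value  # e.g. 1 becomes 5
        adjusted.append(value)
    return mean(adjusted)


# Hypothetical respondent: two positively worded items, one negatively worded.
happiness = composite_score(
    {"life_satisfaction": 5, "optimism": 4, "often_sad": 1},
    reverse_items=("often_sad",),
)
print(round(happiness, 2))
```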




Types of Variables in Research – Definition & Examples


A fundamental component in statistical investigations is the methodology you employ in selecting your research variables. The careful selection of appropriate variable types can significantly enhance the robustness of your experimental design. This piece explores the diverse array of variable classifications within the field of statistical research. Additionally, understanding the different types of variables in research can greatly aid in shaping your experimental hypotheses and outcomes.

Table of Contents

  • 1 Types of Variables in Research – In a Nutshell
  • 2 Definition: Types of variables in research
  • 3 Types of variables in research – Quantitative vs. Categorical
  • 4 Types of variables in research – Independent vs. Dependent
  • 5 Other useful types of variables in research

Types of Variables in Research – In a Nutshell

  • A variable is an attribute of an item of analysis in research.
  • The types of variables in research can be categorized as independent vs. dependent, or as categorical vs. quantitative.
  • In correlational research, variables can be classified as predictor or outcome variables.
  • Other types of variables in research are confounding variables, latent variables, and composite variables.

Definition: Types of variables in research

A variable is a trait of an item of analysis in research. Variables matter because they describe and measure places, people, ideas, or other research objects. There are many types of variables in research, so you must choose the right ones for your study.

Note that the correct variable will help with your research design, test selection, and result interpretation.

In a study testing whether some genders are more stress-tolerant than others, variables you can include are the level of stressors in the study setting, the gender of the subjects, and productivity levels in the presence of stressors.

Also, before choosing which types of variables in research to use, you should know how the various types work and the ideal statistical tests and result interpretations you will use for your study. The key is to determine the type of data the variable contains and the part of the experiment the variable represents.

Types of variables in research – Quantitative vs. Categorical

Data are the specific values of a variable that you record in a data sheet in statistical research. They are generally divided into quantitative and categorical classes.

Quantitative or numerical data represents amounts, while categorical data represents collections or groupings.

The type of data contained in your variable will determine the types of variables in research. For instance, variables consisting of quantitative data are called quantitative variables, while those containing categorical data are called categorical variables. The section below explains these two types of variables in research better.

Quantitative variables

The scores you record when collecting quantitative data usually represent real values you can add, divide, subtract, or multiply. There are two types of quantitative variables: discrete variables and continuous variables.

The table below explains the elements that set apart discrete and continuous types of variables in research:

| Type of variable | What the data represents | Examples |
| --- | --- | --- |
| Discrete or integer variables | Individual item counts or values | Number of employees in a company; number of students in a school district |
| Continuous or ratio variables | Measurements of non-finite or continuous scores | Age; weight; volume; distance |

Categorical variables

Categorical variables contain data representing groupings. Additionally, the data in categorical variables is sometimes recorded as numbers. However, the numbers represent categories instead of real amounts.

There are three categorical types of variables in research: nominal variables, ordinal variables, and binary variables. Here is a tabular summary.

| Type of variable | What the data represents | Examples |
| --- | --- | --- |
| Binary/dichotomous variables | YES/NO outcomes | Win/lose in a game; pass/fail in an exam |
| Nominal variables | Groups with no rank or order between them | Colors; participant names; brand names |
| Ordinal variables | Groups ranked in a particular order | Performance rankings in an exam; rating scales of survey responses |

It is worth mentioning that some categorical variables can function as multiple types. For example, in some studies, you can treat ordinal variables as quantitative variables if the scale is numeric and does not need to be kept as discrete integers.
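
One practical consequence of these distinctions is that the variable type limits which summary statistic makes sense. The sketch below is one possible convention (mode for nominal and binary data, median for ordinal data, mean for quantitative data); the function name and the type labels are assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median_low


def summarize(values, kind):
    """Return a summary statistic suited to the variable type.

    `kind` is one of: "binary", "nominal", "ordinal", "discrete",
    "continuous" (labels chosen for this sketch).
    """
    if kind in ("binary", "nominal"):
        # Categories have no order, so report the most frequent one.
        return Counter(values).most_common(1)[0][0]
    if kind == "ordinal":
        # Ranks are ordered but not evenly spaced; use a middle value
        # that actually occurs in the data.
        return median_low(values)
    if kind in ("discrete", "continuous"):
        return mean(values)
    raise ValueError(f"unknown variable type: {kind}")


print(summarize(["red", "blue", "red"], "nominal"))    # most common color
print(summarize([1, 2, 2, 3, 5], "ordinal"))           # middle rating
print(summarize([12, 18, 11, 15], "continuous"))       # average measurement
```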

Data sheet of quantitative and categorical variables

A data sheet is where you record the data on the variables in your experiment.

In a study of the salt-tolerance levels of various plant species, you can record the data on salt addition and how the plant responds in your datasheet.

The key is to gather the information over a specific period, filling out the data sheet as you go, and then draw a conclusion.

Below is an example of a data sheet containing binary, nominal, continuous, and ordinal types of variables in research.

A 12 0 - - -
A 18 50 - - -
B 11 0 - - -
B 15 50 - - -
C 25 0 - - -
C 31 50 - - -

Types of variables in research – Independent vs. Dependent


The purpose of experiments is to determine how the variables affect each other. In the salt-tolerance experiment above, the study aims to find out how the quantity of salt introduced into the water affects the plant’s growth and survival.

Therefore, the researcher manipulates the independent variables and measures the dependent variables. Additionally, you may have control variables that you hold constant.

The table below summarizes independent variables, dependent variables, and control variables.

| Type of variable | Definition | Example (salt-tolerance experiment) |
| --- | --- | --- |
| Independent/treatment variables | The variables you manipulate to affect the experiment outcome | The amount of salt added to the water |
| Dependent/response variables | The variables that represent the experiment outcomes | The plant’s growth or survival |
| Control variables | Variables held constant throughout the study | Temperature or light in the experiment room |

Data sheet of independent and dependent variables

In the salt-tolerance research, there is one independent variable (salt amount) and three dependent variables. All other variables are neither dependent nor independent.

Below is a data sheet based on our experiment:

Types of variables in correlational research

The types of variables in research may differ depending on the study.

In correlational research, dependent and independent variables do not apply because the study objective is not to determine the cause-and-effect link between variables.

However, in correlational research, one variable may precede the other, as illness leads to death, and not vice versa. In such an instance, the preceding variable, like illness, is the predictor variable, while the other one is the outcome variable.

Other useful types of variables in research

The key to conducting effective research is to define your types of variables as independent and dependent. Next, you must determine if they are categorical or numerical types of variables in research so you can choose the proper statistical tests for your study.

Below are other types of variables in research worth understanding.

| Type of variable | Definition | Example (salt-tolerance experiment) |
| --- | --- | --- |
| Confounding variables | Hide the actual effect of another variable in your study | Pot size and soil type |
| Latent variables | Cannot be measured directly | Salt tolerance |
| Composite variables | Formed by combining multiple variables | The health variables combined into a single health score |

What is the definition for independent and dependent variables?

An independent variable is the one you believe is the cause of the outcome, while the dependent variable is the outcome you believe is affected by it.

What are quantitative and categorical variables?

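Quantitative variables contain data that represent amounts, while categorical variables contain data that represent groupings. Knowing the types of variables in research that you can work with will help you choose the best statistical tests and result representation techniques. It will also help you with your study design.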

Discrete and continuous variables: What is their difference?

Discrete variables are types of variables in research that represent counts, like the quantities of objects. In contrast, continuous variables are types of variables in research that represent measurable quantities like age, volume, and weight.


Open access

Quantifying potential selection bias in observational research: simulations and analyses exploring religion and depression using a prospective UK cohort study (ALSPAC)

  • https://doi.org/10.1080/2153599X.2024.2377545


A major problem in research is non-random participation, which can occur either at recruitment into, or loss to follow-up during, the study. In both cases the analytic sample differs from the target population, potentially resulting in selection bias and erroneous inferences. We perform simulations and analyses using data from ALSPAC (Avon Longitudinal Study of Parents and Children) to quantify the potential impact of selection bias. Our motivating example focuses on religion (the exposure) and depression (the outcome). ALSPAC was broadly representative of the target population of pregnant mothers during recruitment, but continued participation after 30 years is non-random, including by factors such as religion and mental health. Despite non-random participation by the exposure and outcome (i.e., collider stratification selection bias), given plausible selection scenarios our simulations found little bias across all models with broadly nominal (i.e., 95%) confidence interval coverage. Analyses comparing the full cohort (i.e., the target population) against a highly-selected sample of participants still participating after 30 years also suggested minimal bias. Given the strengths and patterns of selection explored in this study, the resulting bias could be negligible. However, this cannot be assumed to hold for all studies and must be explored on a case-by-case basis.

  • simulations
  • selection bias

Figure 1. Example Directed Acyclic Graphs (DAGs) demonstrating type 1/collider stratification selection bias (left) and type 2/effect modifier selection bias (right). Notation: X = Exposure; Y = Outcome; S = Selection; W = Effect measure modifier (note that, as DAGs are non-parametric, the interaction between X and W on Y is not explicitly stated in the right-hand DAG). To help fix ideas, say that our exposure X is religiosity, the outcome Y is depression, the effect modifier W is socioeconomic background (with the religiosity-depression association differing by levels of socioeconomic background in the right-hand DAG; we assume no effect modification in the left-hand DAG), while selection S is recruitment into a study. In the left-hand DAG (collider stratification selection bias) both the exposure and outcome cause selection (say, more religious individuals and individuals without depression are more likely to take part in the study); given that analyses are conditional upon selection into the study, this will result in collider bias and bias the exposure–outcome relationship. In the right-hand DAG (effect modifier selection bias) only the effect modifier W causes selection (say, those from higher socioeconomic backgrounds are more likely to take part); as the analyses are biased toward one level of W, and as W is an effect measure modifier of the exposure–outcome relationship (say, religiosity reduces rates of depression only in individuals from lower, but not higher, socioeconomic backgrounds), the resulting exposure–outcome association will also be biased.
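
The right-hand scenario in Figure 1 (effect modifier selection bias) can be sketched with a small simulation. Everything here is a hypothetical illustration rather than the authors' code or parameters: the exposure lowers the outcome by 1.0 only for people from lower socioeconomic backgrounds, and people from higher socioeconomic backgrounds are far more likely to be selected.

```python
import random
import statistics


def effect_estimates(n=40_000, seed=2):
    """Return (full-population, selected-sample) exposure effect estimates.

    Hypothetical setup: W (higher socioeconomic background) modifies the
    effect of exposure X on outcome Y, and only W drives selection.
    """
    rng = random.Random(seed)
    full = {True: [], False: []}       # outcomes keyed by exposure status
    selected = {True: [], False: []}
    for _ in range(n):
        high_sep = rng.random() < 0.5            # effect modifier W
        x = rng.random() < 0.5                   # exposure X
        effect = 0.0 if high_sep else -1.0       # X matters only if W is low
        y = effect * x + rng.gauss(0.0, 1.0)     # outcome Y
        full[x].append(y)
        if rng.random() < (0.9 if high_sep else 0.1):   # selection S
            selected[x].append(y)

    def diff(groups):
        return statistics.mean(groups[True]) - statistics.mean(groups[False])

    return diff(full), diff(selected)


full_est, selected_est = effect_estimates()
print(f"full population: {full_est:.2f}, selected sample: {selected_est:.2f}")
```

Under these assumptions the population-average effect is about -0.5, while the selected sample, dominated by the group in which the exposure has no effect, gives an estimate much closer to zero.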


In the current study (ALSPAC; the Avon Longitudinal Study of Parents and Children), the original pregnancy cohort was broadly representative of the target population of pregnant women in 1990–1992 around Bristol (UK) (Fraser et al., 2013; Northstone et al., 2019). However, continued participation in the study after 30 years is known to be non-random (Fernández-Sanlés et al., 2021). Selection bias has been explored in ALSPAC previously, with factors such as younger maternal age at birth, other than white ethnicity, being male, lower socioeconomic position, and mental health issues (among others) being associated with lower rates of continued participation (Boyd et al., 2013; Cornish et al., 2021; Fernández-Sanlés et al., 2021; Fraser et al., 2013).

In this study we focus specifically on religion as an exposure and depression as an outcome, but the concepts and principles are universal and apply to all observational research topics. We have previously shown that there is the potential for collider stratification selection bias within the ALSPAC religious/spiritual belief and behaviors (RSBB) data, with attendance at a place of worship associated with continued ALSPAC participation, independent of various sociodemographic confounders (Morgan et al., 2022). Whilst this paper suggested that selection may result in bias when RSBB is the exposure or outcome in a given analysis, it did not attempt to quantify the direction or magnitude to which results may be biased. In the current paper we therefore aimed to quantify the extent to which selection may bias results, using a motivating example with RSBB measured during pregnancy as our exposure and depression measured 8 weeks post-natally as our outcome. We chose this as our motivating example as the effect of RSBB on mental health is a key topic in religious research, with numerous studies suggesting that religion is a protective factor against depression (Koenig et al., 2012; VanderWeele et al., 2016). Additionally, within the ALSPAC cohort (and likely more widely), depression and depressive symptoms are associated with selection and lower rates of study participation (Cornish et al., 2021; Fernández-Sanlés et al., 2021; Taylor et al., 2018). As both RSBB and depression are known to relate to continued study participation, the risk of collider stratification selection bias when exploring this topic may be high. This research will help inform future work in this area regarding the extent to which selection may, or may not, bias inferences.

Note that the aim of this paper is to quantify levels of potential collider stratification selection bias given realistic parameter values, not to explore methods to overcome or correct for potential selection bias (such as multiple imputation or weighting methods; for more information on these approaches, see Seaman & White, 2013; Van Buuren, 2018; White et al., 2011). We also focus on the impact of selection bias specifically, and throughout this paper assume that there is no unmeasured confounding or measurement error (Hernan & Robins, 2020). Finally, in this paper we focus on potential mismatch between the sample and one specific target population, and do not consider the generalizability or transportability of causal effects to other target populations (Bareinboim & Pearl, 2013; Dahabreh & Hernán, 2019).

To quantify these selection effects, we first used a series of simulations to estimate the effect of RSBB on depression under a range of selection scenarios (focusing on collider stratification selection bias). These simulations were informed by data from ALSPAC and external sources (for a similar approach, see Millard et al., 2023); these simulations were largely based on data collected early in ALSPAC with little missing data in each variable, which we assume to be largely unaffected by selection and therefore broadly representative of the target population (that is, pregnant mothers in the Bristol area of the UK in the early 1990s). We then varied the strength and patterning of selection (i.e., continued participation) by the exposure and outcome, again using plausible parameters, to investigate whether selection resulted in biased estimates. Our second method of exploring selection bias involved analyses comparing different subsets of the ALSPAC data. As data from early in ALSPAC can be assumed to be broadly representative of the target population (i.e., no selection bias), we can therefore compare the results from this target population against selected subsets of participants who continued participating in ALSPAC to assess the extent to which they may be biased by selection (for a similar approach, see Howe et al., 2013). While this scenario is specific to ALSPAC, meaning the impact of selection bias is unlikely to be identical to other studies, the risk of selection bias is common in most research settings. In addition to describing this in ALSPAC, other research can use the methods presented here to quantify and explore potential selection bias in future work.
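
The general idea of varying the strength of selection by exposure and outcome can be illustrated with a toy simulation. This is not the authors' data-generating mechanism: the additive selection model, the parameter values, and the function name are all assumptions chosen to make collider bias easy to see, and the selection used here is deliberately stronger than the scenarios the paper describes as plausible.

```python
import random
import statistics


def selected_estimate(strength, n=100_000, base=0.2, seed=1):
    """Exposed-minus-unexposed difference in mean outcome among the selected.

    Hypothetical setup: binary exposure X and continuous outcome Y are
    generated independently, so the true effect of X on Y is zero. The
    probability of continued participation rises by `strength` for exposed
    people and by another `strength` for people with a below-average outcome.
    """
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        x = rng.random() < 0.5                  # exposure (e.g. attendance)
        y = rng.gauss(0.0, 1.0)                 # outcome (e.g. symptom score)
        p_select = base + strength * x + strength * (y < 0)
        if rng.random() < p_select:             # conditioning on selection
            (exposed if x else unexposed).append(y)
    return statistics.mean(exposed) - statistics.mean(unexposed)


no_selection_effect = selected_estimate(strength=0.0)
strong_selection = selected_estimate(strength=0.4)
print(f"random dropout: {no_selection_effect:.3f}")
print(f"selection on X and Y: {strong_selection:.3f}")
```

When dropout is unrelated to exposure and outcome, the estimate stays near the true value of zero. When both drive participation, an association appears in the selected sample even though none exists in the simulated population.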

An analysis plan for all analyses reported in this paper was pre-registered on the Open Science Framework (OSF) website (https://osf.io/9bfzt/). The analysis plan was followed as specified, with some additions and exploratory analyses as detailed in Supplementary Section S1.

Participants

Pregnant women resident in Bristol and surrounding areas in the UK with expected delivery dates between 1 April 1991 and 31 December 1992 were invited to take part in ALSPAC. The initial number of pregnancies enrolled was 14,541, of which there were a total of 14,676 foetuses, resulting in 14,062 live births and 13,988 children who were alive at 1 year of age (Boyd et al., 2013; Fraser et al., 2013; Major-Smith et al., 2022; Northstone et al., 2019). After removing pregnancies that did not result in a live birth (most being early miscarriages), removing one pregnancy if the mother had two pregnancies enrolled in ALSPAC, excluding mothers known to have died since the study began, and dropping observations for participants who had withdrawn consent for their data to be used, there were a total of 13,807 G0 (Generation-0) mothers.

Please note that the study website contains details of all the data that is available through a fully searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/. From 2014 onwards, study data were collected and managed using REDCap electronic data capture tools hosted at the University of Bristol (Harris et al., 2009).

Exposure variables

The primary goal of this paper was to simulate ALSPAC-style data to explore the impact of collider stratification selection bias, and therefore only a relatively small number of variables from the actual ALSPAC database were used here to illustrate these methods. For the variables used in the data-generating mechanism of this simulation, see the analysis section. The secondary aim of this paper was to compare regression models between groups of participants who have complete data for the exposure, outcome, and confounders (i.e., the baseline timepoint, which we assume to broadly represent the target population) with a model using just those who completed a recent questionnaire in 2020 (i.e., a highly-selected sub-sample of participants; approximately one-third of participants completed this questionnaire, meaning there is a large potential for selection bias). Note that, although some baseline variables had missing data, the proportion of missing data in these variables was small (<15% missing; e.g., of the potential 13,807 G0 mothers, 11,892 had data regarding religious beliefs and behaviors [14% missing]); throughout this paper we assume that these small levels of missing data do not result in meaningful levels of selection bias in our near-complete baseline sample.

For the simulation study, our exposure was a binary version of the religious attendance variable (regular attendance at a place of worship; yes/no). For the secondary analysis using actual ALSPAC data, we again used this binary religious attendance variable, in addition to a binary version of the religious belief variable (belief in God/a divine power; yes/no). Both RSBB variables were measured during pregnancy. Religious belief was originally coded as (Yes/Not Sure/No) and “not sure” was coded as no for the binary version. Religious attendance was originally coded as (Minimum Once a Week/Minimum Once a Month/Minimum Once a Year/Not at All). For the binary version, “Minimum Once a Year” and “Not at All” were coded as non-regular attendance, whereas “Minimum Once a Week” and “Minimum Once a Month” were coded as regular attendance. These variables were dichotomized as this greatly simplifies the simulation process and interpretation, and many previous studies have used binary versions of these variables, facilitating comparisons between studies (e.g., Li et al., 2016).
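
The recoding rules described above can be written as simple lookup tables. This is a sketch only: the category labels are taken from the text, but the function name, the use of `True`/`False`, and the handling of missing answers as `None` are assumptions, not the ALSPAC variable coding.

```python
# Binary recoding of religious attendance: weekly or monthly counts as regular.
ATTENDANCE_REGULAR = {
    "Minimum Once a Week": True,
    "Minimum Once a Month": True,
    "Minimum Once a Year": False,
    "Not at All": False,
}

# Binary recoding of religious belief: "Not Sure" is grouped with "No".
BELIEF_YES = {
    "Yes": True,
    "Not Sure": False,
    "No": False,
}


def dichotomize(value, mapping):
    """Map an original category to its binary version; keep missing as None."""
    if value is None:
        return None
    return mapping[value]


print(dichotomize("Minimum Once a Month", ATTENDANCE_REGULAR))  # True
print(dichotomize("Not Sure", BELIEF_YES))                      # False
```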

Outcome measures

For both analyses, the outcome used was the Edinburgh Post-Natal Depression Scale (EPDS), a continuous measure of depression based on 10 questions which are summed to give a score from 0–30 (with higher scores indicating greater depressive symptoms). We also used a binary version of the variable, with a score of 10 or greater corresponding to the presence of probable depression and a score lower than 10 meaning no depression; this was in concordance with the recommended clinical use of the original scale (Cox et al., 1987). This was measured 8 weeks after birth.
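
The scoring just described can be sketched as follows. The text states that 10 questions sum to a 0–30 score with a cut-off of 10; the per-item range of 0 to 3 is inferred from those totals, and the function names are hypothetical.

```python
def epds_total(item_scores):
    """Sum 10 item scores (each assumed to be 0-3) into a 0-30 total."""
    if len(item_scores) != 10:
        raise ValueError("the scale has 10 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)


def probable_depression(total, cutoff=10):
    """Binary version of the outcome: a total of 10 or more counts as probable depression."""
    return total >= cutoff


# Hypothetical respondent.
answers = [1, 0, 2, 1, 0, 1, 2, 1, 1, 0]
total = epds_total(answers)
print(total, probable_depression(total))
```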

Confounder variables

As this paper is primarily intended to illustrate methods of quantifying potential selection bias, we are not aiming to estimate the true causal effect of religious belief and attendance on depression, but rather to see how much results may be biased by selection. For this reason, in both the simulation and ALSPAC subset analyses, we have decided to focus on a small number of key sociodemographic factors that we believe could have a confounding effect on the relationship between the exposure and outcome. These are: mother’s age at birth (continuous; years), ethnicity (binary; white vs other-than-white), marital status (binary; not currently married vs married), and highest education level achieved (binary; O-level [qualifications at age 16 years] or lower vs A-level [qualifications at age 18 years] or higher). These were all measured during pregnancy.

Primary analysis—simulations

For the primary aim of this study, to simulate ALSPAC-style data to quantify the potential selection bias when using ALSPAC’s RSBB data, we used the ADEMP (Aims, Data generating mechanism, Estimands, Methods, and Performance measures) guidelines (Morris et al., 2019). These steps are detailed below:

We aimed to assess the bias caused by selection that may arise when estimating the association between RSBB and depression under a range of selection scenarios. We used ALSPAC and external data to inform plausible parameters.

Data generating mechanism

Figure 2. This Directed Acyclic Graph (DAG) represents our assumed causal relations between the variables considered here, and which we will use as the data-generating mechanism for our simulation study and for our choice of confounders in the secondary ALSPAC subset analyses. While it attempts to represent reality to some extent, it is overly-simplified and overlooks many additional complexities present in the real-world (other sources of unmeasured confounding or selection, reverse causality between RSBB and depression, etc.). For the simulation study, we are assuming that all variables are fully-observed, other than depression, which represents selection due to continued participation. The dashed arrows from the exposure (RSBB; religious/spiritual beliefs and behaviors) and the outcome (Depression) to Participation (missingness) indicate that whether these cause selection, and the strength of selection, will be varied in the simulation study. The red arrow between RSBB and Depression indicates the causal effect estimate of interest (although note that in some scenarios the true exposure-outcome association is null). SEP = Socioeconomic position.

Table 1. Summary of the selection scenarios explored in the simulation study and whether they are expected to result in bias (see text for explanation, and Supplementary Section S2 for full details on the parameter values used in the simulations).

Whether non-random selection results in collider stratification selection bias depends on a number of factors. Given the data-generating mechanism assumed in the Figure 2 DAG, if the outcome depression is binary, then bias will only occur when both the exposure and outcome are associated with selection (Bartlett et al., 2015; Hughes et al., 2019); therefore, in models with a binary outcome, when adjusting for all confounders the RSBB-depression association ought to be unbiased if only the exposure, only the outcome, or neither, cause selection. If the outcome depression is continuous, then bias will still occur when both exposure and outcome are associated with selection, but will also occur when just the outcome is associated with selection (Hughes et al., 2019); in the latter scenario, where only the outcome causes selection, bias will be toward the null, meaning that this will only result in bias when the exposure-outcome association is non-null. Therefore, in the models explored here with a continuous outcome, when adjusting for all confounders the RSBB-depression association ought to be unbiased if neither the exposure nor the outcome causes selection, if only the exposure causes selection, or if only the outcome causes selection and RSBB does not cause depression (as noted above, these expectations will differ under the presence of effect modifier selection bias, which is not part of our simulations).
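The expectations for a continuous outcome with a null exposure effect can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the data-generating mechanism used in this study: it has no confounders, a binary exposure with 50% prevalence, a standard-normal outcome, and a logistic selection model with an arbitrary intercept of -1. The selection log-odds used below (2.0 and 1.5) are deliberately exaggerated so that any bias is easy to see, and all function and parameter names are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_difference(rows):
    """Mean outcome among the exposed minus mean outcome among the unexposed."""
    exposed = [y for x, y in rows if x == 1]
    unexposed = [y for x, y in rows if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


def simulate(log_or_exposure, log_or_outcome, n=200_000, seed=2024):
    """Return (full-sample, selected-sample) mean differences.

    The true effect of the exposure on the outcome is zero, so any non-zero
    difference in the selected sample reflects selection, not causation.
    """
    rng = random.Random(seed)
    full, selected = [], []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0   # binary exposure
        y = rng.gauss(0.0, 1.0)              # continuous outcome, independent of x
        p_select = expit(-1.0 + log_or_exposure * x + log_or_outcome * y)
        full.append((x, y))
        if rng.random() < p_select:          # continued participation
            selected.append((x, y))
    return mean_difference(full), mean_difference(selected)
```

Calling `simulate(0.0, 0.0)`, `simulate(2.0, 0.0)` and `simulate(0.0, 1.5)` should give selected-sample differences close to zero, whereas `simulate(2.0, 1.5)`, where both exposure and outcome drive selection, should give a clearly non-zero difference despite the null true effect.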

A list of the simulated variables, the equations used to generate them and an explanation containing the values used in the simulation is available in the Supplementary Information (Section S2).

The estimands of interest are the log-odds estimate (if binary outcome) and the mean difference coefficient (if continuous outcome) of depression of those with high RSBB, relative to those with low RSBB. As these are estimated from relatively simple regression models (see “Methods” below) without time-varying covariates, these estimands based on regression coefficients represent the marginal effect/average treatment effect of the exposure on the outcome without the need for more complex causal inference methods (e.g., g-methods; Hernan & Robins, 2020, chapter 15).
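To make the two scales concrete, the sketch below computes the unadjusted analogues of these estimands: the log odds ratio from a 2×2 table (which equals the exposure coefficient in an unadjusted logistic regression) and the difference in means (which equals the exposure coefficient in an unadjusted linear regression). It is illustrative only; it does not adjust for confounders as the models in this study do, the function names are hypothetical, and the interval is a standard Wald interval.

```python
import math
import statistics


def log_odds_ratio(exposed_cases, exposed_noncases,
                   unexposed_cases, unexposed_noncases):
    """Log odds ratio from a 2x2 table, with a 95% Wald confidence interval."""
    log_or = math.log((exposed_cases * unexposed_noncases)
                      / (exposed_noncases * unexposed_cases))
    se = math.sqrt(1 / exposed_cases + 1 / exposed_noncases
                   + 1 / unexposed_cases + 1 / unexposed_noncases)
    return log_or, (log_or - 1.96 * se, log_or + 1.96 * se)


def mean_difference(exposed_scores, unexposed_scores):
    """Difference in mean outcome score, exposed minus unexposed."""
    return statistics.fmean(exposed_scores) - statistics.fmean(unexposed_scores)
```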

Analyses were repeated for each of the selection scenarios outlined in Table 1, in both unadjusted and adjusted models. Where the depression outcome was binary we used logistic regression, and where the depression outcome was continuous we used linear regression.

Performance measures

For each method detailed above, we estimated both bias (i.e., how much the effect estimate differs from the true RSBB-depression value; either on the log-odds or mean difference scale) and coverage (i.e., the proportion of simulations where the 95% confidence intervals include the true value). These performance measures were calculated over 1,000 simulation iterations, along with Monte Carlo standard errors and intervals (Morris et al., 2019).
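These performance measures can be computed as in the sketch below, which uses the standard formulas for bias, coverage and their Monte Carlo standard errors given by Morris et al. (2019). The function name and the returned keys are hypothetical; the inputs are assumed to be one point estimate and one 95% confidence interval per simulation iteration.

```python
import math
import statistics


def performance_measures(estimates, ci_lowers, ci_uppers, true_value):
    """Bias and coverage across simulation iterations, with Monte Carlo SEs."""
    n_sim = len(estimates)

    # Bias: average estimate minus the true value.
    bias = statistics.fmean(estimates) - true_value
    # Monte Carlo SE of bias: empirical SD of the estimates / sqrt(n_sim).
    bias_mcse = statistics.stdev(estimates) / math.sqrt(n_sim)

    # Coverage: proportion of intervals that contain the true value.
    covered = sum(lo <= true_value <= hi
                  for lo, hi in zip(ci_lowers, ci_uppers))
    coverage = covered / n_sim
    # Monte Carlo SE of coverage: binomial standard error.
    coverage_mcse = math.sqrt(coverage * (1 - coverage) / n_sim)

    return {
        "bias": bias,
        "bias_mcse": bias_mcse,
        "coverage": coverage,
        "coverage_mcse": coverage_mcse,
    }
```

With 1,000 iterations and true coverage of 95%, the Monte Carlo standard error of coverage is about 0.7 percentage points, since sqrt(0.95 × 0.05 / 1000) ≈ 0.0069.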

Secondary analysis—ALSPAC subset analysis

As a secondary analysis we used actual ALSPAC data to estimate the extent of selection bias using a different approach. We addressed the same RSBB-depression research question using linear and logistic regressions as in the simulated analyses, with religious belief and attendance as our binary exposures, depression (either continuous or binary) as our outcome, and the same confounding variables. As these data were collected at baseline in ALSPAC, the amount of missing data is relatively low (as discussed above), meaning that the analytic sample is likely to be broadly representative of the target population and that there is less chance of selection bias. We compared results from this model with a second model restricting participants to those who returned the recent “Y” questionnaire, completed in 2020. This data collection suffers from considerable loss to follow-up, with only approximately one-third of participants returning a completed questionnaire, and is therefore likely to exhibit the greatest amount of selection bias in ALSPAC data. Comparing these two timepoints will inform us as to whether selection bias may impact our results (Howe et al., 2013); note that as we are not modeling the type of selection bias in this scenario, collider stratification selection bias, effect modifier selection bias, or a combination of the two, could result in bias here.

Primary analysis—simulation study

Binary depression

Figure 3. Forest plots showing the bias and coverage for the missingness models where depression was binary and RSBB did not cause depression. In these models we adjusted for confounders. For more information on each missingness model see Table 1 and Supplementary Section S2. RSBB = Religious/Spiritual Beliefs and Behaviors.

Figure 4. Forest plots showing the bias and coverage for the missingness models where depression was binary and RSBB caused depression. In these models we adjusted for confounders. For more information on each missingness model see Table 1 and Supplementary Section S2. RSBB = Religious/Spiritual Beliefs and Behaviors.

Continuous depression

Figure 5. Forest plots showing the bias and coverage for the missingness models where depression was continuous and RSBB did not cause depression. In these models we adjusted for confounders. For more information on each missingness model see Table 1 and Supplementary Section S2. RSBB = Religious/Spiritual Beliefs and Behaviors.

Figure 6. Forest plots showing the bias and coverage for the missingness models where depression was continuous and RSBB caused depression. In these models we adjusted for confounders. For more information on each missingness model see Table 1 and Supplementary Section S2. RSBB = Religious/Spiritual Beliefs and Behaviors.

Table 2. Output of adjusted regression models for the different religious/spiritual belief and behavior (RSBB) exposures, outcome types (continuous vs binary depression), and selection scenarios (full sample vs only those who completed the recent 2020 questionnaire).

In a simulation study using data from a UK-based longitudinal cohort we have attempted to quantify the magnitude of potential collider stratification selection bias in analyses under plausible selection scenarios. Despite both our exposure (RSBB) and outcome (depression) causing selection (Cornish et al., 2021; Morgan et al., 2022; Taylor et al., 2018), our analyses have demonstrated that the extent of the selection bias that could be expected is surprisingly negligible. Simulations using missingness models based on different collider stratification selection scenarios that could occur within the data showed that the bias in the point estimate that could be expected across all scenarios was minimal and unlikely to meaningfully impact any associations between variables in analyses or subsequent conclusions. Regardless of the selection mechanism, these simulations also had approximately nominal 95% coverage, meaning that the true value would be included in the 95% confidence interval approximately 95% of the time. This occurred even in the scenarios with the strongest selection mechanism; for instance, with a binary depression outcome, if individuals higher in RSBB have approximately 50% greater odds of continued participation and individuals with depression have approximately 20% lower odds of continued participation, bias was minimal (a 2%-point difference in the odds ratios) and coverage was still around 95%. In the secondary analysis we compared regression models from the (nearly) full sample assumed to reflect the target population (n ∼ 10,000) to the highly-selected sample of those who completed a recent questionnaire almost 30 years after the start of the study (n ∼ 4,000) to explore how selection bias could impact results. However, for all analyses the effect sizes were broadly similar and the 95% confidence intervals overlapped.
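To give a sense of what selection effects of this size mean on the probability scale, the sketch below converts an odds ratio into a participation probability. The baseline participation probability of 0.40 used in the note after the code is an assumption chosen purely for illustration (it is not a parameter taken from the simulations), and the function name is hypothetical.

```python
def shift_probability(baseline_prob, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_prob / (1 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)
```

Under that assumed baseline, `shift_probability(0.40, 1.5)` gives a participation probability of 0.50 for those higher in RSBB, and `shift_probability(0.40, 0.8)` gives roughly 0.35 for those with depression.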

In this dataset, we therefore found that there is unlikely to be much bias in point estimates due to differences in characteristics between the target population and the analytic sample being studied. This finding was consistent across both linear and logistic regression simulation models, as well as the secondary analysis comparing the (approximate) target population to a highly-selected subsample of ALSPAC participants. Together, these results suggest that, in the selection scenarios explored here, selection may introduce relatively little bias into results and subsequent conclusions. Together with previous papers exploring differences in ALSPAC participation by RSBB, depression, and other factors, these results highlight that, whilst populations within cohort studies may change over time due to drop out and characteristics associated with continued participation may be skewed over time (Fernández-Sanlés et al., 2021; Hernan & Robins, 2020; Morgan et al., 2022), this does not always equate to bias when conducting analyses. This is useful (and potentially encouraging) information for any researchers using these ALSPAC data in the future, as if the levels of selection are comparable to those modeled here, there is unlikely to be a substantial amount of selection bias that would impact results.

We must caution, however, that this conclusion only holds for the levels of collider stratification selection modeled in this study; different studies with different variables and different levels of missingness within ALSPAC will likely have different (and perhaps stronger) selection mechanisms, which may result in increased levels of selection bias compared to those observed here. As such, we emphatically state that these results do not show that selection bias can be safely ignored, but instead that potential selection bias needs to be explored and assessed on a case-by-case basis. Hopefully this paper can provide a useful example of how simulation studies can be used to explore the magnitude and extent of potential selection bias in other studies. We also note that, even if selection does not result in bias, methods to account for missing data, such as multiple imputation and weighting methods (Seaman & White, 2013; Van Buuren, 2018; White et al., 2011), may still be useful to help improve the efficiency of analyses (i.e., smaller standard errors/confidence intervals).

As we have stressed throughout this paper, our focus has primarily been on collider stratification selection bias, rather than effect modifier selection bias (Figure 1). This is largely because in our simulations we focused on main effects, rather than interactions/effect modification (meaning that effect modifier selection bias cannot occur by design). Nonetheless, differences between the target population and analytic sample can occur due to effect modifier selection bias, which needs to be considered in studies (Hernán, 2017; Lu et al., 2022). We give a brief demonstration of effect modifier selection bias in Supplementary Section S3, highlighting when it can occur and how it can lead to bias. In post-hoc exploratory analyses, we also tested for effect modification in the full ALSPAC sample (i.e., our presumed target population) between our RSBB exposures and each of the confounding variables (age, ethnicity, education, and marital status) on the depression outcome; we found little evidence for effect modification, suggesting that effect modifier selection bias would be unlikely to impact our results (at least for the effect modifiers explored here; Table S2). Indeed, the results of our second study, which could be biased by collider and/or effect modifier selection bias, suggest that neither type of selection bias is likely to bias these results. While we fully acknowledge this to be a post-hoc rationalization, this lack of effect modification provides further justification for focusing on collider stratification selection bias in our formal simulation analyses. Nonetheless, researchers need to carefully consider all forms of selection bias; although the simulation-based methods here were focused on collider stratification selection bias, they could also be applied more broadly to effect modifier selection bias in future studies.

There are several key strengths to this study. First, the use of many simulated scenarios provides an opportunity and perspective that would otherwise not be possible with real data. We can adjust and manipulate the selection scenarios in ways that are inconceivable using just observed data to understand how the data behave under different data-generating mechanisms. Another key strength is the use of 36 different missingness models that cover a wide range of potential scenarios for selection, together with a statement of whether we expect to see bias within each model. These models are based on realistic values derived from the observed data and previous analyses (Li et al., 2016; Morgan et al., 2022), providing some reassurance that the modeled effect sizes are appropriate. A final strength of this paper is the use of ALSPAC in the secondary analysis as a geographically representative population with good completeness of data early in the study; this broadly-representative and near-complete sample at baseline is necessary both to rule out a large degree of selection bias occurring on entry into the study and to minimize potential selection bias in the full-sample analyses.

However, there are some key limitations to this paper and the approaches used. This paper simulates data from ALSPAC which, as mentioned, is a strength: it is an almost perfect test case for selection bias because the baseline sample has little missing data and is broadly representative of the target population. This, however, makes the methods applied in this paper less generalizable to other studies, as there may be more selection upon enrollment into the study in other cohorts and/or less complete baseline data. In these situations, selection is much more difficult to simulate and/or control for, as we inherently would not have any data on those who refused to join or who have missing baseline data. If initial participation in the study was non-random, then this could result in selection bias and misleading causal estimates and would be very difficult to account for due to a lack of additional information on these individuals (Munafò et al., 2018). An example of this issue is UK Biobank, where over 9 million individuals were invited but only around 5% (∼500,000) participated in baseline assessments. This resulted in the participants differing from both non-participants and the general population by being older, more likely to be female and more likely to live in less deprived areas (Fry et al., 2017). Studies investigating the potential bias due to volunteering (volunteer bias) in UK Biobank suggest that there may be substantial bias in associations due to non-random selection into UK Biobank (Van Alten et al., 2024). Methods such as those adopted in the present study to try and quantify potential selection bias are much more difficult to apply in situations such as these, as a substantial degree of selection may have already occurred.

We have shown, using both simulated and observational data from a large prospective UK cohort, that even if an exposure (here, RSBB) and outcome (here, depression) are associated with participation under realistic parameters, this does not necessarily result in substantial selection bias and erroneous conclusions. Whilst future research using ALSPAC data may be more confident in conclusions drawn from this data despite potential non-random participation and selection bias, we again stress that these results only apply to the effect sizes and levels of missingness studied here, and cannot be assumed to hold for stronger selection pressures or in other studies with different sources of selection. By emphasizing how differences between the sample and target population have the potential to result in selection bias—even if in practice such bias may be rather minimal!—we hope that this paper encourages researchers in the study of religion to increasingly consider these issues and the assumptions made in future research.

Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.

We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses.

Data and code availability

ALSPAC data access is through a system of managed open access. Information about access to ALSPAC data is given on the ALSPAC website (http://www.bristol.ac.uk/alspac/researchers/access/) and in the ALSPAC data management plan (http://www.bristol.ac.uk/alspac/researchers/data-access/documents/alspac-data-management-plan.pdf). Data used for this submission will be made available on request to the Executive ([email protected]). The datasets presented in this article are linked to ALSPAC project number B3906; please quote this project number during your application. Complete code used in the analyses for this paper is available on the OSF here: https://osf.io/9bfzt/.

No potential conflict of interest was reported by the author(s).

  • Bareinboim, E., & Pearl, J. (2013). A general algorithm for deciding transportability of experimental results. Journal of Causal Inference, 1(1), 107–134. https://doi.org/10.1515/jci-2012-0004
  • Bartlett, J. W., Harel, O., & Carpenter, J. R. (2015). Asymptotically unbiased estimation of exposure odds ratios in complete records logistic regression. American Journal of Epidemiology, 182(8), 730–736. https://doi.org/10.1093/aje/kwv114
  • Boyd, A., Golding, J., Macleod, J., Lawlor, D. A., Fraser, A., Henderson, J., Molloy, L., Ness, A., Ring, S., & Davey Smith, G. (2013). Cohort profile: The “children of the 90s”—the index offspring of the Avon longitudinal study of parents and children. International Journal of Epidemiology, 42(1), 111–127. https://doi.org/10.1093/ije/dys064
  • Cornish, R. P., Macleod, J., Boyd, A., & Tilling, K. (2021). Factors associated with participation over time in the Avon longitudinal study of parents and children: A study using linked education and primary care data. International Journal of Epidemiology, 50(1), 293–302. https://doi.org/10.1093/ije/dyaa192
  • Cox, J. L., Holden, J. M., & Sagovsky, R. (1987). Detection of postnatal depression: Development of the 10-item Edinburgh postnatal depression scale. British Journal of Psychiatry, 150(6), 782–786. https://doi.org/10.1192/bjp.150.6.782
  • Dahabreh, I. J., & Hernán, M. A. (2019). Extending inferences from a randomized trial to a target population. European Journal of Epidemiology, 34(8), 719–722. https://doi.org/10.1007/s10654-019-00533-2
  • Fernández-Sanlés, A., Smith, D., Clayton, G. L., Northstone, K., Carter, A. R., Millard, L. A., Borges, M. C., Timpson, N. J., Tilling, K., Griffith, G. J., & Lawlor, D. A. (2021). Bias from questionnaire invitation and response in COVID-19 research: An example using ALSPAC. Wellcome Open Research, 6, 184. https://doi.org/10.12688/wellcomeopenres.17041.1
  • Fraser, A., Macdonald-Wallis, C., Tilling, K., Boyd, A., Golding, J., Davey Smith, G., Henderson, J., Macleod, J., Molloy, L., Ness, A., Ring, S., Nelson, S. M., & Lawlor, D. A. (2013). Cohort profile: The Avon longitudinal study of parents and children: ALSPAC mothers cohort. International Journal of Epidemiology, 42(1), 97–110. https://doi.org/10.1093/ije/dys066
  • Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of sociodemographic and health-related characteristics of UK biobank participants with those of the general population. American Journal of Epidemiology, 186(9), 1026–1034. https://doi.org/10.1093/aje/kwx246
  • Harris, P. A., Taylor, R., Thielke, R., Payne, J., Gonzalez, N., & Conde, J. G. (2009). Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics, 42(2), 377–381. https://doi.org/10.1016/j.jbi.2008.08.010
  • Hernán, M. A. (2017). Invited commentary: Selection bias without colliders. American Journal of Epidemiology, 185(11), 1048–1050. https://doi.org/10.1093/aje/kwx077
  • Hernan, M. A., & Robins, J. M. (2020). Causal inference. CRC: Taylor & Francis [distributor].
  • Howe, L. D., Tilling, K., Galobardes, B., & Lawlor, D. A. (2013). Loss to follow-up in cohort studies: Bias in estimates of socioeconomic inequalities. Epidemiology, 24(1), 1–9. https://doi.org/10.1097/EDE.0b013e31827623b1
  • Hughes, R. A., Heron, J., Sterne, J. A. C., & Tilling, K. (2019). Accounting for missing data in statistical analyses: Multiple imputation is not always the answer. International Journal of Epidemiology, 48(4), 1294–1304. https://doi.org/10.1093/ije/dyz032
  • Koenig, H. G., King, D. E., & Carson, V. B. (2012). Handbook of religion and health (2nd ed.). Oxford University Press.
  • Li, S., Okereke, O. I., Chang, S.-C., Kawachi, I., & VanderWeele, T. J. (2016). Religious service attendance and lower depression among women—a prospective cohort study. Annals of Behavioral Medicine, 50(6), 876–884. https://doi.org/10.1007/s12160-016-9813-9
  • Lu, H., Cole, S. R., Howe, C. J., & Westreich, D. (2022). Toward a clearer definition of selection bias when estimating causal effects. Epidemiology, 33(5), 699–706. https://doi.org/10.1097/EDE.0000000000001516
  • Major-Smith, D., Heron, J., Fraser, A., Lawlor, D. A., Golding, J., & Northstone, K. (2022). The Avon longitudinal study of parents and children (ALSPAC): A 2022 update on the enrolled sample of mothers and the associated baseline data. Wellcome Open Research, 7, 283. https://doi.org/10.12688/wellcomeopenres.18564.1
  • Millard, L. A. C., Fernández-Sanlés, A., Carter, A. R., Hughes, R. A., Tilling, K., Morris, T. P., Major-Smith, D., Griffith, G. J., Clayton, G. L., Kawabata, E., Davey Smith, G., Lawlor, D. A., & Borges, M. C. (2023). Exploring the impact of selection bias in observational studies of COVID-19: A simulation study. International Journal of Epidemiology, 52(1), 44–57. https://doi.org/10.1093/ije/dyac221
  • Morgan, J., Halstead, I., Northstone, K., & Major-Smith, D. (2022). Religious/spiritual beliefs and behaviours and study participation in a prospective cohort study (ALSPAC) in Southwest England. Wellcome Open Research, 7, 186. https://doi.org/10.12688/wellcomeopenres.17975.1
  • Morris, T. P., White, I. R., & Crowther, M. J. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074–2102. https://doi.org/10.1002/sim.8086
  • Munafò, M. R., Tilling, K., Taylor, A. E., Evans, D. M., & Davey Smith, G. (2018). Collider scope: When selection bias can substantially influence observed associations. International Journal of Epidemiology, 47(1), 226–235. https://doi.org/10.1093/ije/dyx206
  • Northstone, K., Lewcock, M., Groom, A., Boyd, A., Macleod, J., Timpson, N., & Wells, N. (2019). The Avon longitudinal study of parents and children (ALSPAC): An update on the enrolled sample of index children in 2019. Wellcome Open Research, 4, 51. https://doi.org/10.12688/wellcomeopenres.15132.1
  • Seaman, S. R., & White, I. R. (2013). Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research, 22(3), 278–295. https://doi.org/10.1177/0962280210395740
  • Taylor, A. E., Jones, H. J., Sallis, H., Euesden, J., Stergiakouli, E., Davies, N. M., Zammit, S., Lawlor, D. A., Munafò, M. R., Davey Smith, G., & Tilling, K. (2018). Exploring the association of genetic factors with participation in the Avon longitudinal study of parents and children. International Journal of Epidemiology, 47(4), 1207–1216. https://doi.org/10.1093/ije/dyy060
  • Van Alten, S., Domingue, B. W., Faul, J., Galama, T., & Marees, A. T. (2024). Reweighting UK Biobank corrects for pervasive selection bias due to volunteering. International Journal of Epidemiology, 53(3), dyae054. https://doi.org/10.1093/ije/dyae054
  • Van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429492259
  • VanderWeele, T. J., Jackson, J. W., & Li, S. (2016). Causal inference and longitudinal data: A case study of religion and mental health. Social Psychiatry and Psychiatric Epidemiology, 51(11), 1457–1466. https://doi.org/10.1007/s00127-016-1281-9
  • White, I. R., Royston, P., & Wood, A. M. (2011). Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine, 30(4), 377–399. https://doi.org/10.1002/sim.4067

Confounding Variables | Definition, Examples & Controls

Published on May 29, 2020 by Lauren Thomas. Revised on June 22, 2023.

In research that investigates a potential cause-and-effect relationship, a confounding variable is an unmeasured third variable that influences both the supposed cause and the supposed effect.

It’s important to consider potential confounding variables and account for them in your research design to ensure your results are valid. Left unchecked, confounding variables can introduce many research biases to your work, causing you to misinterpret your results.

Table of contents

  • What is a confounding variable?
  • Why confounding variables matter
  • How to reduce the impact of confounding variables
  • Frequently asked questions about confounding variables

What is a confounding variable?

Confounding variables (a.k.a. confounders or confounding factors) are a type of extraneous variable that is related to a study’s independent and dependent variables. A variable must meet two conditions to be a confounder:

  • It must be correlated with the independent variable. This may be a causal relationship, but it does not have to be.
  • It must be causally related to the dependent variable.

Example of a confounding variable
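
The two conditions above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an example from the original article: the names `pearson` and `simulate`, the effect sizes, and the sample size are all assumptions chosen for illustration. A confounder Z drives both X and Y, while X has no effect on Y at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, seed=0):
    """Synthetic data (assumed for illustration): Z causes X and Y; X does not cause Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]       # confounder
    x = [zi + rng.gauss(0, 1) for zi in z]        # Z is correlated with X
    y = [2 * zi + rng.gauss(0, 1) for zi in z]    # Z is causally related to Y
    return z, x, y


z, x, y = simulate()

# Across the whole sample, X and Y look strongly related even though X has no effect.
raw_r = pearson(x, y)

# Holding the confounder roughly constant (a narrow band of Z) removes the association.
band = [(xi, yi) for zi, xi, yi in zip(z, x, y) if abs(zi) < 0.1]
restricted_r = pearson([b[0] for b in band], [b[1] for b in band])

print(f"correlation ignoring Z:      {raw_r:.2f}")
print(f"correlation with Z held ~0:  {restricted_r:.2f}")
```

The second correlation is computed only on subjects with nearly identical values of Z, which is why the apparent relationship between X and Y largely disappears.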

Why confounding variables matter

To ensure the internal validity of your research, you must account for confounding variables. If you fail to do so, your results may be biased and may not reflect the actual relationship between the variables that you are interested in.

For instance, you may find a cause-and-effect relationship that does not actually exist, because the effect you measure is caused by the confounding variable (and not by your independent variable). This can lead to omitted variable bias or placebo effects, among other biases.

Even if you correctly identify a cause-and-effect relationship, confounding variables can result in over- or underestimating the impact of your independent variable on your dependent variable.

How to reduce the impact of confounding variables

There are several methods of accounting for confounding variables. You can use the following methods when studying any type of subjects: humans, animals, plants, chemicals, etc. Each method has its own advantages and disadvantages.

Restriction

In this method, you restrict your treatment group by only including subjects with the same values of potential confounding factors.

Since these values do not differ among the subjects of your study, they cannot correlate with your independent variable and thus cannot confound the cause-and-effect relationship you are studying.

  • Advantage: Relatively easy to implement
  • Disadvantage: Restricts your sample a great deal
  • Disadvantage: You might fail to consider other potential confounders

Matching

In this method, you select a comparison group that matches with the treatment group. Each member of the comparison group should have a counterpart in the treatment group with the same values of potential confounders, but different independent variable values.

This allows you to eliminate the possibility that differences in confounding variables cause the variation in outcomes between the treatment and comparison group. If you have accounted for any potential confounders, you can thus conclude that the difference in the independent variable must be the cause of the variation in the dependent variable.

  • Advantage: Allows you to include more subjects than restriction
  • Disadvantage: Can prove difficult to implement since you need pairs of subjects that match on every potential confounding variable
  • Disadvantage: Other variables that you cannot match on might also be confounding variables

Statistical control

If you have already collected the data, you can include the possible confounders as control variables in your regression models; in this way, you will control for the impact of the confounding variable.

Any effect that the potential confounding variable has on the dependent variable will show up in the results of the regression and allow you to separate the impact of the independent variable.

  • Advantage: Easy to implement
  • Advantage: Can be performed after data collection
  • Disadvantage: You can only control for variables that you observe directly, but other confounding variables you have not accounted for might remain
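
As a rough sketch of statistical control, the snippet below compares a regression of Y on X alone with a regression that also includes the confounder Z. Everything here is assumed for illustration: the data are synthetic, the true effect of X is set to 1.0, and the helper names (`sum_products`, `naive_slope`, `adjusted_slopes`) are hypothetical. The two-predictor fit solves the ordinary least squares normal equations directly rather than calling a statistics library.

```python
import random


def sum_products(a, b):
    """Sum of centered cross-products of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def naive_slope(x, y):
    """OLS slope of y on x alone (the confounder is ignored)."""
    return sum_products(x, y) / sum_products(x, x)


def adjusted_slopes(x, z, y):
    """OLS slopes of y on x and z together, via the 2x2 normal equations."""
    sxx, szz, sxz = sum_products(x, x), sum_products(z, z), sum_products(x, z)
    sxy, szy = sum_products(x, y), sum_products(z, y)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


# Synthetic data (assumed): true effect of X on Y is 1.0, and Z confounds it.
rng = random.Random(42)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

naive = naive_slope(x, y)            # overstates the effect of X
adj_x, adj_z = adjusted_slopes(x, z, y)  # close to the true 1.0 and 2.0

print(f"slope of X ignoring Z:     {naive:.2f}")
print(f"slope of X controlling Z:  {adj_x:.2f}")
```

The adjustment only works because Z was observed and included; an unmeasured confounder would leave the naive bias in place.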

Randomization

Another way to minimize the impact of confounding variables is to randomize the values of your independent variable. For instance, if some of your participants are assigned to a treatment group while others are in a control group, you can randomly assign participants to each group.

Randomization ensures that with a sufficiently large sample, all potential confounding variables—even those you cannot directly observe in your study—will have the same average value between different groups. Since these variables do not differ by group assignment, they cannot correlate with your independent variable and thus cannot confound your study.

Since this method allows you to account for all potential confounding variables, which is nearly impossible to do otherwise, it is often considered to be the best way to reduce the impact of confounding variables.

  • Advantage: Allows you to account for all possible confounding variables, including ones that you may not observe directly
  • Advantage: Considered the best method for minimizing the impact of confounding variables
  • Disadvantage: Most difficult to carry out
  • Disadvantage: Must be implemented prior to beginning data collection
  • Disadvantage: You must ensure that only those in the treatment (and not control) group receive the treatment
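
The balancing effect of randomization can be sketched as follows. This is an illustrative simulation under assumed values (a hypothetical `age` confounder with mean 50 and standard deviation 10, and 4,000 subjects), not a procedure taken from the article. It contrasts random assignment with self-selection, where older subjects are more likely to opt into treatment.

```python
import random


def mean(values):
    return sum(values) / len(values)


def group_difference(confounder, assignment):
    """Mean confounder value in the treated group minus the control group."""
    treated = [c for c, a in zip(confounder, assignment) if a == 1]
    control = [c for c, a in zip(confounder, assignment) if a == 0]
    return mean(treated) - mean(control)


rng = random.Random(1)
n = 4000
age = [rng.gauss(50, 10) for _ in range(n)]  # hypothetical confounder

# Random assignment: half treated, half control, order shuffled.
random_assignment = [1] * (n // 2) + [0] * (n // 2)
rng.shuffle(random_assignment)

# Self-selection (assumed mechanism): older subjects tend to choose treatment.
self_selected = [1 if a + rng.gauss(0, 10) > 50 else 0 for a in age]

diff_random = group_difference(age, random_assignment)
diff_selected = group_difference(age, self_selected)

print(f"age gap under random assignment: {diff_random:.2f}")
print(f"age gap under self-selection:    {diff_selected:.2f}")
```

With random assignment the age gap between groups is close to zero, so age cannot correlate with treatment; under self-selection the gap is large and age would confound any treatment comparison.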

Frequently asked questions about confounding variables

A confounding variable, also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design, it’s important to identify potential confounding variables and plan how you will reduce their impact.

A confounding variable is closely related to both the independent and dependent variables in a study. An independent variable represents the supposed cause, while the dependent variable is the supposed effect. A confounding variable is a third variable that influences both the independent and dependent variables.

Failing to account for confounding variables can cause you to wrongly estimate the relationship between your independent and dependent variables.

An extraneous variable is any variable that you’re not investigating that can potentially affect the dependent variable of your research study.

A confounding variable is a type of extraneous variable that not only affects the dependent variable, but is also related to the independent variable.

To ensure the internal validity of your research, you must consider the impact of confounding variables. If you fail to account for them, you might over- or underestimate the causal relationship between your independent and dependent variables, or even find a causal relationship where none exists.

There are several methods you can use to decrease the impact of confounding variables on your research: restriction, matching, statistical control and randomization.

In restriction, you restrict your sample by only including certain subjects that have the same values of potential confounding variables.

In matching, you match each of the subjects in your treatment group with a counterpart in the comparison group. The matched subjects have the same values on any potential confounding variables, and only differ in the independent variable.

In statistical control, you include potential confounders as variables in your regression.

In randomization, you randomly assign the treatment (or independent variable) in your study to a sufficiently large number of subjects, which allows you to control for all potential confounding variables.

Cite this Scribbr article

Thomas, L. (2023, June 22). Confounding Variables | Definition, Examples & Controls. Scribbr. Retrieved September 23, 2024, from https://www.scribbr.com/methodology/confounding-variables/

Open Access

Peer-reviewed

Research Article

Impact of COVID-19 on psychological distress in subsequent stages of the pandemic: The role of received social support

Krzysztof Kaniasty

Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing – original draft

* E-mail: [email protected]

Affiliations: Department of Psychology, Indiana University of Pennsylvania, Indiana, PA, United States of America; Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland

Erik van der Meulen

Roles: Conceptualization, Formal analysis, Methodology, Writing – original draft

Affiliation: Academy of Health and Social Studies, NHL Stenden University of Applied Sciences, Leeuwarden, The Netherlands

  • Published: September 25, 2024
  • https://doi.org/10.1371/journal.pone.0310734

Abstract

This longitudinal study examined a sample of adult Poles (N = 1245), who were interviewed three times from July 2021 to August 2022, during the later stages of the COVID-19 pandemic. The study had two primary objectives. The first was to assess the impact of the pandemic on psychological distress, measured through symptoms of depression and anxiety. The pandemic’s effects were evaluated using three predictors: direct exposure to COVID-19, COVID-19 related stressors, and perceived threats from COVID-19. The second objective was to investigate the role of received social support in coping with the pandemic’s hardships. Receipt of social support was measured by both the quantity of help received and the perceived quality of that support. A Latent Growth Curve Model (LGCM) was employed to analyze psychological distress across three waves, controlling for sociodemographic variables, non-COVID life events, coping self-efficacy, and perceived social support. Findings indicated that COVID-19 stressors and COVID-19 threats were strongly and consistently associated with greater psychological distress throughout the study period. The impact of direct COVID-19 exposure was limited. The quantity of received support predicted higher distress, whereas higher quality of received support was linked to better mental health. Crucially, the relationship between the quantity of support and distress was moderated by the quality of support. Effective social support was associated with the lowest distress levels, regardless of the amount of help received. Conversely, receiving large amounts of low-quality support was detrimental to psychological health. In summary, the ongoing psychosocial challenges of COVID-19 significantly eroded mental health, highlighting the importance of support quality over quantity in coping with significant life adversities.

Citation: Kaniasty K, van der Meulen E (2024) Impact of COVID-19 on psychological distress in subsequent stages of the pandemic: The role of received social support. PLoS ONE 19(9): e0310734. https://doi.org/10.1371/journal.pone.0310734

Editor: Ali B. Mahmoud, St John’s University, UNITED STATES OF AMERICA

Received: December 18, 2023; Accepted: September 5, 2024; Published: September 25, 2024

Copyright: © 2024 Kaniasty, van der Meulen. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are publicly available from the OSF repository ( http://osf.io/xmzw8 ).

Funding: Preparation of this paper was supported by OPUS-19 grant No. 2020/37/B/HS6/02957 awarded to Krzysztof Kaniasty by the Polish National Science Centre (Narodowe Centrum Nauki). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

It is reasonable to assert that the COVID-19 pandemic, like no other collective crisis in the world’s history, prompted an unprecedented number of research studies, reviews and meta-analyses attempting to assess its impact on mental health. Many quantitative and qualitative syntheses documented that the heaviest mental health toll on the general public, most frequently assessed as symptoms of depression, anxiety, PTSD, or psychological distress, occurred in the early months of the pandemic [ 1 – 4 ]. Similar patterns of findings emerged within different subgroups, such as COVID-19 patients [ 5 ], children and adolescents [ 6 ], college students [ 7 ], the elderly [ 8 ] or healthcare workers [ 9 ]. Evidence on whether mental health problems decreased [ 1 , 4 ] or remained stable at moderately elevated levels [ 2 , 3 ] in the later months of the first year of the pandemic is not yet conclusive.

It is also reasonable to assert that the psychological impact of the SARS-CoV-2 virus would persist through subsequent phases of the pandemic. The few longitudinal investigations published thus far with mental health assessments conducted after July 2021 [ 10 – 15 ] showed overall improvements in psychological health in various populations since the onset of the pandemic. Nevertheless, mental health issues appear elevated compared to pre-pandemic times [ 16 ].

The COVID-19 experience should be regarded as a disaster or catastrophe that set off a prolonged series of diverse and stressful hardships. The pandemic encompassed all possible classes of stressors [ 17 ]: traumas (e.g., death, injuries), life events (e.g., lockdowns, job interruptions/loss), daily hassles (e.g., social distancing, mask-wearing), macro-system events (e.g., economic downturns, societal protests/disputes), nonevents (e.g., postponements/cancellations of expected life milestones such as graduations and weddings), and chronic stressors (e.g., ongoing life hardships such as caregiving, environmental challenges). Each of these facets of the COVID-19 catastrophe independently impacted psychological and social well-being, capturing different aspects of the comprehensive spectrum of stress processes [ 17 , 18 ].

The present longitudinal study had two major goals. First, it aimed to assess the impact of the pandemic in its later phases (July 2021 to August 2022) on psychological distress assessed as combined symptoms of depression and anxiety. The ongoing presence of the pandemic in people’s lives was measured using three predictor variables. COVID-19 direct exposure for individuals and their significant others was evaluated as probable encounters with the virus. This assessment encompassed a range of experiences from simple testing or mild infection to severe illness, including hospitalization or the death of a significant person. Several COVID-19 studies have documented the association between direct exposure to the SARS-CoV-2 virus and psychological health [ 19 , 20 ]. The second measure, COVID-19 stressors, included a series of significant secondary stressors such as occupational disruptions, financial insecurity, and delays or cancellations. These stressors have also been shown to adversely impact mental health [ 21 , 22 ]. Finally, COVID-19 threats, likely the most frequently assessed indicator of the pandemic’s adversities, evaluated people’s concerns and fears for their own health and the health of their families [ 21 , 23 ].

The second goal of the present study was to investigate the role of social support in the ongoing process of coping with COVID-19 adversities. Social support is a multifaceted construct that encompasses social interactions providing actual assistance and embedding individuals in a network of relationships perceived as loving, caring, and readily available in times of need [ 24 ]. The most central distinction between different forms of social support lies between perceived social support and received social support. Perceived social support refers to subjective appraisals of being reliably connected to others, such as believing that "If I needed it, I can easily find someone to talk to about my troubles, worries, or concerns." In contrast, received social support pertains to the actual support received, such as "How often did someone give, loan, or offer you money?"

Perceived social support, regarded as the principal facet of social support, has consistently been shown to be advantageous for better postcrisis outcomes [see 25 , 26 ]. Conversely, studies assessing received social support have produced inconsistent findings. Some investigations have documented a clear benefit of greater received support in reducing distress. However, many other studies have found no effects, or worse, have shown positive associations between received support and increased mental health problems [ 27 , 28 ]. Accordingly, the stress and coping literature consistently highlights the benefits of social support for psychological adjustment, with an emphasis on perceived social support rather than received support. This focus poses challenges for public health professionals and practitioners who provide aid, support, and psychological interventions to communities recovering from disasters. It also presents difficulties for countless individuals worldwide who have been striving to offer actual support to one another during the challenging times of the COVID-19 pandemic.

The reasons why the efficacy of received social support may be undermined during times of coping with stressors are extensive [ 27 , 29 , 30 ]. Providing and receiving help in times of crisis, whether through personal, charitable, or professional relationships, is a complex and challenging process. Good intentions and sincere concerns often mix with confusion, skepticism, and psychological threats. Simply put, while the desire to relieve the suffering of others is commendable, not all forms of social support prove to be helpful.

A number of recommendations can be found in the social support literature that offer ideas for identifying theoretical pathways, along with empirical and practical prerequisites for detecting the genuinely helpful influence of received social support [ 27 , 30 ]. Rini and Dunkel Schetter [ 31 ] proposed a comprehensive theoretical framework for investigating the efficacy of received social support, which they labeled the “social support effectiveness model” (SSE). The SSE model delineates the joint influence of the “quantity” and “quality” of received social support and the extent to which helping provisions meet recipients’ expectations, needs and demands from the stressors they face.

The quantity dimension of support receipt is determined by the match between the recipient’s needs and the amount of help received, ensuring the support is neither too little nor too much. The quality dimension involves more complex practical and psychological dynamics, including: a) “functional fit”: the type of help aligns with what is needed; b) “skillfulness and sensitivity”: support is delivered in ways that minimize the recipient’s feelings of being a burden; c) “ease of access”: help is not difficult to get; and d) “impact on self-concept”: the support received does not reflect poorly on one’s self-esteem, avoiding blame, feelings of incompetence, or a sense of indebtedness.

Rini and her colleagues [ 32 ] provided strong empirical evidence for the SSE model in a sample of hematopoietic stem cell transplant survivors. When examined together, the quantity of support received was predictive of more distress experienced by survivors, whereas favorable appraisals of the effectiveness of support received were associated with better mental health. Most critically, the two operationalizations of received social support statistically interacted with each other producing a disconcerting pattern revealing that when support was judged as being low in quality, receiving greater quantities of it predicted elevated distress. However, recipients of effective support reported the lowest levels of distress, regardless of the amount of help received. The importance of assessing both the amount and quality of postcrisis received social support for psychological functioning was also evidenced among survivors of disasters [ 33 – 35 ]. Altogether, these findings highlight the importance of enhancing the quality of help provided to people coping with life difficulties. Simply providing "more" support is not necessarily better and can potentially be detrimental if offered in substandard ways. This underscores the need for support that is provided in the right amount and type, delivered with skill and sensitivity, easily accessible, and without negative repercussions for the recipient’s self-image.

In addition to reliance on social support, theory and research on coping with stressful life events repeatedly emphasize the importance of self-efficacy as a critical factor influencing adaptation to significant life challenges, threats, and losses [ 36 , 37 ]. Confidence in one’s own coping abilities and social support resources dynamically influence each other. Received social support may enhance self-efficacy (i.e., enabling path), whereas self-efficacy may mobilize (i.e., cultivation path) social networks to action [ 38 ].

The present study examined the role of social support receipt, measured in terms of both quantity and quality, on psychological distress. The analyses accounted for the influence of sociodemographic factors, perceived social support, and beliefs in coping self-efficacy, which are two crucial resources that routinely promote successful coping with stressors. The uniqueness of the COVID-19 catastrophe for studying received social support stems from the fact that everyone has been subjected to its threats, disruptions, and losses. Nearly everyone has needed support at some point, and nearly everyone has provided support at some point.

Sample and procedure

The Wave 1 sample was recruited between July 6 and 19, 2021, from an online survey panel (“Ariadna,” a Polish online research panel with over 150,000 registered and verified users) to be representative of Polish adults in terms of gender, age, and size of municipality. It originally consisted of 3074 respondents who met all quality control requirements established for the study based on answers to attention questions and survey completion times (i.e., participants with completion times more than 1 SD faster than the sample mean were eliminated). Wave 2 data were collected in February 2022, and Wave 3 followed six months later in August 2022.

The sample analyzed in this study comprised 1,245 respondents who completed all three waves of data assessments and met subsequent (Wave 2 and 3) quality control requirements. A comparison of these participants with those who dropped out after earlier waves of assessments ( N = 1829, 59.4%) on Wave 1 variables revealed some significant differences. The drop-out participants were younger, less educated, and more likely to live in villages or smaller towns. They were also less likely to be in relationships and had higher scores on the psychological distress measure.

The study was approved by the Institutional Review Board of the Institute of Psychology, Polish Academy of Sciences (Approvals # Wave 1-13/V/2021, Wave 2-01/1/2022, Wave 3-17/VII/2022). All participants provided written consent prior to each wave of assessments.

Outcome variable—psychological distress.

Symptoms of psychological distress were assessed with 8 items from the Patient Health Questionnaire (PHQ-8) [ 39 ], and 7 items from the Generalized Anxiety Disorder scale (GAD-7) [ 40 ]. These self-reports have been frequently used to assess depressive (e.g., “Little interest or pleasure in doing things”) and anxiety symptoms (e.g., “Feeling nervous, anxious or on edge”). In order to keep our measures consistent across all survey administrations with regard to response time frames and response options, both instruments asked respondents how often they were bothered by these symptoms in the last 30 days (instead of the “past two weeks” time frame typical for these instruments), with the following five answer choices: 0 (Never), 1 (Rarely), 2 (Sometimes), 3 (Often), and 4 (Very often). These options were recoded to the four-point scale of the standard PHQ-8 and GAD-7 response sets (range 0 to 3, with answers “rarely” and “sometimes” both coded as 1). Cronbach’s α reliability coefficients of the PHQ-8 and GAD-7 scores computed as sums were high at all assessment times (0.92–0.94).

The PHQ-8 and GAD-7 are often combined into a single measure of general distress [ 41 ]; consequently, the total score of psychological distress used in the present analyses was the sum of all 15 items. Confirmatory factor analyses using a diagonally weighted least squares estimator on the present data showed excellent fit for single-factor solutions (see S1 Table). Cronbach’s alphas of the psychological distress total scores at each measurement wave were all high (> 0.95).
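
The recoding and scoring described above can be sketched in a few lines. This is an illustration only, not the authors' code: the names `RECODE`, `distress_total`, and `cronbach_alpha` are hypothetical, and the mapping of “often” to 2 and “very often” to 3 is an assumption, since the text states only the 0 to 3 range and that “rarely” and “sometimes” are both coded 1. The alpha function uses the standard Cronbach formula with sample variances.

```python
# Five answer options (0-4) collapsed to the standard 0-3 scoring.
# Stated in the text: "rarely" (1) and "sometimes" (2) both become 1.
# Assumed here: "often" (3) -> 2 and "very often" (4) -> 3.
RECODE = {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}


def distress_total(phq8_raw, gad7_raw):
    """Sum of the 15 recoded items (possible range 0 to 45)."""
    if len(phq8_raw) != 8 or len(gad7_raw) != 7:
        raise ValueError("expected 8 PHQ-8 answers and 7 GAD-7 answers")
    return sum(RECODE[answer] for answer in list(phq8_raw) + list(gad7_raw))


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [sample_variance([row[i] for row in rows]) for i in range(k)]
    total_variance = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical respondent who answered "sometimes" to every PHQ-8 item
# and "often" to every GAD-7 item.
example_total = distress_total([2] * 8, [3] * 7)
print(example_total)
```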

Measurement of focal predictors.

COVID-19 direct exposure index was based on a sum of answers to 11 questions that asked about exposure to SARS-CoV-2 in the past 16 (Wave 1) or 6 months (Waves 2 and 3). Questions referred to the participant (e.g., being tested for the virus, if positive how severe was the illness, hospitalization) and to the family and friends (including deaths). Different answer options were used depending on the content of the question, but all responses were recoded as 0 ( No or minimal exposure ) or 1 ( Moderate to severe exposure ).

COVID-19 stressors was derived from the average of items that evaluated the extent to which pandemic-specific events (i.e., decline in household budget, irreversible cancellation of important personal events, postponement of important events, new/additional burdens with care for children, new/additional burdens with care of elderly) negatively influenced respondents’ lives in the past 16 months (Wave 1, 10 items) or 6 months (Waves 2 and 3, 6 items; 0 = Did not happen or not at all , 4 = To a great extent ). One additional item was included that asked whether a participant and/or someone in their household experienced COVID-19 related job loss that had negative consequences.

COVID-19 threats involved 12 questions asking the participants about their fears and concerns regarding current threats associated with the continuing pandemic (e.g., “I am concerned that someone close to me will get sick with COVID-19, even if it would be a subsequent infection,” “I am worried about difficulties with access to medical personnel with issues not related to COVID-19”). Items were answered using a 7-point Likert-type response option format anchored with 1 ( Definitely disagree ) and 7 ( Definitely agree ). Reliability coefficients of the scores were high at each assessment (>.92).

Quantity of received social support was measured by the Inventory of Postdisaster Social Support [ 42 ]. Respondents were asked to estimate how often they received different types of help within the timeframe of the past 16 (Wave 1) and 6 months (Waves 2 and 3). For example, a question at Wave 1 asked: “How often, in the last 16 months (i.e., since the beginning of the pandemic), did family members give, loan or offer you money? Regardless of the reason, did this happen…?” (1 = never, 2 = rarely, 3 = sometimes, 4 = quite often, 5 = very often). Another example question, from Wave 3 (August 2022), read: “How often, in the past 6 months (i.e., from the beginning of February until today), did friends help you understand the situation you were in?”

Three types of received support were assessed: emotional (4 items), informational (4 items), and tangible (8 items) support [ 43 ]. Each of these 16 items was asked twice to gauge the amount of support received from two sources: family/relatives and friends/close acquaintances. Thus, the total scale score was an average of 32 items. Reliability coefficients of the scores were high at each assessment wave (> .96).

Quality of received social support was assessed with 12 items modeled on the instrument developed by Rini and Dunkel Schetter [ 31 , 32 ] based on their SSE model. The same six questions, with varying five-option Likert-type answer formats (all coded 1 through 5), asked respondents for their appraisals of the support received from family/relatives and friends/close acquaintances. Respondents judged the help they received along the following dimensions: quantity (“When family members tried to help you, how well did the amount of help you received match the amount of help you wanted?”), functional fit with needs (“How often have you found yourself wishing the help you received had been different—for instance, a different type of help, or offered in a different way or at a different time?”), skillfulness of support delivery (“How often did your friends who gave you help provide it skillfully?”), ease of getting help (“When you needed help from family members, how often was it difficult to obtain?”; “How often did friends offer you support without you having to ask for it?”), and the overall appraisal of effectiveness of received help (“Broadly speaking, how effective or useful was the help you received from your family?”). Cronbach’s alphas of average scores computed on 12 items were high at each assessment wave (> .85).

Measurement of additional predictors.

Normative life events index was a sum of answers (0 = No , 1 = Yes ) to questions asking whether, in the past 16 (Wave 1) and past 6 months (Waves 2 and 3), respondents experienced any of 19 major life events (e.g., change in marital status, birth of a child/grandchild, other than COVID-19 illness of self or family, not COVID-19 bereavements). The count of non-COVID events was recoded to range from 0 to 9.

Coping self-efficacy was measured with six items modeled on the Trauma Self-Efficacy scale [ 44 ]. At Waves 1 and 2, the items referred to participants’ perceived capability to cope with challenges and uncertainties of the COVID-19 pandemic (e.g., “Today, how capable are you to successfully deal with your emotions [anxiety, sadness, disaffection, anger] related to the pandemic?”; 1 = Not capable, 7 = Very capable). At Wave 3, the same items were asked about participants’ appraisals of their capability to cope with serious negative life events that might happen to them in the future (e.g., “In the future, when faced with a difficult life circumstance, how capable will you be to successfully deal with emotions [anxiety, sadness, disaffection, anger] that you might experience at that time?”). Confirmatory factor analyses with scale items showed acceptable fit for single-factor solutions [ 45 ]. Internal reliability coefficients of average scores of this scale were high at each wave (> .93).

Perceived social support was assessed with 12 items from the Interpersonal Support Evaluation List [ 46 ] and 3 items from the Social Provision Scale [ 47 ] that asked about the overall perceived availability of emotional (5 items), informational (4 items) and tangible (6 items) social support (e.g., “If I were sick and needed someone to take me to the doctor, I would have no trouble finding that person”; “I have close relationships that provide me with a sense of emotional security and well-being”; 1 = definitely false , 4 = definitely true ). Cronbach’s alphas of average scores of this 15-item instrument were high at each wave (> .92).

Sense of danger due to the war was also assessed because, during the course of this longitudinal research, Russia attacked Ukraine (February 24, 2022), a country bordering Poland. To account for this additional life stressor, participants were asked at Wave 3 (August 2022) to what extent, in the past 30 days, they were afraid, worried, and/or concerned about their own, their family’s, and the entire country’s safety and welfare due to the ongoing armed conflict (e.g., “To what extent have you felt that life of your family members and relatives were in danger because of the war in Ukraine?”; 1 = Not at all , 5 = To a very great extent ; α = .85) [ 48 ].

Sociodemographic variables.

Five sociodemographic factors were also included in all analyses. Participants’ gender and their marital status were scored as dichotomous variables. Age was scored in years, respondents’ educational attainment was classified into four levels and size of municipality was grouped into five categories.

Statistical analysis

The lavaan package (version 0.6–9) [ 49 ] for R was used to conduct latent growth curve modelling (LGCM) with psychological distress at three waves as an outcome. The latent growth was modelled to be a linear process. Distress was normally distributed (skewness < 0.630 and kurtosis < 0.720 at all three measurement waves) making it feasible to use maximum likelihood estimation for our models.
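For readers less familiar with LGCM, a linear latent growth model of the kind described above can be written as follows. This is a generic textbook sketch, not the authors’ exact specification; in particular, the time scores λ_t = 0, 1, 2 for the three waves are an assumption, since the source does not state how time was coded.

```latex
\begin{aligned}
y_{ti} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{ti}, \qquad \lambda_t \in \{0, 1, 2\} \ \text{(assumed coding)},\\
\eta_{0i} &= \alpha_0 + \zeta_{0i},\\
\eta_{1i} &= \alpha_1 + \zeta_{1i}.
\end{aligned}
```

Here y_{ti} is the distress score of person i at wave t, η_{0i} and η_{1i} are that person’s latent intercept and slope, α_0 and α_1 are their means, ζ_{0i} and ζ_{1i} are individual deviations from those means, and ε_{ti} is the wave-specific residual. The covariance between ζ_{0i} and ζ_{1i} is the intercept-slope covariance reported in the results.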

Three models with increasing complexity were fitted. First, a model with only the psychological distress latent intercept and slope without any predictors was tested. In the next model, the time-invariant predictors of age, gender, educational level, marital status and municipality size were added as predictors of the psychological distress latent intercept and slope.

The final model of interest included a latent intercept and slope (using psychological distress measured at three waves), time-invariant predictors of the latent intercept and slope (gender, age, educational level, marital status, and municipality size), and time-varying predictors measured at all three waves that predicted trajectory deviations either only concurrently (COVID-19 exposure, COVID-19 stressors, COVID-19 threat, non-COVID events) or both concurrently and prospectively (coping self-efficacy, perceived social support, received support quantity, and received support quality). Fig 1 gives a full overview of the study model.


https://doi.org/10.1371/journal.pone.0310734.g001

A stepwise approach was used, successively fitting more complex models, from a growth curve model only to a model that added both time-invariant and time-varying predictors. First, we fitted a model with only the growth curve, which included a latent intercept and slope. Next, we extended the model by adding the time-invariant predictors. Finally, we further refined the model by incorporating the time-varying predictors. All variables were mean-centered before being entered into the conditional models. The following model fit statistics were used: χ 2 (and its significance), RMSEA (and its confidence interval), CFI, NFI and SRMR. Using Hu and Bentler’s [ 50 ] criteria, a CFI and NFI close to .95, an SRMR close to .06 and an RMSEA close to 0.08 were indications of adequate fit.

Post-hoc analyses of the interaction effects were conducted by categorizing the quality of received support into three levels (< -1 SD , -1 SD to +1 SD , > +1 SD ). Subsequently, a simple regression of predicted distress scores (retrieved from the most complex LGCM) on the quantity of received support was conducted for each category.
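The post-hoc procedure described above can be sketched as follows: split respondents into three bands by the quality score (below -1 SD, within ±1 SD, above +1 SD), then regress distress on support quantity within each band. All data values and the helper names `sd_band` and `ols_slope` are hypothetical, chosen only to make the mechanics visible; the study itself used predicted scores from the full LGCM.

```python
from statistics import mean, stdev


def sd_band(value, m, sd):
    """Classify a score as low (< -1 SD), high (> +1 SD) or average."""
    if value < m - sd:
        return "low"
    if value > m + sd:
        return "high"
    return "average"


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: support quality, support quantity, distress
quality = [1, 1, 1, 3, 3, 3, 3, 3, 3, 5, 5, 5]
quantity = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
distress = [10, 12, 14, 8, 9, 10, 8, 9, 10, 5, 5, 5]

m, sd = mean(quality), stdev(quality)
groups = {"low": ([], []), "average": ([], []), "high": ([], [])}
for q, x, y in zip(quality, quantity, distress):
    xs, ys = groups[sd_band(q, m, sd)]
    xs.append(x)
    ys.append(y)

# One simple regression per quality band
slopes = {band: ols_slope(xs, ys) for band, (xs, ys) in groups.items()}
print(slopes)
```

In this toy data the slope of distress on quantity shrinks as quality rises, which mirrors the shape of the interaction the study reports, though the magnitudes here are invented.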

Table 1 provides an overview of descriptive statistics and S2 Table provides correlations for all variables ( N = 1047; participants who reported receiving no support at any of three measurement times were excluded).


https://doi.org/10.1371/journal.pone.0310734.t001

In total, three models were tested (an unconditional model, a conditional model with only time-invariant predictors, and a conditional model with time-invariant and time-varying covariates). Before fitting our intended model, we assessed: 1) potential multicollinearity among predictors and 2) potential overfit of the model (given the number of predictors). Multicollinearity was assessed by examining correlations among the predictor variables. Of the 528 correlations possible among all predictors, 24 were larger than 0.5 or smaller than -0.5 (4.5%).
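The counts quoted above can be checked with a short calculation. The number of distinct pairwise correlations among p variables is p(p - 1)/2; the sketch below inverts that formula to show that 528 pairs is consistent with 33 variables. The figure of 33 is inferred here from the arithmetic and is not stated in the source.

```python
from math import comb, isqrt


def n_pairs(p):
    """Number of distinct pairwise correlations among p variables."""
    return comb(p, 2)


def variables_from_pairs(pairs):
    """Solve p * (p - 1) / 2 = pairs for a whole number p."""
    p = (1 + isqrt(1 + 8 * pairs)) // 2
    if n_pairs(p) != pairs:
        raise ValueError("pairs is not a triangular count")
    return p


p = variables_from_pairs(528)      # inferred variable count
share = 24 / 528                   # proportion of |r| > 0.5
print(p, f"{100 * share:.1f}%")
```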

These stronger correlations existed among the same variables measured at different times and between COVID-19 coping self-efficacy and psychological distress. To determine whether these correlations raised multicollinearity issues in the LGCM, three multiple regression models were run with the predicted distress scores at each wave as dependent variables and the LGCM-corresponding time-varying covariates as independent variables. The variance inflation factors (VIFs) of the independent variables in these models never exceeded 2.871, which was well under the threshold of 5, thus signaling no obvious multicollinearity problems.
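To make the VIF check concrete, here is a minimal sketch that computes variance inflation factors as the diagonal of the inverse of the predictors’ correlation matrix, which is equivalent to 1 / (1 - R²) from regressing each predictor on the others. The two predictor columns are hypothetical, and the helper names (`pearson`, `invert`, `vifs`) are assumptions made for this example; the authors’ own computation in R is not shown in the source.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


# Hypothetical predictors (e.g. the same scale measured at two waves)
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

v = vifs([x1, x2])
r = pearson(x1, x2)
print([round(val, 3) for val in v], "all below 5:", all(val < 5 for val in v))
```

With only two predictors the VIF reduces to 1 / (1 - r²), which gives a quick sanity check on the matrix-based result.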

Overfit of the model was assessed by changes in the Akaike’s Information Criterion (AIC) of the predictors in relation to a model without predictors; a decrease of the AIC was indicative of an enhanced model fit when the particular predictor was added to the model. We examined both the bivariate decreases in AIC for each predictor (i.e., differences in AIC between a model with each predictor separately and a model without any predictors) and hierarchical decreases in AIC (i.e., successively adding predictors and determining the decrease in AIC after each addition). Some variables appeared to add little to the model and caused a slight increase in the AIC. However, these increases were relatively small and their negative impact on model fit was therefore minor. For reasons of completeness, these variables were nonetheless kept in the model. S3 Table gives a full overview of the overfit assessment. An additional consideration for overfit is the adequacy of the sample size in relation to model complexity. This can be captured by the ratio of estimated parameters to the number of respondents [ 51 , 52 ]; the minimum is 1 to 5 (i.e., 5 respondents for every estimated parameter). For the current study this ratio was 1 to 18.70, indicating a more than sufficient sample size. Therefore, our modelling approaches were deemed valid.
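The two checks described above reduce to simple arithmetic, sketched below. AIC is 2k - 2 ln L for a model with k estimated parameters and maximized likelihood L. The log-likelihoods and parameter counts in the AIC comparison are hypothetical; the parameter count of 56 is inferred from the reported ratio (1047 respondents at 18.70 per parameter) and is not stated in the source.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def respondents_per_parameter(n_respondents, n_params):
    """How many respondents there are for each estimated parameter."""
    return n_respondents / n_params


# Hypothetical comparison: baseline model vs. model with one added predictor
baseline = aic(log_likelihood=-5200.0, n_params=8)
extended = aic(log_likelihood=-5150.0, n_params=10)
delta = extended - baseline   # negative means the predictor improved fit

# Parameter count implied by the reported ratio (an inference)
implied_params = round(1047 / 18.70)
ratio = respondents_per_parameter(1047, implied_params)

print(delta, implied_params, round(ratio, 2))
```

A negative `delta` corresponds to the AIC decrease the authors treat as evidence that a predictor earns its place in the model.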

Latent growth

All models, one unconditional and two conditional models, yielded significant latent intercepts and non-significant latent slopes. The first rows of Table 2 indicate the latent growth factors (intercept and slope) for each model. The non-significant slopes in all three models reflect a general absence of change over time. Only in the unconditional model, the latent intercept and slope were associated; individuals with higher initial starting values showed a higher decline over time ( cov = .177). In both conditional models, the latent intercept and slope were unrelated ( cov = .113 and .001, respectively).


https://doi.org/10.1371/journal.pone.0310734.t002

Time-invariant predictors.

In the model with only time-invariant predictors (Model 2; see Table 2 ), the latent intercept was associated with gender and age; women and younger respondents were more distressed initially. None of the time-invariant predictors were predictive of the latent slope.

Time-varying predictors: COVID-19 variables, non-COVID life events, and sense of danger due to the war.

The last column of Table 2 (Model 3) conveys the outcomes of time-varying predictors. The COVID-19 experiences variables (COVID-19 exposure, stressors and threats) and the experience of non-COVID events were assessed as concurrent predictors (i.e., i th wave to i th wave) of distress at each wave. Of these variables, COVID-19 stressors, COVID-19 threats, and non-COVID events were significantly associated with distress at each wave. Higher levels of stressors, threats and other life events were associated with more symptoms of distress. COVID-19 exposure was significantly positively associated with distress only at Wave 2; i.e., more virus exposure was predictive of more distress. Sense of danger due to the Russian-Ukrainian war significantly predicted higher levels of symptoms at Wave 3.

Time-varying predictors: Coping self-efficacy, perceived social support, quantity and quality of received social support.

Coping self-efficacy ratings were strongly associated, both concurrently and prospectively (i.e., i th wave to i+1 th wave), with lower distress scores at all waves. Perceived social support was concurrently associated with lower levels of distress symptoms, but never prospectively.

Quantity and quality of received support were concurrently associated with distress at all three measurement moments. Prospectively, both Wave 1 quantity and quality of received support were predictive of later distress only at Wave 2. Received support quantity was positively associated with psychological distress, such that greater amounts of support were associated with more distress. However, appraisals of the quality of received support were negatively associated with distress, such that greater quality of received support was associated with lower levels of distress symptoms.

The interaction of Wave 1 quantity of received support with Wave 1 quality of received support was a statistically significant predictor of Wave 1 distress. Fig 2 presents the plots of this interaction associated with observed (left panel) and predicted distress scores (right panel). Persons who judged the support they received as low in quality reported the highest levels of distress, and greater amounts of received help were strongly associated with higher levels of distress (post-hoc slope analyses, B = 1.810, p = .030). The slope for the average quality of received support group was also statistically significant ( B = 0.737, p = .020) but the adverse effect of the amount of received support was less pronounced. Most importantly, however, persons who received the most efficacious support reported the lowest levels of symptoms compared to the other two groups, and the amount of help they actually received did not influence their experience of distress ( B = 0.607, p = .207). No other quantity by quality interactions were statistically significant.


Fig 2. Interaction Effect of Received Support Quantity with Quality on Observed (left panel) and Predicted Distress Scores (right panel). Predicted scores were retrieved from the Latent Growth Curve Model including all time-invariant and time-varying predictors.

https://doi.org/10.1371/journal.pone.0310734.g002

Fundamentally, the experience of COVID-19 could be considered a total catastrophic event because the pandemic spurred all possible classes of stressors [ 17 ]. It has been a traumatic and/or major life changing event, it created daily hassles, it caused macro-system turbulences, generated a surplus of disappointing nonevents, and many of its repercussions have evolved into identifiable chronic stressors. All these facets of the COVID-19 pandemic represent separate parts of the overall universe of stress processes, each potentially adversely influencing mental health.

The present study examined psychological distress trajectories in a sample of adult Poles who were interviewed three times from July 2021 to August 2022, thus during later stages of the pandemic. A Latent Growth Curve Model (LGCM) revealed that respondents differed in their level of psychological distress, although changes in these trajectories were generally absent. In other words, individual growth trajectories differed only in the level of distress, but all trajectories were horizontal. Relative stability of pandemic-related symptomatology was also documented in the meta-analysis of prevalences of depression reported by studies conducted during the first year of the pandemic [ 2 ]. As in prior COVID-19 studies, the levels of mental health were dependent on gender and age, with women and younger respondents exhibiting more symptoms [ 2 , 3 , 6 , 7 , 9 ].

COVID-19 stressors and COVID-19 threats were both strongly and consistently associated with greater distress throughout the study. The influence of direct COVID-19 exposure was limited to one assessment period. Notwithstanding the overall traumatic and grave consequences of the SARS-CoV-2 virus, it can be said that the pandemic’s psychosocial challenges and disturbances have most forcefully eroded mental health [ 21 , 22 ]. The continuing effect of the COVID-19 pandemic on distress in the present sample was observed while controlling for the harmful influences of other normative life events and the sense of danger associated with Russia’s invasion of Ukraine [ 48 , 53 ].

There are many psychological and social resources that empower humans to show resilience and recover successfully from adversity. Chief among them are survivors’ sense of trust in their own ability to face demands/losses posed by the stressor [see 36 , 37 , 54 ] and perceptions of being supported [see 55 , 56 ]. In accord with other investigations of the pandemic, results of the present study showed that higher levels of coping self-efficacy [ 57 – 59 ] and perceived social support [ 60 – 63 ] were consistently associated with lower levels of distress symptomatology.

The main interest of this research was the mental health influence of the amount of received social support and appraisals of its quality. The few available COVID-19 studies that investigated the quantity of actual receipt of help have produced mixed findings, yielding very limited beneficial effects [ 64 , 65 ], or no effects at all [ 59 , 66 ]. Contradictory evidence was also reported suggesting that the amount of received support was associated with lower distress [ 67 , 68 ], or with greater distress [ 69 ]. On the one hand, the results of the present analyses showing adverse psychological effects of receiving greater levels of support could just add to this confusion. However, more favorable appraisals of effectiveness of received support showed a protective function and, with equal consistency, were associated with lower levels of psychological distress. The pattern of the received support quantity by quality interaction offers a reasonable and theory-based (SSE model) [ 31 ] interpretation of this apparent inconsistency. Persons who received effective social support exhibited the lowest levels of distress symptoms, irrespective of the amount of help. On the other hand, receiving large amounts of ineffective social support appeared to be detrimental to mental health. These results replicated an interaction pattern reported by Rini et al. [ 32 ] and should warn potential social support providers that if they cannot help smart , they should not attempt to help that hard . In other words, as long as it is delivered in an efficacious manner, received social support protects mental health in the context of stressful circumstances [ 33 – 35 ].

Strengths and limitations

The use of LGCM made it possible to model psychological distress trajectories and to predict deviations from those trajectories using factors that were stable as well as factors that changed over time. In other words, the model depicted individuals’ typical distress trajectories and identified why and when individuals had atypical distress levels influenced by a comprehensive set of (possible) experiences along the trajectory, most notably COVID-19 experiences and received support. Conservative analyses included, as control factors, relevant sociodemographic variables, potentially stressful life events not related to the pandemic, and participants’ concerns about the ongoing war in neighboring Ukraine. The study’s sample was large and randomly selected from a nationally representative internet panel. However, across the study’s three assessments, close to 60% of the initial sample was not retained due to attrition and strict data quality control procedures. In addition, all typical disadvantages associated with longitudinal online surveys apply. Finally, although the quantity by quality of received support interaction was consistent with the theoretical underpinnings of the study, it reached statistical reliability only one time. Thus, this interactive effect should be viewed with prudence as it requires additional examination.

Although the rates of severe illness and deaths due to infections with variants of the coronavirus SARS-CoV-2 have gradually decreased and vaccination campaigns continue to reach more and more people, it is not unreasonable to assert that the adverse mental health impact of the COVID-19 pandemic will persist. Results of the present study suggest that the ongoing presence of COVID-19 concerns, disturbances and losses have become chronic stressors. Citizens of the world may have to “domesticate” these challenges along with mastering personal and collective strategies to prevent and mitigate harmful psychological consequences of the pandemic. Clearly, beliefs in coping self-efficacy and a sense of being reliably connected to others serve as robust contributors to successful coping and adaptation. The conditions under which actually receiving social support is beneficial are less straightforward, particularly in the context of community-wide emergencies that routinely call for considerable amounts of help and assistance. What appears decisive when aiding people in times of coping with a variety of stressors is the quality, not necessarily quantity, of support provided. In our private as well as professional roles as helpers, it is worth remembering that the benefits of support provided to others may be achieved more readily if we attempt to help smarter rather than harder .

Supporting information

S1 Table. Confirmatory analysis of single factor distress scale composed of GAD-7 and PHQ-8.

https://doi.org/10.1371/journal.pone.0310734.s001

S2 Table. Correlations among study variables (n = 1047).

https://doi.org/10.1371/journal.pone.0310734.s002

S3 Table. Assessment of model overfit and incremental value of predictors.

https://doi.org/10.1371/journal.pone.0310734.s003

Acknowledgments

The authors would like to thank the members of our research team: Maria Baran, Marta Boczkowska, Katarzyna Hamer, and Beata Urbańska.

  • 17. Wheaton B. The domains and boundaries of stress concepts. In: Kaplan H. editor. Psychosocial stress : perspectives on structure , theory , life-course , and methods . Academic Press, New York. 1996; p. 29–70.
  • 24. Hobfoll SE, Stokes JP. The process and mechanics of social support. In: Duck DF Fray SE Hobfoll B, Ickes B. Montgomery B. (Eds.), The handbook of research in personal relationships. John Wiley & Sons, Hoboken, New Jersey.1988. p. 497–517.
  • 26. Kaniasty K, Norris F. Distinctions that matter: Received social support, perceived social support, and social embeddedness after disasters. In: Neria Y, Galea S, Norris F. (Eds.), Mental Health and Disasters. Cambridge University Press. Cambridge, UK. 2009. p. 175–202.
  • 27. Kaniasty K. Social support and psychological trauma. In: Reyes G, Elhai J, Ford J. (Eds.), Encyclopedia of Psychological Trauma. John Wiley & Sons, Hoboken, New Jersey. 2008. p. 607–612.
  • 28. Kaniasty K, Urbańska B. Social support mobilization and deterioration following disasters resulting from natural and human-induced hazards. In: Williams R, Kemp V. Porter K, Healing T, Drury J, editors. Major incidents, pandemics and mental health: the psychosocial aspects of health emergencies, incidents, disasters and disease outbreaks. Cambridge University Press, Cambridge, UK. 2024.
  • 30. Dunkel Schetter C, Bennett T. Differentiating the cognitive and behavioral aspects of social support. In: Sarason B, Sarason I, Pierce G. editors. Social support : an interactional view . John Wiley & Sons. 1990. New York, pp. 267–296.
  • 31. Rini C, Dunkel Schetter C. The effectiveness of social support attempts in intimate relationships. In: Sullivan K, Davila J. editors. Support processes in intimate relationships. Oxford University Press, Oxford, UK. 2010; pp 26–68 https://doi.org/10.1093/acprof:oso/9780195380170.003.0002 .
  • 35. Kaniasty, K. Effectiveness of received social support following a natural disaster . Paper presented at: 39th Conference of the Stress and Anxiety Research Society (STAR); 2016 July 10–13; Lublin, Poland.
  • 51. Preacher K. J. Latent growth curve models. In: Hancock G, Stapleton L, Mueller R. editors. The reviewer’s guide to quantitative methods in the social sciences. Routledge, New York. 2018; p. 178–192.
  • 52. Mueller R. O, Hancock G. R. Structural equation modeling. In: Hancock G, Stapleton L, Mueller R. editors. The reviewer’s guide to quantitative methods in the social sciences. Routledge, New York. 2018; p. 445–456.

  • Indian J Psychol Med
  • v.43(2); 2021 Mar

A Student’s Guide to the Classification and Operationalization of Variables in the Conceptualization and Design of a Clinical Study: Part 1

Chittaranjan Andrade

1 Dept. of Clinical Psychopharmacology and Neurotoxicology, National Institute of Mental Health and Neurosciences, Bengaluru, Karnataka, India.

Students without prior research experience may not know how to conceptualize and design a study. This article explains how an understanding of the classification and operationalization of variables is the key to the process. Variables describe aspects of the sample that is under study; they are so called because they vary in value from subject to subject in the sample. Variables may be independent or dependent. Independent variables influence the value of other variables; dependent variables are influenced in value by other variables. A hypothesis states an expected relationship between variables. A significant relationship between an independent and dependent variable does not prove cause and effect; the relationship may partly or wholly be explained by one or more confounding variables. Variables need to be operationalized; that is, defined in a way that permits their accurate measurement. These and other concepts are explained with the help of clinically relevant examples.

Key Message:

This article explains the following concepts: Independent variables, dependent variables, confounding variables, operationalization of variables, and construction of hypotheses.

In any body of research, the subject of study needs to be described and understood. For example, if we wish to study predictors of response to antidepressant drugs (ADs) in patients with major depressive disorder (MDD), we might select patient age, sex, age at onset of MDD, number of previous episodes of depression, duration of current depressive episode, presence of psychotic symptoms, past history of response to ADs, and other patient and illness characteristics as potential predictors. These characteristics or descriptors are called variables. Whether or not the patient responds to AD treatment is also a variable. A solid understanding of variables is the cornerstone in the conceptualization and preparation of a research protocol, and in the framing of study hypotheses. This subject is presented in two parts. This article, Part 1, explains what independent and dependent variables are, how an understanding of these is important in framing hypotheses, and what operationalization of a variable entails.

Variables are defined as characteristics of the sample that are examined, measured, described, and interpreted. Variables are so called because they vary in value from subject to subject in the study. As an example, if we wish to examine the relationship between age and height in a sample of children, age and height are the variables of interest; their values vary from child to child. In the earlier example, patients vary in age, sex, duration of current depressive episode, and response to ADs. Variables are classified as dependent and independent variables and are usually analyzed as categorical or continuous variables.

Independent and Dependent Variables

Independent variables are defined as those the values of which influence other variables. For example, age, sex, current smoking, LDL cholesterol level, and blood pressure are independent variables because their values (e.g., greater age, positive for current smoking, and higher LDL cholesterol level) influence the risk of myocardial infarction. Dependent variables are defined as those the values of which are influenced by other variables. For example, the risk of myocardial infarction is a dependent variable the value of which is influenced by variables such as age, sex, current smoking, LDL cholesterol level, and blood pressure. The risk is higher in older persons, in men, in current smokers, and so on.

There may be a cause–effect relationship between independent and dependent variables. For example, consider a clinical trial with treatment (iron supplement vs placebo) as the independent variable and hemoglobin level as the dependent variable. In children with anemia, an iron supplement will raise the hemoglobin level to a greater extent than will placebo; this is a cause–effect relationship because iron is necessary for the synthesis of hemoglobin. However, consider the variables teeth and weight . An alien from outer space who has no knowledge of human physiology may study human children below the age of 5 years and find that, as the number of teeth increases, weight increases. Should the alien conclude that there is a cause–effect relationship here, and that growing teeth causes weight gain? No, because a third variable, age, is a confounding variable 1 – 3 that is responsible for both increase in the number of teeth and increase in weight. In general, therefore, it is more proper to state that independent variables are associated with variations in the values of the dependent variables rather than state that independent variables cause variations in the values of the dependent variables. For causality to be asserted, other criteria must be fulfilled; this is out of the scope of the present article, and interested readers may refer to Schunemann et al. 4
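The teeth-and-weight example can be made concrete with a small simulation. In the sketch below, age drives both the number of teeth and body weight, and neither affects the other. The raw correlation between teeth and weight is strong, but it largely disappears once age is held constant through a partial correlation. All coefficients and noise levels are hypothetical and chosen only for illustration; they are not real growth data.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


rng = random.Random(42)  # fixed seed so the run is reproducible

# Age (years) is the confounder; teeth and weight each depend on age only
age = [rng.uniform(0.5, 5.0) for _ in range(500)]
teeth = [4.0 * a + rng.gauss(0, 1) for a in age]
weight = [4.0 + 2.5 * a + rng.gauss(0, 1) for a in age]

raw = pearson(teeth, weight)
adjusted = partial_corr(teeth, weight, age)
print(f"raw r = {raw:.2f}, r controlling for age = {adjusted:.2f}")
```

The alien who looks only at `raw` would infer a strong relationship; adjusting for age shows that the association is carried by the confounder.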

As a side note, here, whether a particular variable is independent or dependent will depend on the question that is being asked. For example, in a study of factors influencing patient satisfaction with outpatient department (OPD) services, patient satisfaction is the dependent variable. But, in a study of factors influencing OPD attendance at a hospital, OPD attendance is the dependent variable, and patient satisfaction is merely one of many possible independent variables that can influence OPD attendance.

Importance of Variables in Stating the Research Objectives

Students must have a clear idea about what they want to study in order to conceptualize and frame a research protocol. The first matters that they need to address are “What are my research questions?” and “What are my hypotheses?” Both questions can be answered only after choosing the dependent variables and then the independent variables for study.

In the case of a student who is interested in studying predictors of AD outcomes in patients with MDD, treatment response is the dependent variable and patient and clinical characteristics are possible independent variables. So, the selection of dependent and independent variables helps define the objectives of the study:

  • To determine whether sociodemographic variables, such as age and sex, predict the outcome of an episode of depression in MDD patients who are treated with an AD.
  • To determine whether clinical variables, such as age at onset of depression, number of previous depressive episodes, duration of current depressive episode, and the presence of soft neurological signs, predict the outcome of an episode of depression in MDD patients who are treated with an AD.

Note that in a formal research protocol, the student will need to state all the independent variables and not merely list examples. The student may also choose to include additional independent variables, such as baseline biochemical, psychophysiological, and neuroradiological measures.

Importance of Variables in Framing Hypotheses

A hypothesis is a clear statement of what the researcher expects to find in the study. As an example, a researcher may hypothesize that longer duration of current depression is associated with poorer response to ADs. In this hypothesis, the duration of the current episode of depression is the independent variable and treatment response is the dependent variable. It should be obvious, now, that a hypothesis can also be defined as the statement of an expected relationship between an independent and a dependent variable . Or, expressed visually, (independent variable) → (dependent variable) = hypothesis.

It would be a waste of time and energy to do a study to examine only one question: whether duration of current depression predicts treatment response. So, it is usual for research protocols to include many independent variables and many dependent variables in the generation of many hypotheses, as shown in Table 1 . Pairing each variable in the “independent variable” column with each variable in the “dependent variable” column would result in the generation of these hypotheses. Table 2 shows how this is done for age. Sets of hypotheses can likewise be constructed for the remaining independent and dependent variables in Table 1 . Importantly, the student must select one of these hypotheses as the primary hypothesis; the remaining hypotheses, no matter how many they are, would be secondary hypotheses. It is necessary to have only one hypothesis as the primary hypothesis in order to calculate the sample size necessary for an adequately powered study and to reduce the risk of false positive findings in the analysis. 5 In rare situations, two hypotheses may be considered equally important and may be stated as coprimary hypotheses.

Table 1. Independent Variables and Dependent Variables in a Study on Sociodemographic and Clinical Prediction of Response of Major Depressive Disorder to Antidepressant Drug Treatment

Independent variables:

• Age
• Sex
• Age at onset of major depressive disorder
• Number of past episodes of depression
• Past history of response to antidepressant drugs
• Duration of current depressive episode
• Baseline severity of depression
• Baseline suicidality
• Baseline melancholia
• Baseline psychotic symptoms
• Baseline soft neurological signs

Dependent variables:

• Severity of depression
• Global severity of illness
• Subjective well-being
• Quality of life
• Everyday functioning

Table 2. Combinations of Age with Dependent Variables in the Generation of Hypotheses

1. Older age is associated with less attenuation in the severity of depression.
2. Older age is associated with less attenuation in the global severity of illness.
3. Older age is associated with less improvement in subjective well-being.
4. Older age is associated with less improvement in quality of life.
5. Older age is associated with less improvement in everyday functioning.
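The pairing of variables described above can be sketched in a few lines. This is a minimal illustration, assuming the variable names are stored as plain strings copied from Table 1; the list names are hypothetical and not part of the original article.

```python
from itertools import product

independent_variables = [
    "age",
    "sex",
    "age at onset of major depressive disorder",
    "number of past episodes of depression",
    "past history of response to antidepressant drugs",
    "duration of current depressive episode",
    "baseline severity of depression",
    "baseline suicidality",
    "baseline melancholia",
    "baseline psychotic symptoms",
    "baseline soft neurological signs",
]
dependent_variables = [
    "severity of depression",
    "global severity of illness",
    "subjective well-being",
    "quality of life",
    "everyday functioning",
]

# Every (independent, dependent) pair is one candidate hypothesis.
pairs = list(product(independent_variables, dependent_variables))
print(len(pairs))  # 11 x 5 = 55

# The pairs for age correspond to the five hypotheses in Table 2.
age_pairs = [(iv, dv) for iv, dv in pairs if iv == "age"]
for iv, dv in age_pairs:
    print(f"{iv} -> {dv}")
```

Only one of these pairs would be designated the primary hypothesis; the rest would be secondary.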

Operationalization of Variables

In Table 1, suicidality is listed as an independent variable and severity of depression as a dependent variable. These variables need to be operationalized; that is, stated in a way that explains how they will be measured. Table 3 presents three ways in which suicidality can be measured and four ways in which (reduction in) the severity of depression can be measured. Each way of measurement in the “independent variable” column can be paired with each way of measurement in the “dependent variable” column, making a total of 3 × 4 = 12 possible hypotheses. In like manner, the many variables listed in Table 1 can each be operationalized in several different ways, resulting in the generation of a very large number of hypotheses. As already stated, the student must select only one hypothesis as the primary hypothesis.

Table 3. Possible Ways of Operationalization of Suicidality and Depression

Independent Variable: Suicidality

• Item score on the HAM-D
• Item score on the MADRS
• Beck Scale for Suicide Ideation total score

Dependent Variable: Severity of Depression

• MADRS total score
• HAM-D total score
• HAM-D response rate
• HAM-D remission rate

HAM-D: Hamilton Depression Rating Scale; MADRS: Montgomery–Asberg Depression Rating Scale.
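The count of 12 operationalized hypotheses follows from combining the two columns of Table 3. The sketch below is illustrative only: the hypothesis wording is a hypothetical template, and the choice of the first combination as the primary hypothesis is arbitrary; in a real protocol that choice would rest on scientific judgment.

```python
from itertools import product

suicidality_measures = [
    "HAM-D item score",
    "MADRS item score",
    "Beck Scale for Suicide Ideation total score",
]
depression_measures = [
    "MADRS total score",
    "HAM-D total score",
    "HAM-D response rate",
    "HAM-D remission rate",
]

# One hypothesis per combination of measurement methods (hypothetical wording).
operationalized_hypotheses = [
    f"Baseline suicidality ({iv}) predicts treatment outcome ({dv})"
    for iv, dv in product(suicidality_measures, depression_measures)
]
print(len(operationalized_hypotheses))  # 3 x 4 = 12

# Exactly one hypothesis is primary; picking index 0 here is arbitrary.
primary_hypothesis = operationalized_hypotheses[0]
secondary_hypotheses = operationalized_hypotheses[1:]
print("Primary:", primary_hypothesis)
print("Secondary count:", len(secondary_hypotheses))
```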

Much thought should be given to the operationalization of variables because variables that are carelessly operationalized will be poorly measured; the data collected will then be of poor quality, and the study will yield unreliable results. For example, socioeconomic status may be operationalized as lower, middle, or upper class based on the patient’s monthly income, on the total monthly income of the family, or on a validated socioeconomic status assessment scale that takes into consideration income, education, occupation, and place of residence. The student must choose the method that best suits the needs of the study and that has the greatest scientific acceptability. However, it is also permissible to operationalize the same variable in several different ways and to include all these operationalizations in the study, as shown in Table 3. This is because conceptualizing variables in different ways can help in understanding the subject of the study from different perspectives.

Operationalization of variables requires a consideration of the reliability and validity of the method of operationalization; discussions on reliability and validity are beyond the scope of this article. Operationalization of variables also requires specification of the scale of measurement: nominal, ordinal, interval, or ratio; this is also beyond the scope of the present article. Finally, operationalization of variables can also specify details of the measurement procedure. As an example, in a study on the use of metformin to reduce olanzapine-associated weight gain, we may state that we will obtain the weight of the patient but fail to explain how we will do it. Better would be to state that the same weighing scale will be used. Still better would be to state that we will use a weighing instrument that works on the principle of moving weights on a levered arm, and that the same instrument will be used for all patients. And best would be to add that we will weigh patients, dressed in standard hospital gowns, after they have voided their bladder but before they have eaten breakfast. When the way in which a variable will be measured is defined, measurement of that variable becomes more objective and uniform.

Concluding Notes

The next article, Part 2, will address what categorical and continuous variables are, why continuous variables should not be converted into categorical variables and when this rule can be broken, and what confounding variables are.

Declaration of Conflicting Interests: The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author received no financial support for the research, authorship, and/or publication of this article.

The University of Chicago The Law School

3.9 Credit Adjustments

Certain classes and clinics offer variable credit, where the credits earned depend on the nature and amount of coursework or the type of assessment. For example, some seminars award two credits if you take the final exam but three credits if you write a research paper instead; clinics award credits commensurate with the amount of work performed (see the Clinical and Experiential Programs section); and independent research credits may range from one to three.

By default, you will initially be registered for the minimum credits available. If you intend to earn a higher number of credits, make certain that you are aware of the requirements for the additional credit(s), and inform both the instructor and the Office of the Registrar of your intentions via email before the end of the Add/Drop Period/Last Day to Adjust Enrollment/Credits. You must send an email to [email protected] including the instructor’s approval to adjust your credits by the deadline to adjust enrollment/credits in the Academic Calendar. Each quarter has its own deadline, which is generally the end of the second week of the quarter. After the deadline, credit adjustments will not be approved. The only exception may be clinic work for which the clinic faculty member and the Dean of Students have approved the late credit adjustment. Bear in mind the limit of 14 credits per term.

After a grade has been posted for any class or clinic, no credit adjustments are allowed.
