Null Hypothesis Examples

ThoughtCo / Hilary Allison

  • Ph.D., Biomedical Sciences, University of Tennessee at Knoxville
  • B.A., Physics and Mathematics, Hastings College

In statistical analysis, the null hypothesis assumes there is no meaningful relationship between two variables. Testing the null hypothesis can tell you whether your results are due to the effect of manipulating an independent variable or due to chance. It's often used in conjunction with an alternative hypothesis, which assumes there is, in fact, a relationship between two variables.

The null hypothesis is among the easiest hypotheses to test using statistical analysis, making it perhaps the most valuable hypothesis for the scientific method. By evaluating a null hypothesis in addition to another hypothesis, researchers can support their conclusions with a higher level of confidence. Below are examples of how you might formulate a null hypothesis to fit certain questions.

What Is the Null Hypothesis?

The null hypothesis states there is no relationship between the measured phenomenon (the dependent variable) and the independent variable, which is the variable an experimenter typically controls or changes. You do not need to believe that the null hypothesis is true to test it. On the contrary, you will likely suspect there is a relationship between a set of variables. One way to support that suspicion is to reject the null hypothesis. Rejecting a hypothesis does not mean an experiment was "bad" or that it didn't produce results. In fact, it is often one of the first steps toward further inquiry.

To distinguish it from other hypotheses, the null hypothesis is written as H0 (which is read as "H-nought," "H-null," or "H-zero"). A significance test estimates how likely it is that results like yours would arise by chance alone if the null hypothesis were true. A confidence level of 95% or 99% is common. Keep in mind that even at a high confidence level, there is still a small chance the conclusion is wrong, perhaps because the experimenter did not account for a critical factor or simply because of chance. This is one reason why it's important to repeat experiments.
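To make the "small chance" concrete, here is a minimal simulation sketch. It assumes a two-sided z-test with a known standard deviation, a 95% confidence level (significance level 0.05), and made-up normally distributed data; the function names are illustrative only. Because the data are generated so that the null hypothesis is true, every rejection is a false alarm.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p_value(sample, mu0, sigma):
    """p-value of a two-sided z-test of H0: mean == mu0 (sigma assumed known)."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(alpha=0.05, experiments=2000, n=30, seed=0):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        # Data are drawn with mean 0, so H0: mean == 0 holds by construction.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(sample, mu0=0.0, sigma=1.0) <= alpha:
            rejections += 1
    return rejections / experiments


rate = false_positive_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of simulated experiments")
```

The rejection rate should land near the chosen significance level, which is why a single significant result is weaker evidence than a result that holds up when the experiment is repeated.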

Examples of the Null Hypothesis

To write a null hypothesis, start by asking a question. Rephrase that question in a form that assumes no relationship between the variables. In other words, assume a treatment has no effect. Write your hypothesis in a way that reflects this.

Question | Null Hypothesis
Are teens better at math than adults? | Age has no effect on mathematical ability.
Does taking aspirin every day reduce the chance of having a heart attack? | Taking aspirin daily does not affect heart attack risk.
Do teens use cell phones to access the internet more than adults? | Age has no effect on how cell phones are used for internet access.
Do cats care about the color of their food? | Cats express no food preference based on color.
Does chewing willow bark relieve pain? | There is no difference in pain relief after chewing willow bark versus taking a placebo.

Other Types of Hypotheses

In addition to the null hypothesis, the alternative hypothesis is also a staple in traditional significance tests. It's essentially the opposite of the null hypothesis because it assumes the claim in question is true. For the first item in the table above, for example, an alternative hypothesis might be "Age does have an effect on mathematical ability."

Key Takeaways

  • In hypothesis testing, the null hypothesis assumes no relationship between two variables, providing a baseline for statistical analysis.
  • Rejecting the null hypothesis suggests there is evidence of a relationship between variables.
  • By formulating a null hypothesis, researchers can systematically test assumptions and draw more reliable conclusions from their experiments.

Writing Null Hypotheses in Research and Statistics

Last Updated: September 2, 2024 Fact Checked

This article was co-authored by Joseph Quinones and by wikiHow staff writer, Jennifer Mueller, JD. Joseph Quinones is a Physics Teacher working at South Bronx Community Charter High School. Joseph specializes in astronomy and astrophysics and is interested in science education and science outreach, currently practicing ways to make physics accessible to more students with the goal of bringing more students of color into the STEM fields. He has experience working on Astrophysics research projects at the Museum of Natural History (AMNH). Joseph received his Bachelor's degree in Physics from Lehman College and his Masters in Physics Education from City College of New York (CCNY). He is also a member of a network called New York City Men Teach. There are 7 references cited in this article, which can be found at the bottom of the page. This article has been fact-checked, ensuring the accuracy of any cited facts and confirming the authority of its sources. This article has been viewed 29,922 times.

Are you working on a research project and struggling with how to write a null hypothesis? Well, you've come to the right place! Keep reading to learn everything you need to know about the null hypothesis, including a review of what it is, how it relates to your research question and your alternative hypothesis, as well as how to use it in different types of studies.

Things You Should Know

  • Write a research null hypothesis as a statement that the studied variables have no relationship to each other, or that there's no difference between 2 groups.

  • Write a statistical null hypothesis as a mathematical equation, such as \(\mu_1 = \mu_2\) when comparing the means of 2 groups.

  • Adjust the format of your null hypothesis to match the statistical method you used to test it, such as using "mean" if you're comparing the mean between 2 groups.

What is a null hypothesis?

A null hypothesis states that there's no relationship between 2 variables.

  • Research hypothesis: States in plain language that there's no relationship between the 2 variables or there's no difference between the 2 groups being studied.
  • Statistical hypothesis: States the predicted outcome of statistical analysis through a mathematical equation related to the statistical method you're using.

Examples of Null Hypotheses

Step 1 Research question:

Null Hypothesis vs. Alternative Hypothesis

Step 1 Null hypotheses and alternative hypotheses are mutually exclusive.

  • For example, your alternative hypothesis could state a positive correlation between 2 variables while your null hypothesis states there's no relationship. If there's a negative correlation, then both hypotheses are false.

Step 2 Proving the null hypothesis false is a precursor to proving the alternative.

  • You need additional data or evidence to show that your alternative hypothesis is correct—proving the null hypothesis false is just the first step.
  • In smaller studies, sometimes it's enough to show that there's some relationship and your hypothesis could be correct—you can leave the additional proof as an open question for other researchers to tackle.

How do I test a null hypothesis?

Use statistical methods on collected data to test the null hypothesis.

  • Group means: Compare the mean of the variable in your sample with the mean of the variable in the general population. [6]
  • Group proportions: Compare the proportion of the variable in your sample with the proportion of the variable in the general population. [7]
  • Correlation: Correlation analysis looks at the relationship between 2 variables, specifically whether they tend to happen together. [8]
  • Regression: Regression analysis reveals the correlation between 2 variables while also controlling for the effect of other, interrelated variables. [9]

Templates for Null Hypotheses

Step 1 Group means

  • Research null hypothesis: There is no difference in the mean [dependent variable] between [group 1] and [group 2].
  • Statistical null hypothesis: \(\mu_1 - \mu_2 = 0\), which is the same as \(\mu_1 = \mu_2\)

Step 2 Group proportions

  • Research null hypothesis: The proportion of [dependent variable] in [group 1] and [group 2] is the same.
  • Statistical null hypothesis: \(p_1 = p_2\)

Step 3 Correlation

  • Research null hypothesis: There is no correlation between [independent variable] and [dependent variable] in the population.
  • Statistical null hypothesis: \(\rho = 0\)

Step 4 Regression

  • Research null hypothesis: There is no relationship between [independent variable] and [dependent variable] in the population.
  • Statistical null hypothesis: \(\beta = 0\)


  • ↑ https://online.stat.psu.edu/stat100/lesson/10/10.1
  • ↑ https://online.stat.psu.edu/stat501/lesson/2/2.12
  • ↑ https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistics/basic-statistics/supporting-topics/basics/null-and-alternative-hypotheses/
  • ↑ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5635437/
  • ↑ https://online.stat.psu.edu/statprogram/reviews/statistical-concepts/hypothesis-testing
  • ↑ https://education.arcus.chop.edu/null-hypothesis-testing/
  • ↑ https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_hypothesistest-means-proportions/bs704_hypothesistest-means-proportions_print.html


Statistics By Jim

Making statistics intuitive

Null Hypothesis: Definition, Rejecting & Examples

By Jim Frost 6 Comments

What is a Null Hypothesis?

The null hypothesis in statistics states that there is no difference between groups or no relationship between variables. It is one of two mutually exclusive hypotheses about a population in a hypothesis test.


  • Null Hypothesis H0: No effect exists in the population.
  • Alternative Hypothesis HA: The effect exists in the population.

In every study or experiment, researchers assess an effect or relationship. This effect can be the effectiveness of a new drug, building material, or other intervention that has benefits. There is a benefit or connection that the researchers hope to identify. Unfortunately, no effect may exist. In statistics, we call this lack of an effect the null hypothesis. Researchers assume that this notion of no effect is correct until they have enough evidence to suggest otherwise, similar to how a trial presumes innocence.

In this context, the analysts don’t necessarily believe the null hypothesis is correct. In fact, they typically want to reject it because that leads to more exciting finds about an effect or relationship. The new vaccine works!

You can think of it as the default theory that requires sufficiently strong evidence to reject. Like a prosecutor, researchers must collect sufficient evidence to overturn the presumption of no effect. Investigators must work hard to set up a study and a data collection system to obtain evidence that can reject the null hypothesis.


Null Hypothesis Examples

Null hypotheses start as research questions that the investigator rephrases as a statement indicating there is no effect or relationship.

Research Question | Null Hypothesis
Does the vaccine prevent infections? | The vaccine does not affect the infection rate.
Does the new additive increase product strength? | The additive does not affect mean product strength.
Does the exercise intervention increase bone mineral density? | The intervention does not affect bone mineral density.
As screen time increases, does test performance decrease? | There is no relationship between screen time and test performance.

After reading these examples, you might think they’re a bit boring and pointless. However, the key is to remember that the null hypothesis defines the condition that the researchers need to discredit before suggesting an effect exists.

Let’s see how you reject the null hypothesis and get to those more exciting findings!

When to Reject the Null Hypothesis

So, you want to reject the null hypothesis, but how and when can you do that? To start, you’ll need to perform a statistical test on your data. The following is an overview of performing a study that uses a hypothesis test.

The first step is to devise a research question and the appropriate null hypothesis. After that, the investigators need to formulate an experimental design and data collection procedures that will allow them to gather data that can answer the research question. Then they collect the data. For more information about designing a scientific study that uses statistics, read my post 5 Steps for Conducting Studies with Statistics.

After data collection is complete, statistics and hypothesis testing enter the picture. Hypothesis testing takes your sample data and evaluates how consistent they are with the null hypothesis. The p-value is a crucial part of the statistical results because it quantifies how strongly the sample data contradict the null hypothesis.

When the sample data provide sufficient evidence, you can reject the null hypothesis. In a hypothesis test, this process involves comparing the p-value to your significance level.

Rejecting the Null Hypothesis

Reject the null hypothesis when the p-value is less than or equal to your significance level. Your sample data favor the alternative hypothesis, which suggests that the effect exists in the population. For a mnemonic device, remember—when the p-value is low, the null must go!

When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning.

Failing to Reject the Null Hypothesis

Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis. The sample data provide insufficient evidence to conclude that the effect exists in the population. When the p-value is high, the null must fly!

Note that failing to reject the null is not the same as proving it. For more information about the difference, read my post about Failing to Reject the Null.
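The decision rule in the two sections above can be sketched in a few lines of Python. The function name `decide` and the default significance level of 0.05 are assumptions for illustration, not part of any statistics package.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when the p-value is <= the significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    # Not rejecting is NOT the same as proving the null hypothesis true.
    return "fail to reject the null hypothesis"


for p in (0.003, 0.05, 0.20):
    print(f"p = {p}: {decide(p)}")
```

Note that the function only ever returns "reject" or "fail to reject"; there is deliberately no "accept the null" outcome.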

That’s a very general look at the process. But I hope you can see how the path to more exciting findings depends on being able to rule out the less exciting null hypothesis that states there’s nothing to see here!

Let’s move on to learning how to write the null hypothesis for different types of effects, relationships, and tests.


How to Write a Null Hypothesis

The null hypothesis varies by the type of statistic and hypothesis test. Remember that inferential statistics use samples to draw conclusions about populations. Consequently, when you write a null hypothesis, it must make a claim about the relevant population parameter. Further, that claim usually indicates that the effect does not exist in the population. Below are typical examples of writing a null hypothesis for various parameters and hypothesis tests.

Group Means

T-tests and ANOVA assess the differences between group means. For these tests, the null hypothesis states that there is no difference between group means in the population. In other words, the experimental conditions that define the groups do not affect the mean outcome. Mu (µ) is the population parameter for the mean, and you’ll need to include it in the statement for this type of study.

For example, an experiment compares the mean bone density changes for a new osteoporosis medication. The control group does not receive the medicine, while the treatment group does. The null states that the mean bone density changes for the control and treatment groups are equal.

  • Null Hypothesis H0: Group means are equal in the population: µ1 = µ2, or µ1 – µ2 = 0.
  • Alternative Hypothesis HA: Group means are not equal in the population: µ1 ≠ µ2, or µ1 – µ2 ≠ 0.
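As a rough illustration of testing H0: µ1 = µ2, the sketch below uses a permutation test rather than the t-test or ANOVA mentioned above, because it needs only the Python standard library. The bone density changes are invented numbers, and `permutation_test_means` is a hypothetical helper name.

```python
import random
from statistics import fmean


def permutation_test_means(group1, group2, resamples=5000, seed=0):
    """Two-sided permutation p-value for H0: both groups share the same mean.

    If H0 is true, group labels are interchangeable, so we reshuffle the
    labels and count how often the shuffled difference in means is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(group1) - fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(resamples):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n1]) - fmean(pooled[n1:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (resamples + 1)


# Hypothetical mean bone density changes (made-up data for illustration).
control = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.1]
treatment = [1.1, 0.8, 1.4, 0.9, 1.2, 1.0, 1.3, 0.7]

p_value = permutation_test_means(control, treatment)
print(f"Estimated p-value: {p_value:.4f}")
```

With groups this clearly separated, the estimated p-value is small, so the sample data would favor the alternative hypothesis µ1 ≠ µ2.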

Group Proportions

Proportions tests assess the differences between group proportions. For these tests, the null hypothesis states that there is no difference between group proportions. Again, the experimental conditions did not affect the proportion of events in the groups. The symbol p is the population proportion parameter that you’ll need to include.

For example, a vaccine experiment compares the infection rate in the treatment group to the control group. The treatment group receives the vaccine, while the control group does not. The null states that the infection rates for the control and treatment groups are equal.

  • Null Hypothesis H0: Group proportions are equal in the population: p1 = p2.
  • Alternative Hypothesis HA: Group proportions are not equal in the population: p1 ≠ p2.
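One common way to test H0: p1 = p2 is a two-proportion z-test with a pooled standard error. The sketch below assumes samples large enough for the normal approximation to be reasonable; the infection counts are invented, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events1, n1, events2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion."""
    p1 = events1 / n1
    p2 = events2 / n2
    # Under H0 both groups share one proportion, so pool the data to estimate it.
    pooled = (events1 + events2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical vaccine trial: infections out of participants in each group.
z, p_value = two_proportion_z_test(events1=12, n1=200, events2=30, n2=200)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Here group 1 stands for the treatment group and group 2 for the control group; a negative z means the treatment group had the lower infection rate.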

Correlation and Regression Coefficients

Some studies assess the relationship between two continuous variables rather than differences between groups.

In these studies, analysts often use either correlation or regression analysis. For these tests, the null states that there is no relationship between the variables. Specifically, it says that the correlation or regression coefficient is zero. As one variable increases, there is no tendency for the other variable to increase or decrease. Rho (ρ) is the population correlation parameter and beta (β) is the regression coefficient parameter.

For example, a study assesses the relationship between screen time and test performance. The null states that there is no correlation between this pair of variables. As screen time increases, test performance does not tend to increase or decrease.

  • Null Hypothesis H0: The correlation in the population is zero: ρ = 0.
  • Alternative Hypothesis HA: The correlation in the population is not zero: ρ ≠ 0.
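For the correlation case, a minimal sketch of testing H0: ρ = 0 follows. It computes the sample Pearson correlation by hand and then uses the Fisher z-transformation with a normal approximation, which is a swapped-in shortcut chosen to stay within the standard library (statistical software typically reports a t-based p-value instead). The screen time and test score values are invented.

```python
from math import atanh, sqrt
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_test(x, y):
    """Approximate two-sided test of H0: rho == 0 via the Fisher z-transformation."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two equal-length samples with at least 4 points")
    r = pearson_r(x, y)
    # atanh(r) is roughly normal with standard error 1/sqrt(n - 3) under H0.
    # (A perfect correlation of exactly +1 or -1 makes atanh raise ValueError.)
    z = atanh(r) * sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, p_value


# Hypothetical data: daily screen time (hours) and test scores.
screen_time = [1, 2, 3, 4, 5, 6, 7, 8]
test_score = [88, 85, 86, 80, 78, 79, 72, 70]

r, p_value = correlation_test(screen_time, test_score)
print(f"r = {r:.3f}, approximate p-value = {p_value:.6f}")
```

A strongly negative r with a small p-value would lead you to reject ρ = 0 for these made-up data; with only eight points, the approximation is rough and is shown purely to illustrate the logic.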

For all these cases, the analysts define the hypotheses before the study. After collecting the data, they perform a hypothesis test to determine whether they can reject the null hypothesis.

The preceding examples are all for two-tailed hypothesis tests. To learn about one-tailed tests and how to write a null hypothesis for them, read my post One-Tailed vs. Two-Tailed Tests.

January 11, 2024 at 2:57 pm

Thanks for the reply.

January 10, 2024 at 1:23 pm

Hi Jim, In your comment you state that equivalence test null and alternate hypotheses are reversed. For hypothesis tests of data fits to a probability distribution, the null hypothesis is that the probability distribution fits the data. Is this correct?


January 10, 2024 at 2:15 pm

Those are two separate things, equivalence testing and normality tests. But, yes, you’re correct for both.

Hypotheses are switched for equivalence testing. You need to “work” (i.e., collect a large sample of good quality data) to be able to reject the null that the groups are different to be able to conclude they’re the same.

With typical hypothesis tests, if you have low quality data and a low sample size, you’ll fail to reject the null that they’re the same, concluding they’re equivalent. But that’s more a statement about the low quality and small sample size than anything to do with the groups being equal.

So, equivalence testing makes you work to obtain a finding that the groups are the same (at least within some amount you define as a trivial difference).

For normality testing, and other distribution tests, the null states that the data follow the distribution (normal or whatever). If you reject the null, you have sufficient evidence to conclude that your sample data don’t follow the probability distribution. That’s a rare case where you hope to fail to reject the null. And it suffers from the problem I describe above where you might fail to reject the null simply because you have a small sample size. In that case, you’d conclude the data follow the probability distribution but it’s more that you don’t have enough data for the test to register the deviation. In this scenario, if you had a larger sample size, you’d reject the null and conclude it doesn’t follow that distribution.

I don’t know of any equivalence testing type approach for distribution fit tests where you’d need to work to show the data follow a distribution, although I haven’t looked for one either!


February 20, 2022 at 9:26 pm

Is a null hypothesis regularly (always) stated in the negative? “there is no” or “does not”

February 23, 2022 at 9:21 pm

Typically, the null hypothesis includes an equal sign. The null hypothesis states that the population parameter equals a particular value. That value is usually one that represents no effect. In the case of a one-sided hypothesis test, the null still contains an equal sign but it’s “greater than or equal to” or “less than or equal to.” If you wanted to translate the null hypothesis from its native mathematical expression, you could use the expression “there is no effect.” But the mathematical form more specifically states what it’s testing.

It’s the alternative hypothesis that typically contains does not equal.

There are some exceptions. For example, in an equivalence test where the researchers want to show that two things are equal, the null hypothesis states that they’re not equal.

In short, the null hypothesis states the condition that the researchers hope to reject. They need to work hard to set up an experiment and data collection that’ll gather enough evidence to be able to reject the null condition.


February 15, 2022 at 9:32 am

Dear sir I always read your notes on Research methods.. Kindly tell is there any available Book on all these..wonderfull Urgent


Null Hypothesis Definition and Examples, How to State


Why is it Called the “Null”?

The word “null” in this context means that it’s a commonly accepted fact that researchers work to nullify. It doesn’t mean that the statement is null (i.e. amounts to nothing) itself! (Perhaps the term should be called the “nullifiable hypothesis” as that might cause less confusion).

Why Do I need to Test it? Why not just prove an alternate one?

The short answer is, as a scientist, you are required to; it’s part of the scientific process. Science uses a battery of processes to prove or disprove theories, making sure that any new hypothesis has no flaws. Including both a null and an alternate hypothesis is one safeguard to ensure your research isn’t flawed. Not including the null hypothesis in your research is considered very bad practice by the scientific community. If you set out to prove an alternate hypothesis without considering it, you are likely setting yourself up for failure. At a minimum, your experiment will likely not be taken seriously.

  • Null hypothesis: H0: The world is flat.
  • Alternate hypothesis: The world is round.

Several scientists, including Copernicus, set out to disprove the null hypothesis. This eventually led to the rejection of the null and the acceptance of the alternate. Most people accepted it; the ones that didn’t created the Flat Earth Society! What would have happened if Copernicus had not disproved it and merely proved the alternate? No one would have listened to him. In order to change people’s thinking, he first had to prove that their thinking was wrong.

How to State the Null Hypothesis from a Word Problem

You’ll be asked to convert a word problem into a hypothesis statement in statistics that will include a null hypothesis and an alternate hypothesis. Breaking your problem into a few small steps makes these problems much easier to handle.

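Step 1: Figure out the hypothesis from the problem. In the example worked through here, the hypothesis is that the average recovery time is greater than 8.2 weeks.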

Step 2: Convert the hypothesis to math. Remember that the average is sometimes written as μ.

H1: μ > 8.2

Broken down into (somewhat) English, that’s H1 (the hypothesis): μ (the average) > (is greater than) 8.2

Step 3: State what will happen if the hypothesis doesn’t come true. If the recovery time isn’t greater than 8.2 weeks, there are only two possibilities: the recovery time is equal to 8.2 weeks or less than 8.2 weeks.

H0: μ ≤ 8.2

Broken down again into English, that’s H0 (the null hypothesis): μ (the average) ≤ (is less than or equal to) 8.2

How to State the Null Hypothesis: Part Two

But what if the researcher doesn’t have any idea what will happen?

Example Problem: A researcher is studying the effects of a radical exercise program on knee surgery patients. There is a good chance the therapy will improve recovery time, but there’s also the possibility it will make it worse. The average recovery time for knee surgery patients is 8.2 weeks.

Step 1: State what will happen if the experiment doesn’t make any difference. That’s the null hypothesis: that nothing will happen. In this experiment, if nothing happens, then the recovery time will stay at 8.2 weeks.

H0: μ = 8.2

Broken down into English, that’s H0 (the null hypothesis): μ (the average) = (is equal to) 8.2

Step 2: Figure out the alternate hypothesis. The alternate hypothesis is the opposite of the null hypothesis. In other words, what happens if our experiment makes a difference?

H1: μ ≠ 8.2

In English again, that’s H1 (the alternate hypothesis): μ (the average) ≠ (is not equal to) 8.2
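Once the hypotheses are stated, they can be checked against data. The sketch below is one possible way to do that, assuming a one-sample z-test with a known population standard deviation (the value of 1.0 week is an assumption, as are the recovery times themselves). For the one-sided case, the test statistic is computed at the boundary value 8.2 of the null hypothesis μ ≤ 8.2.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alternative="two-sided"):
    """z-test of a population mean against mu0 (sigma assumed known).

    alternative="greater"   tests H1: mu > mu0  (H0: mu <= mu0)
    alternative="less"      tests H1: mu < mu0  (H0: mu >= mu0)
    alternative="two-sided" tests H1: mu != mu0 (H0: mu == mu0)
    """
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    cdf = NormalDist().cdf(z)
    if alternative == "greater":
        p_value = 1 - cdf
    elif alternative == "less":
        p_value = cdf
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical recovery times in weeks (made-up data for illustration).
recovery_weeks = [9.1, 8.7, 9.4, 8.9, 9.8, 8.5, 9.0, 9.6, 8.8, 9.2]

z, p_right = one_sample_z_test(recovery_weeks, mu0=8.2, sigma=1.0, alternative="greater")
_, p_two = one_sample_z_test(recovery_weeks, mu0=8.2, sigma=1.0, alternative="two-sided")
print(f"z = {z:.2f}, right-tailed p = {p_right:.4f}, two-sided p = {p_two:.4f}")
```

The same z statistic serves both problems; only the p-value changes, because the two-sided alternative counts extreme results in either direction.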

That’s How to State the Null Hypothesis!


5.2 - Writing Hypotheses

The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)).

When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing (2) the direction of the test (non-directional, right-tailed or left-tailed), and (3) the value of the hypothesized parameter.

  • At this point we can write hypotheses for a single mean (\(\mu\)), paired means (\(\mu_d\)), a single proportion (\(p\)), the difference between two independent means (\(\mu_1-\mu_2\)), the difference between two proportions (\(p_1-p_2\)), a simple linear regression slope (\(\beta\)), and a correlation (\(\rho\)).
  • The research question will give us the information necessary to determine if the test is two-tailed (e.g., "different from," "not equal to"), right-tailed (e.g., "greater than," "more than"), or left-tailed (e.g., "less than," "fewer than").
  • The research question will also give us the hypothesized parameter value. This is the number that goes in the hypothesis statements (i.e., \(\mu_0\) and \(p_0\)). For the difference between two groups, regression, and correlation, this value is typically 0.

Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)).  The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).

One Group Mean
Research Question Is the population mean different from \( \mu_{0} \)? Is the population mean greater than \(\mu_{0}\)? Is the population mean less than \(\mu_{0}\)?
Null Hypothesis, \(H_{0}\) \(\mu=\mu_{0} \) \(\mu=\mu_{0} \) \(\mu=\mu_{0} \)
Alternative Hypothesis, \(H_{a}\) \(\mu\neq \mu_{0} \) \(\mu> \mu_{0} \) \(\mu<\mu_{0} \)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
Paired Means
Research Question Is there a difference in the population? Is there a mean increase in the population? Is there a mean decrease in the population?
Null Hypothesis, \(H_{0}\) \(\mu_d=0 \) \(\mu_d =0 \) \(\mu_d=0 \)
Alternative Hypothesis, \(H_{a}\) \(\mu_d \neq 0 \) \(\mu_d> 0 \) \(\mu_d<0 \)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional
One Group Proportion
Research Question Is the population proportion different from \(p_0\)? Is the population proportion greater than \(p_0\)? Is the population proportion less than \(p_0\)?
Null Hypothesis, \(H_{0}\) \(p=p_0\) \(p= p_0\) \(p= p_0\)
Alternative Hypothesis, \(H_{a}\) \(p\neq p_0\) \(p> p_0\) \(p< p_0\)
Type of Hypothesis Test Two-tailed, non-directional Right-tailed, directional Left-tailed, directional

Difference between Two Independent Means

| Research Question | Are the population means different? | Is the population mean in group 1 greater than the population mean in group 2? | Is the population mean in group 1 less than the population mean in group 2? |
| --- | --- | --- | --- |
| Null Hypothesis, \(H_{0}\) | \(\mu_1 = \mu_2\) | \(\mu_1 = \mu_2\) | \(\mu_1 = \mu_2\) |
| Alternative Hypothesis, \(H_{a}\) | \(\mu_1 \neq \mu_2\) | \(\mu_1 > \mu_2\) | \(\mu_1 < \mu_2\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Difference between Two Proportions

| Research Question | Are the population proportions different? | Is the population proportion in group 1 greater than the population proportion in group 2? | Is the population proportion in group 1 less than the population proportion in group 2? |
| --- | --- | --- | --- |
| Null Hypothesis, \(H_{0}\) | \(p_1 = p_2\) | \(p_1 = p_2\) | \(p_1 = p_2\) |
| Alternative Hypothesis, \(H_{a}\) | \(p_1 \neq p_2\) | \(p_1 > p_2\) | \(p_1 < p_2\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Simple Linear Regression: Slope

| Research Question | Is the slope in the population different from 0? | Is the slope in the population positive? | Is the slope in the population negative? |
| --- | --- | --- | --- |
| Null Hypothesis, \(H_{0}\) | \(\beta = 0\) | \(\beta = 0\) | \(\beta = 0\) |
| Alternative Hypothesis, \(H_{a}\) | \(\beta \neq 0\) | \(\beta > 0\) | \(\beta < 0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

Correlation (Pearson's r)

| Research Question | Is the correlation in the population different from 0? | Is the correlation in the population positive? | Is the correlation in the population negative? |
| --- | --- | --- | --- |
| Null Hypothesis, \(H_{0}\) | \(\rho = 0\) | \(\rho = 0\) | \(\rho = 0\) |
| Alternative Hypothesis, \(H_{a}\) | \(\rho \neq 0\) | \(\rho > 0\) | \(\rho < 0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |
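
The three test types in the tables above differ only in which side of the sampling distribution counts as "extreme". The following is a minimal sketch using Python's standard library, assuming the test statistic is a z-score that follows a standard normal distribution when the null hypothesis is true. The function name `p_value_from_z` and the example z-score are illustrative and do not come from the source.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """Return the p-value for a z statistic, assuming H0 is true.

    tail: "two"   (Ha: parameter != null value),
          "right" (Ha: parameter >  null value),
          "left"  (Ha: parameter <  null value).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two":
        # Count both tails: values at least as far from 0 as |z|.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'two', 'right', or 'left'")

# Hypothetical z-score, chosen only for illustration.
z = 1.8
for tail in ("two", "right", "left"):
    print(tail, round(p_value_from_z(z, tail), 4))
```

The same statistic can lead to different p-values depending on the alternative hypothesis, which is why the direction of the test should be fixed by the research question before the data are examined.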

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans. Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test.
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.

Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

Step 3: Perform a statistical test

There are a variety of statistical tests available, but many of the most common ones are based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p-value. This means that differences this large would be unlikely to arise by chance alone if the null hypothesis were true.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p-value. This means that any difference you measure between groups could plausibly be due to chance.
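
One concrete instance of this comparison is the F statistic from a one-way analysis of variance, which divides the between-group variance by the within-group variance. The sketch below only computes that ratio. Turning it into a p-value needs the F distribution, which is not shown here. The function name `f_ratio` and both data sets are invented for illustration.

```python
from statistics import mean

def f_ratio(groups):
    """Ratio of between-group to within-group variance (one-way ANOVA F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Between-group variance: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # Within-group variance: how spread out values are around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n - k)

    return ms_between / ms_within

# Hypothetical data, invented for illustration.
well_separated = [[10, 11, 12, 11], [20, 21, 19, 20]]
overlapping = [[10, 20, 15, 5], [12, 18, 9, 17]]

print(f_ratio(well_separated))  # large ratio: the groups barely overlap
print(f_ratio(overlapping))     # small ratio: the groups overlap heavily
```

A large ratio corresponds to the low p-value case described above, and a ratio near or below 1 corresponds to the high p-value case.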

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data.

In the height example, your statistical test will give you:

  • an estimate of the difference in average height between the two groups.
  • a p-value showing how likely you are to see a difference at least this large if the null hypothesis of no difference is true.
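
Both outputs can be produced with a permutation test, which is one of several tests that could be used for the height example and is chosen here only because it needs nothing beyond the standard library. This is a sketch under stated assumptions: the heights are invented, and `permutation_test` is a hypothetical helper, not a method named in the source.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test for Ha: mean(group_a) > mean(group_b).

    Returns (observed difference in means, p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1

    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical heights in cm, invented for illustration only.
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 170, 162, 168, 172, 166, 169, 164]

difference, p = permutation_test(men, women)
print(f"estimated difference: {difference:.1f} cm, p-value: {p:.4f}")
```

The p-value here is the share of label shufflings that produce a difference at least as large as the observed one, which matches the definition in the second bullet.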

Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, you reject the null hypothesis when there is a less than 5% chance that you would see results at least this extreme if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).
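
The link between the significance level and the Type I error rate can be checked by simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis is true by construction and counts how often a two-tailed z-test rejects it anyway. It assumes a normal population with known standard deviation 1. The function name, sample size, and number of experiments are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha, n_experiments=2_000, sample_size=30, seed=1):
    """Fraction of experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_experiments):
        # H0 is true by construction: the population has mean 0 and sd 1.
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        # z statistic for H0: mu = 0 with known population sd = 1.
        z = mean(sample) * sample_size ** 0.5
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed
        if p_value <= alpha:
            rejections += 1
    return rejections / n_experiments

for alpha in (0.05, 0.01):
    print(alpha, type_i_error_rate(alpha))
```

Under these assumptions the rejection rate should land close to the chosen significance level, so moving from 0.05 to 0.01 lowers the share of false rejections.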

Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion sections of your research paper, dissertation or thesis.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis. This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis. But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis.

Frequently asked questions about hypothesis testing

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses, by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing. The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved September 9, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

What is The Null Hypothesis & When Do You Reject The Null Hypothesis

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

A null hypothesis is a statistical concept suggesting no significant difference or relationship between measured variables. It’s the default assumption unless empirical evidence proves otherwise.

The null hypothesis states no relationship exists between the two variables being studied (i.e., one variable does not affect the other).

The null hypothesis is the statement that a researcher or an investigator wants to disprove.

Testing the null hypothesis can tell you whether your results are due to the effects of manipulating the independent variable or due to random chance.

How to Write a Null Hypothesis

Null hypotheses (H0) start as research questions that the investigator rephrases as statements indicating no effect or relationship between the independent and dependent variables.

It is a default position that your research aims to challenge or confirm.

For example, if studying the impact of exercise on weight loss, your null hypothesis might be:

There is no significant difference in weight loss between individuals who exercise daily and those who do not.

Examples of Null Hypotheses

| Research Question | Null Hypothesis |
| --- | --- |
| Do teenagers use cell phones more than adults? | Teenagers and adults use cell phones the same amount. |
| Do tomato plants exhibit a higher rate of growth when planted in compost rather than in soil? | Tomato plants show no difference in growth rates when planted in compost rather than soil. |
| Does daily meditation decrease the incidence of depression? | Daily meditation does not decrease the incidence of depression. |
| Does daily exercise increase test performance? | There is no relationship between daily exercise time and test performance. |
| Does the new vaccine prevent infections? | The vaccine does not affect the infection rate. |
| Does flossing your teeth affect the number of cavities? | Flossing your teeth has no effect on the number of cavities. |

When Do We Reject The Null Hypothesis? 

We reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect. This often occurs when the p-value (the probability of observing data at least as extreme as the sample, assuming the null hypothesis is true) is below a predetermined significance level.

If the collected data are sufficiently inconsistent with what the null hypothesis predicts, a researcher can conclude that the data provide evidence against the null hypothesis, and thus the null hypothesis is rejected.

Rejecting the null hypothesis means that the data provide evidence that a relationship exists between a set of variables and that the effect is statistically significant (p < 0.05).

If the data collected from the random sample are not statistically significant, then the researchers fail to reject the null hypothesis: the data do not provide sufficient evidence of a relationship between the variables.

You need to perform a statistical test on your data in order to evaluate how consistent it is with the null hypothesis. A p-value is one statistical measurement used to validate a hypothesis against observed data.

Calculating the p-value is a critical part of null-hypothesis significance testing because it quantifies how strongly the sample data contradicts the null hypothesis.

The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

Usually, a researcher uses a confidence level of 95% or 99% (a significance level of 0.05 or 0.01) as a general guideline to decide whether to reject or retain the null.

When your p-value is less than or equal to your significance level, you reject the null hypothesis.

In other words, smaller p-values are taken as stronger evidence against the null hypothesis. Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis.

In this case, the sample data provides insufficient evidence to conclude that the effect exists in the population.

Because you can never know with complete certainty whether there is an effect in the population, your inferences about a population will sometimes be incorrect.

When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s called a type II error.
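
The decision rule and the two error types can be written out as a small lookup. This is only a sketch: the function names `decide` and `error_type` are hypothetical, and in real research the truth about the null hypothesis is unknown, so `h0_is_true` exists here purely to label the four possible outcomes.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

def error_type(h0_is_true, decision):
    """Classify the outcome, given the (normally unknown) truth about H0."""
    if h0_is_true and decision == "reject H0":
        return "Type I error"
    if not h0_is_true and decision == "fail to reject H0":
        return "Type II error"
    return "correct decision"

# Walk through all four combinations using two illustrative p-values.
for h0_is_true in (True, False):
    for p_value in (0.03, 0.20):
        decision = decide(p_value)
        print(h0_is_true, p_value, decision, error_type(h0_is_true, decision))
```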

Why Do We Never Accept The Null Hypothesis?

The reason we do not say “accept the null” is because we are always assuming the null hypothesis is true and then conducting a study to see if there is evidence against it. And, even if we don’t find evidence against it, a null hypothesis is not accepted.

A lack of evidence only means that you haven’t proven that something exists. It does not prove that something doesn’t exist. 

It is risky to conclude that the null hypothesis is true merely because we did not find evidence to reject it. It is always possible that researchers elsewhere have disproved the null hypothesis, so we cannot accept it as true, but instead, we state that we failed to reject the null. 

One can either reject the null hypothesis, or fail to reject it, but can never accept it.

Why Do We Use The Null Hypothesis?

We can never prove with 100% certainty that a hypothesis is true; we can only collect evidence that supports a theory. However, testing a hypothesis can set the stage for rejecting or failing to reject this hypothesis at a chosen significance level.

The null hypothesis is useful because it can tell us whether the results of our study are due to random chance or the manipulation of a variable (with a certain level of confidence).

A null hypothesis is rejected if the observed data would be very unlikely to occur were it true, and it is not rejected if the observed outcome is consistent with the position held by the null hypothesis.

Rejecting the null hypothesis sets the stage for further experimentation to see if a relationship between two variables exists. 

Hypothesis testing is a critical part of the scientific method as it helps decide whether the results of a research study support a particular theory about a given population. Hypothesis testing is a systematic way of backing up researchers’ predictions with statistical analysis.

It helps provide sufficient statistical evidence that either favors or rejects a certain hypothesis about the population parameter. 

Purpose of a Null Hypothesis 

  • The primary purpose of the null hypothesis is to disprove an assumption. 
  • Whether rejected or not, the null hypothesis can help further progress a theory in many scientific cases.
  • A null hypothesis can be used to ascertain how consistent the outcomes of multiple studies are.

Do you always need both a Null Hypothesis and an Alternative Hypothesis?

The null (H0) and alternative (Ha or H1) hypotheses are two competing claims that describe the effect of the independent variable on the dependent variable. They are mutually exclusive, which means that only one of the two hypotheses can be true. 

While the null hypothesis states that there is no effect in the population, an alternative hypothesis states that there is an effect or relationship between two variables.

The goal of hypothesis testing is to make inferences about a population based on a sample. In order to undertake hypothesis testing, you must express your research hypothesis as a null and alternative hypothesis. Both hypotheses are required to cover every possible outcome of the study. 

What is the difference between a null hypothesis and an alternative hypothesis?

The alternative hypothesis is the complement to the null hypothesis. The null hypothesis states that there is no effect or no relationship between variables, while the alternative hypothesis claims that there is an effect or relationship in the population.

It is the claim that you expect or hope will be true. The null hypothesis and the alternative hypothesis are always mutually exclusive, meaning that only one can be true at a time.

What are some problems with the null hypothesis?

One major problem with the null hypothesis is that researchers typically will assume that failing to reject the null is a failure of the experiment. However, either outcome, rejecting or failing to reject the null, is an informative result. Even if the null is not refuted, the researchers will still learn something new.

Why can a null hypothesis not be accepted?

We can either reject or fail to reject a null hypothesis, but never accept it. If your test fails to detect an effect, this is not proof that the effect doesn’t exist. It just means that your sample did not have enough evidence to conclude that it exists.

We can’t accept a null hypothesis because a lack of evidence does not prove that something does not exist. Instead, we fail to reject it.

Failing to reject the null indicates that the sample did not provide sufficient evidence to conclude that an effect exists.

If the p-value is greater than the significance level, then you fail to reject the null hypothesis.

Is a null hypothesis directional or non-directional?

A hypothesis test can contain either a directional or a non-directional alternative hypothesis. A directional hypothesis is one that contains the less than (“<”) or greater than (“>”) sign.

A non-directional hypothesis contains the not equal sign (“≠”). However, a null hypothesis is neither directional nor non-directional.

A null hypothesis is a prediction that there will be no change, relationship, or difference between two variables.

The directional hypothesis or nondirectional hypothesis would then be considered alternative hypotheses to the null hypothesis.

How to Write a Strong Hypothesis | Guide & Examples

Published on 6 May 2022 by Shona McCombes.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

Table of contents

  • What is a hypothesis?
  • Developing a hypothesis (with example)
  • Hypothesis examples
  • Frequently asked questions about writing hypotheses

What is a hypothesis?

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more variables. An independent variable is something the researcher changes or controls. A dependent variable is something the researcher observes and measures.

For example, in a hypothesis that exposure to the sun affects happiness, the independent variable is exposure to the sun (the assumed cause) and the dependent variable is the level of happiness (the assumed effect).

Developing a hypothesis (with example)

Step 1: Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2: Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalise more complex constructs.

Step 3: Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4: Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5: Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6: Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.

Hypothesis examples

| Research question | Hypothesis | Null hypothesis |
| --- | --- | --- |
| What are the health benefits of eating an apple a day? | Increasing apple consumption in over-60s will result in decreasing frequency of doctor’s visits. | Increasing apple consumption in over-60s will have no effect on frequency of doctor’s visits. |
| Which airlines have the most delays? | Low-cost airlines are more likely to have delays than premium airlines. | Low-cost and premium airlines are equally likely to have delays. |
| Can flexible work arrangements improve job satisfaction? | Employees who have flexible working hours will report greater job satisfaction than employees who work fixed hours. | There is no relationship between working hour flexibility and job satisfaction. |
| How effective is secondary school sex education at reducing teen pregnancies? | Teenagers who received sex education lessons throughout secondary school will have lower rates of unplanned pregnancy than teenagers who did not receive any sex education. | Secondary school sex education has no effect on teen pregnancy rates. |
| What effect does daily use of social media have on the attention span of under-16s? | There is a negative correlation between time spent on social media and attention span in under-16s. | There is no relationship between social media use and attention span in under-16s. |
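
To show how one of these pairs becomes a statistical test, the airline example can be read as a comparison of two proportions: H0 says the delay proportions are equal, and the hypothesis says the low-cost proportion is larger. The sketch below uses a pooled two-proportion z-test, which assumes samples large enough for the normal approximation. The flight counts are invented, and `two_proportion_z_test` is a hypothetical helper rather than anything prescribed by the source.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Right-tailed z-test of H0: p1 = p2 against Ha: p1 > p2.

    Returns (z statistic, p-value).
    """
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    # Under H0 both groups share one proportion, so pool the data to estimate it.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical counts, invented for illustration:
# delayed flights out of flights sampled (group 1 = low-cost, group 2 = premium).
z, p = two_proportion_z_test(successes_1=60, n_1=200, successes_2=40, n_2=200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the p-value falls below 0.05, so the null hypothesis of equal delay rates would be rejected at that level. Different data could just as easily lead to failing to reject it.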

A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study, the statistical hypotheses correspond logically to the research hypothesis.

McCombes, S. (2022, May 06). How to Write a Strong Hypothesis | Guide & Examples. Scribbr. Retrieved 9 September 2024, from https://www.scribbr.co.uk/research-methods/hypothesis-writing/

Formulating a Null Hypothesis: The Foundation of Your Research

The concept of the null hypothesis is a central tenet in statistical testing and research design. It is the hypothesis that there is no significant difference or effect in a given set of observations, serving as a starting point for testing and a benchmark against which alternative hypotheses are measured. Understanding how to formulate and test a null hypothesis is crucial for researchers to draw meaningful conclusions from their data. This article delves into the intricacies of the null hypothesis, providing insights into its definition, contrast with alternative hypotheses, and its pivotal role in research.

Key Takeaways

  • The null hypothesis is a foundational element of statistical testing, positing no effect or difference, against which research findings are compared.
  • Formulating a null hypothesis requires a clear understanding of the research question and a precise statement that can be empirically tested.
  • Testing the null hypothesis involves collecting data and using statistical methods to determine whether to reject or fail to reject the null hypothesis.

Understanding the Null Hypothesis in Research

Defining the Null Hypothesis and Its Role in Statistical Testing

When embarking on the journey of research, you'll encounter the cornerstone of statistical testing: the null hypothesis. It is a statement positing the absence of a specific effect or difference. In essence, it serves as a default position that reflects no change or relationship. The null hypothesis is crucial because it provides a starting point for testing and is the hypothesis that is directly tested and potentially rejected.

Contrasting the Null and Alternative Hypotheses

In research, there are two types of hypotheses: null and alternative. They work as a complementary pair, with the null hypothesis suggesting no significant difference or effect, and the alternative hypothesis positing the contrary. Understanding this dichotomy is essential for interpreting the results of your statistical tests accurately.

The Importance of the Null Hypothesis in Research Design

The null hypothesis is not merely a formality in research design; it is a fundamental component that guides the entire study. It helps in determining the appropriate statistical tests and informs the interpretation of results. When you know how to write a thesis or how to write a thesis proposal, formulating a strong null hypothesis can alleviate much of the writing anxiety associated with research. It sets clear expectations and provides a benchmark against which the alternative hypothesis is measured.

Operationalizing the Null Hypothesis

Formulating a Null Hypothesis: Practical Examples

When you embark on research, formulating a null hypothesis is a critical step. It is the assumption that there is no significant difference or effect in your study. For instance, if you're investigating a new teaching method's effectiveness, your null hypothesis would state that there is no difference in student outcomes between the new method and the traditional one. Operationalization is essential in research, turning abstract concepts into measurable variables. To avoid common pitfalls, ensure that all variables in the hypothesis can be measured or manipulated with existing research methods.

Testing the Null Hypothesis: Methodology and Interpretation

Once your null hypothesis is established, the next step is to test it using appropriate statistical methods. This involves collecting data, performing statistical tests, and interpreting the results. If the data shows a statistically significant effect, you may reject the null hypothesis. Otherwise, you fail to reject it. It's crucial to understand that not rejecting the null hypothesis does not prove it true; it simply indicates that the evidence was insufficient to reject it.

Challenges and Common Misconceptions in Null Hypothesis Formulation

Formulating a null hypothesis comes with its challenges. Defining variables, selecting measurement techniques, and ensuring validity and reliability are all critical steps that can be fraught with difficulties. A common misconception is that failing to reject the null hypothesis means the research has failed. In reality, such a result provides valuable information about the lack of evidence for an effect and helps refine future research questions.


In conclusion, the formulation of a null hypothesis is a pivotal step in the research process, providing a foundation for statistical testing and interpretation of data. It serves as the default assumption that there is no effect or difference, against which the alternative hypothesis is tested. The null hypothesis is not a claim of truth but a starting point for analysis, aiming to be either rejected or not rejected based on empirical evidence. By understanding and effectively formulating a null hypothesis, researchers can approach their inquiries with a structured and scientific mindset, ensuring that their findings are robust and their conclusions are grounded in statistical rigor. As we have explored throughout this article, the null hypothesis is not merely a procedural formality but a critical component that shapes the trajectory of research and the integrity of its outcomes.

Frequently Asked Questions

What exactly is a null hypothesis in research?

The null hypothesis is a statement used in statistical testing that proposes there is no significant effect or difference between certain groups or variables. It serves as a starting point for testing and is typically denoted as H0.

How does the null hypothesis differ from the alternative hypothesis?

The null hypothesis (H0) posits no effect or difference, suggesting any observed variation is due to chance. The alternative hypothesis (Ha or H1) contradicts the null, proposing a significant effect or difference that the research aims to support.

Why is the null hypothesis important in research design?

The null hypothesis is crucial because it establishes a baseline for comparison. By testing the null hypothesis, researchers can determine if their findings are statistically significant and whether they can reject H0 in favor of Ha.


© 2024 Research Rebels, All rights reserved.


Null hypothesis

by Marco Taboga, PhD

In a test of hypothesis, a sample of data is used to decide whether to reject or not to reject a hypothesis about the probability distribution from which the sample was extracted.

The hypothesis is called the null hypothesis, or simply "the null".

Things a data scientist should know: 1) the criminal trial analogy; 2) the role of the test statistic; 3) failure to reject may be due to lack of power; 4) rejection may be due to misspecification.

Table of contents

  • The null is like the defendant in a criminal trial
  • How is the null hypothesis tested
  • Example 1 - Proportion of defective items
  • Measurement
  • Test statistic
  • Critical region
  • Interpretation
  • Example 2 - Reliability of a production plant
  • Rejection and failure to reject
  • Not rejecting and accepting are not the same thing
  • Failure to reject can be due to lack of power
  • Rejections are easier to interpret, but be careful
  • Takeaways - how to (and not to) formulate a null hypothesis
  • More examples
  • More details
  • Best practices in science
  • Keep reading the glossary

Formulating null hypotheses and subjecting them to statistical testing is one of the workhorses of the scientific method.

Scientists in all fields make conjectures about the phenomena they study, translate them into null hypotheses and gather data to test them.

This process resembles a trial:

  • the defendant (the null hypothesis) is accused of being guilty (wrong);
  • evidence (data) is gathered in order to prove the defendant guilty (reject the null);
  • if there is evidence beyond any reasonable doubt, the defendant is found guilty (the null is rejected);
  • otherwise, the defendant is found not guilty (the null is not rejected).

Keep this analogy in mind because it helps to better understand statistical tests, their limitations, use and misuse, and frequent misinterpretation.

The null hypothesis is like the defendant in a criminal trial.

Before collecting the data:

  • we decide how to summarize the relevant characteristics of the sample data in a single number, the so-called test statistic;
  • we derive the probability distribution of the test statistic under the hypothesis that the null is true (the data is regarded as random; therefore, the test statistic is a random variable);
  • we decide what probability of incorrectly rejecting the null we are willing to tolerate (the level of significance, or size of the test); the level of significance is typically a small number, such as 5% or 1%;
  • we choose one or more intervals of values (collectively called rejection region) such that the probability that the test statistic falls within these intervals is equal to the desired level of significance; the rejection region is often a tail of the distribution of the test statistic (one-tailed test) or the union of the left and right tails (two-tailed test).

The rejection region is a set of values that the test statistic is unlikely to take if the null hypothesis is true.

Then, the data is collected and used to compute the value of the test statistic.

A decision is taken as follows:

if the test statistic falls within the rejection region, then the null hypothesis is rejected;

otherwise, it is not rejected.

The probability distribution of the test statistic and the rejection region depend on the null hypothesis.

We now give two examples of practical problems that lead us to formulate and test a null hypothesis.

A new method is proposed to produce light bulbs.

The proponents claim that it produces fewer defective bulbs than the method currently in use.

To check the claim, we can set up a statistical test as follows.

We keep the light bulbs on for 10 consecutive days, and then we record whether they are still working at the end of the test period.

The null hypothesis is that the probability that a light bulb produced with the new method is still working at the end of the test period is the same as that of a light bulb produced with the old method.

100 light bulbs are tested:

50 of them are produced with the new method (group A)

the remaining 50 are produced with the old method (group B).

The final data comprises 100 observations of:

an indicator variable which is equal to 1 if the light bulb is still working at the end of the test period and 0 otherwise;

a categorical variable that records the group (A or B) to which each light bulb belongs.

We use the data to compute the proportions of working light bulbs in groups A and B.

The proportions are estimates of the probabilities of not being defective, which are equal for the two groups under the null hypothesis.

We then compute a z-statistic by:

taking the difference between the proportion in group A and the proportion in group B;

standardizing the difference:

we subtract the expected value (which is zero under the null hypothesis);

we divide by the standard deviation (it can be derived analytically).

The distribution of the z-statistic can be approximated by a standard normal distribution.

The z-statistic has a normal distribution with zero mean and variance equal to one.

We decide that the level of significance must be 5%. In other words, we are going to tolerate a 5% probability of incorrectly rejecting the null hypothesis.

The critical region is the right 5%-tail of the normal distribution, that is, the set of all values greater than 1.645 (see the glossary entry on critical values if you are wondering how this value was obtained).

If the test statistic is greater than 1.645, then the null hypothesis is rejected; otherwise, it is not rejected.

A rejection is interpreted as significant evidence that the new production method produces fewer defective items; failure to reject is interpreted as insufficient evidence that the new method is better.
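The steps of Example 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the counts of working bulbs (47 and 40) are hypothetical, and the standard deviation is estimated with the usual pooled proportion, which is one common way to standardize the difference but is not spelled out in the text above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(working_a, n_a, working_b, n_b):
    """z-statistic for H0: both groups have the same probability of still working."""
    p_a = working_a / n_a
    p_b = working_b / n_b
    # Under H0 the two probabilities are equal, so we pool the groups
    # to estimate the common probability (an assumed, standard choice).
    pooled = (working_a + working_b) / (n_a + n_b)
    std_dev = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # The expected difference is zero under H0, so nothing else is subtracted.
    return (p_a - p_b) / std_dev

# Right-tail critical value for a 5% level of significance (about 1.645).
critical_value = NormalDist().inv_cdf(0.95)

# Hypothetical outcome: 47 of 50 new-method bulbs and 40 of 50 old-method bulbs still work.
z = two_proportion_z(working_a=47, n_a=50, working_b=40, n_b=50)
reject = z > critical_value
print(f"z = {z:.3f}, critical value = {critical_value:.3f}, reject H0: {reject}")
```

With different counts the decision can of course go the other way; the point is only that the rejection rule is fixed before the data are looked at.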

The null hypothesis is rejected when the test statistic falls in the tails of the distribution.

A production plant incurs high costs when production needs to be halted because some machinery fails.

The plant manager has decided that he is not willing to tolerate more than one halt per year on average.

If the expected number of halts per year is greater than 1, he will make new investments in order to improve the reliability of the plant.

A statistical test is set up as follows.

The reliability of the plant is measured by the number of halts.

The null hypothesis is that the number of halts in a year has a Poisson distribution with expected value equal to 1 (using the Poisson distribution is common in reliability testing).

The manager cannot wait more than one year before taking a decision.

There will be a single datum at his disposal: the number of halts observed during one year.

The number of halts is used as a test statistic. By assumption, it has a Poisson distribution under the null hypothesis.

The manager decides that the probability of incorrectly rejecting the null can be at most 10%.

A Poisson random variable with expected value equal to 1 takes values:

larger than 1 with probability 26.42%;

larger than 2 with probability 8.03%.

Therefore, it is decided that the critical region will be the set of all values greater than or equal to 3.

If the test statistic is greater than or equal to 3, then the null is rejected; otherwise, it is not rejected.

A rejection is interpreted as significant evidence that the production plant is not reliable enough (the average number of halts per year is significantly larger than tolerated).

Failure to reject is interpreted as insufficient evidence that the plant is unreliable.
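The tail probabilities and the critical region of Example 2 can be checked directly from the Poisson probability mass function. The helper names `poisson_tail` and `smallest_critical_value` below are made up for this sketch.

```python
from math import exp, factorial

def poisson_tail(threshold, mean):
    """P(X > threshold) for X with a Poisson distribution with the given mean."""
    cdf = sum(exp(-mean) * mean**k / factorial(k) for k in range(threshold + 1))
    return 1.0 - cdf

def smallest_critical_value(mean, alpha):
    """Smallest c such that P(X >= c) <= alpha under the null hypothesis."""
    c = 0
    # P(X >= c) equals P(X > c - 1); increase c until the tail is small enough.
    while poisson_tail(c - 1, mean) > alpha:
        c += 1
    return c

null_mean = 1   # expected number of halts per year under the null
alpha = 0.10    # maximum tolerated probability of incorrectly rejecting the null

print(f"P(X > 1) = {poisson_tail(1, null_mean):.2%}")
print(f"P(X > 2) = {poisson_tail(2, null_mean):.2%}")
print(f"reject when halts >= {smallest_critical_value(null_mean, alpha)}")
```

Because the Poisson distribution is discrete, the actual size of the test (about 8%) is below the 10% limit rather than exactly equal to it.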

Failure to reject the null hypothesis is interpreted as insufficient evidence.

This section discusses the main problems that arise in the interpretation of the outcome of a statistical test (reject / not reject).

When the test statistic does not fall within the critical region, then we do not reject the null hypothesis.

Does this mean that we accept the null? Not really.

In general, failure to reject does not constitute, per se, strong evidence that the null hypothesis is true.

Remember the analogy between hypothesis testing and a criminal trial. In a trial, when the defendant is declared not guilty, this does not mean that the defendant is innocent. It only means that there was not enough evidence (not beyond any reasonable doubt) against the defendant.

In turn, lack of evidence can be due:

  • either to the fact that the defendant is innocent;
  • or to the fact that the prosecution has not been able to provide enough evidence against the defendant, even if the latter is guilty.

This is the very reason why courts do not declare defendants innocent, but they use the locution "not guilty".

In a similar fashion, statisticians do not say that the null hypothesis has been accepted, but they say that it has not been rejected.

Failure to reject does not imply acceptance.

To better understand why failure to reject does not in general constitute strong evidence that the null hypothesis is true, we need to use the concept of statistical power.

The power of a test is the probability (calculated ex-ante, i.e., before observing the data) that the null will be rejected when another hypothesis (called the alternative hypothesis) is true.

Let's consider the first of the two examples above (the production of light bulbs).

In that example, the null hypothesis is: the probability that a light bulb is defective does not decrease after introducing a new production method.

Let's make the alternative hypothesis that the probability of being defective is 1% smaller after changing the production process (assume that a 1% decrease is considered a meaningful improvement by engineers).

What is the ex-ante probability of rejecting the null if the alternative hypothesis is true?

If this probability (the power of the test) is small, then it is very likely that we will not reject the null even if it is wrong.

If we use the analogy with criminal trials, low power means that most likely the prosecution will not be able to provide sufficient evidence, even if the defendant is guilty.

Thus, in the case of lack of power, failure to reject is almost meaningless (it was anyway highly likely).

This is why, before performing a test, it is good statistical practice to compute its power against a relevant alternative.

If the power is found to be too small, there are usually remedies. In particular, statistical power can usually be increased by increasing the sample size (see, e.g., the lecture on hypothesis tests about the mean).
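To make the power calculation concrete, here is a minimal sketch for the light-bulb test based on a normal approximation. Several inputs are assumptions made for illustration: the old method is taken to yield a 90% probability of a working bulb, the "1% smaller" defect probability is read as one percentage point (91% working), and the function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_new, p_old, n_per_group, alpha=0.05):
    """Approximate power of the one-sided two-proportion z-test (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Standard deviation of the difference in proportions under the null (pooled)...
    p_bar = (p_new + p_old) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # ...and under the alternative (each group keeps its own probability).
    se_alt = sqrt((p_new * (1 - p_new) + p_old * (1 - p_old)) / n_per_group)
    # We reject when the difference exceeds z_crit * se_null; under the alternative
    # the difference is roughly normal with mean p_new - p_old and std dev se_alt.
    return 1 - std_normal.cdf((z_crit * se_null - (p_new - p_old)) / se_alt)

# Assumed probabilities of a working bulb: 0.91 (new method) vs 0.90 (old method).
power_small = approx_power(p_new=0.91, p_old=0.90, n_per_group=50)
power_large = approx_power(p_new=0.91, p_old=0.90, n_per_group=5000)
print(f"power with 50 bulbs per group:   {power_small:.3f}")
print(f"power with 5000 bulbs per group: {power_large:.3f}")
```

Under these assumed numbers the power with 50 bulbs per group comes out well below 10%, so a failure to reject would say very little; a much larger sample raises it substantially.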

The best practice is to compute the power of the test, that is, the probability of rejecting the null hypothesis when the alternative is true.

As we have explained above, interpreting a failure to reject the null hypothesis is not always straightforward. By contrast, interpreting a rejection is somewhat easier.

When we reject the null, we know that the data has provided a lot of evidence against the null. In other words, data like the data we have observed would be unlikely (how unlikely depends on the size of the test) if the null were true.

There is an important caveat though. The null hypothesis is often made up of several assumptions, including:

the main assumption (the one we are testing);

other assumptions (e.g., technical assumptions) that we need to make in order to set up the hypothesis test.

For instance, in Example 2 above (reliability of a production plant), the main assumption is that the expected number of production halts per year is equal to 1. But there is also a technical assumption: the number of production halts has a Poisson distribution.

It must be kept in mind that a rejection is always a joint rejection of the main assumption and all the other assumptions.

Therefore, we should always ask ourselves whether the null has been rejected because the main assumption is wrong or because the other assumptions are violated.

In the case of Example 2 above, is a rejection of the null due to the fact that the expected number of halts is greater than 1 or is it due to the fact that the distribution of the number of halts is very different from a Poisson distribution?

When we suspect that a rejection is due to the inappropriateness of some technical assumption (e.g., assuming a Poisson distribution in the example), we say that the rejection could be due to misspecification of the model.

The right thing to do when this kind of suspicion arises is to conduct so-called robustness checks, that is, to change the technical assumptions and carry out the test again.

In our example, we could re-run the test by assuming a different probability distribution for the number of halts (e.g., a negative binomial or a compound Poisson - do not worry if you have never heard about these distributions).

If we keep obtaining a rejection of the null even after changing the technical assumptions several times, then we say that our rejection is robust to several different specifications of the model.

Even if the null hypothesis is true, a wrong technical assumption can lead us to reject the null too often.
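A small numerical sketch can illustrate this point for Example 2. Suppose, purely as an assumption for illustration, that the expected number of halts really is 1 (so the main assumption holds) but the number of halts follows a geometric distribution on 0, 1, 2, ... (a negative binomial with a single "success") instead of a Poisson distribution. The code compares how often the rule "reject when halts >= 3" fires in each case.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

def geometric_pmf(k, mean):
    """P(X = k) for a geometric distribution on 0, 1, 2, ... with the given mean."""
    p = 1 / (1 + mean)  # success probability that yields the requested mean
    return p * (1 - p) ** k

def rejection_probability(pmf, mean, critical_value=3):
    """Probability that the number of halts is at least the critical value."""
    return 1 - sum(pmf(k, mean) for k in range(critical_value))

# Same expected number of halts (1) in both cases; only the distribution differs.
under_poisson = rejection_probability(poisson_pmf, mean=1)
under_geometric = rejection_probability(geometric_pmf, mean=1)
print(f"P(reject) if halts are Poisson:   {under_poisson:.2%}")
print(f"P(reject) if halts are geometric: {under_geometric:.2%}")
```

Under these assumptions the probability of rejecting rises from about 8% to 12.5%, above the 10% the manager was willing to tolerate, even though the expected number of halts is exactly 1 in both cases.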

What are the main practical implications of everything we have said thus far? How does the theory above help us to set up and test a null hypothesis?

What we said can be summarized in the following guiding principles:

A test of hypothesis is like a criminal trial and you are the prosecutor. You want to find evidence that the defendant (the null hypothesis) is guilty. Your job is not to prove that the defendant is innocent. If you find yourself hoping that the defendant is found not guilty (i.e., the null is not rejected) then something is wrong with the way you set up the test. Remember: you are the prosecutor.

Compute the power of your test against one or more relevant alternative hypotheses. Do not run a test if you know ex-ante that it is unlikely to reject the null when the alternative hypothesis is true.

Beware of technical assumptions that you add to the main assumption you want to test. Make robustness checks in order to verify that the outcome of the test is not biased by model misspecification.


More examples of null hypotheses and how to test them can be found in the following lectures.

Null hypotheses covered:

  • The mean of a normal distribution is equal to a certain value
  • The variance of a normal distribution is equal to a certain value
  • A vector of parameters estimated by MLE satisfies a set of linear or non-linear restrictions
  • A regression coefficient is equal to a certain value

The lecture on Hypothesis testing provides a more detailed mathematical treatment of null hypotheses and how they are tested.

This lecture on the null hypothesis was featured in Stanford University's Best practices in science.


How to cite

Please cite as:

Taboga, Marco (2021). "Null hypothesis", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/null-hypothesis.


Statistics LibreTexts

7.3: The Research Hypothesis and the Null Hypothesis


  • Michelle Oja
  • Taft College


Hypotheses are predictions of expected findings.

The Research Hypothesis

A research hypothesis is a mathematical way of stating a research question.  A research hypothesis names the groups (we'll start with a sample and a population), what was measured, and which we think will have a higher mean.  The last one gives the research hypothesis a direction.  In other words, a research hypothesis should include:

  • The name of the groups being compared.  This is sometimes considered the IV.
  • What was measured.  This is the DV.
  • Which group we predict will have the higher mean.

There are two types of research hypotheses related to sample means and population means: Directional Research Hypotheses and Non-Directional Research Hypotheses.

Directional Research Hypothesis

If we expect our obtained sample mean to be above or below the other group's mean (the population mean, for example), we have a directional hypothesis. There are two options:

  • Symbol: \( \displaystyle \bar{X} > \mu \)
  • (The mean of the sample is greater than the mean of the population.)
  • Symbol: \( \displaystyle \bar{X} < \mu \)
  • (The mean of the sample is less than the mean of the population.)

Example 1

A study by Blackwell, Trzesniewski, and Dweck (2007) measured growth mindset and how long the junior high student participants spent on their math homework.  What’s a directional hypothesis for how scoring higher on growth mindset (compared to the population of junior high students) would be related to how long students spent on their homework?  Write this out in words and symbols.

Answer in Words:            Students who scored high on growth mindset would spend more time on their homework than the population of junior high students.

Answer in Symbols:         \( \displaystyle \bar{X} > \mu \) 

Non-Directional Research Hypothesis

A non-directional hypothesis states that the means will be different, but does not specify which will be higher.  In reality, there is rarely a situation in which we actually don't want one group to be higher than the other, so we will focus on directional research hypotheses.  There is only one option for a non-directional research hypothesis: "The sample mean differs from the population mean."  These types of research hypotheses don’t give a direction; the hypothesis doesn’t say which will be higher or lower.

A non-directional research hypothesis in symbols should look like this:    \( \displaystyle \bar{X} \neq \mu \) (The mean of the sample is not equal to the mean of the population).

Exercise 1

What’s a non-directional hypothesis for how scoring higher on growth mindset (compared to the population of junior high students) would be related to how long students spent on their homework (Blackwell, Trzesniewski, & Dweck, 2007)?  Write this out in words and symbols.

Answer in Words:            Students who scored high on growth mindset would spend a different amount of time on their homework than the population of junior high students.

Answer in Symbols:        \( \displaystyle \bar{X} \neq \mu \) 

See how a non-directional research hypothesis doesn't really make sense?  The big issue is not if the two groups differ, but if one group seems to improve what was measured (if having a growth mindset leads to more time spent on math homework).  This textbook will only use directional research hypotheses because researchers almost always have a predicted direction (meaning that we almost always know which group we think will score higher).

The Null Hypothesis

The hypothesis that an apparent effect is due to chance is called the null hypothesis, written \(H_0\) (“H-naught”). We usually test this through comparing an experimental group to a comparison (control) group.  This null hypothesis can be written as:

\[\mathrm{H}_{0}: \bar{X} = \mu \]

For most of this textbook, the null hypothesis is that the means of the two groups are similar.  Much later, the null hypothesis will be that there is no relationship between the two groups.  Either way, remember that a null hypothesis is always saying that nothing is different.  

This is where descriptive statistics diverge from inferential statistics.  We know what the value of \(\bar{X}\) is – it’s not a mystery or a question, it is what we observed from the sample.  What we are using inferential statistics to do is infer whether this sample's descriptive statistics probably represent the population's descriptive statistics.  This is the null hypothesis, that the two groups are similar.

Keep in mind that the null hypothesis is typically the opposite of the research hypothesis. A research hypothesis for the ESP example is that those in my sample who say that they have ESP would get more correct answers than the population would get correct, while the null hypothesis is that the average number correct for the two groups will be similar. 

In general, the null hypothesis is the idea that nothing is going on: there is no effect of our treatment, no relation between our variables, and no difference in our sample mean from what we expected about the population mean. This is always our baseline starting assumption, and it is what we seek to reject. If we are trying to treat depression, we want to find a difference in average symptoms between our treatment and control groups. If we are trying to predict job performance, we want to find a relation between conscientiousness and evaluation scores. However, until we have evidence against it, we must use the null hypothesis as our starting point.

In sum, the null hypothesis is always: There is no difference between the groups’ means OR There is no relationship between the variables.

In the next chapter, the null hypothesis is that there’s no difference between the sample mean and population mean.  In other words:

  • There is no mean difference between the sample and population.
  • The mean of the sample is the same as the mean of a specific population.
  • \(\mathrm{H}_{0}: \bar{X} = \mu \)
  • We expect our sample’s mean to be the same as the population mean.

Exercise 2

A study by Blackwell, Trzesniewski, and Dweck (2007) measured growth mindset and how long the junior high student participants spent on their math homework.  What’s the null hypothesis for scoring higher on growth mindset (compared to the population of junior high students) and how long students spent on their homework?  Write this out in words and symbols.

Answer in Words:            Students who scored high on growth mindset would spend a similar amount of time on their homework as the population of junior high students.

Answer in Symbols:    \( \bar{X} = \mu \)
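As a rough preview of how such a null hypothesis might be tested, the sketch below compares a sample mean with a population mean using a one-sample z-test with a directional research hypothesis. Everything numeric here is an assumption for illustration: the homework times are invented (they are not data from Blackwell, Trzesniewski, and Dweck), and the population mean of 30 minutes and standard deviation of 10 minutes are treated as known.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z(sample, population_mean, population_sd):
    """z-statistic for H0: the sample mean equals the population mean."""
    standard_error = population_sd / sqrt(len(sample))
    return (mean(sample) - population_mean) / standard_error

# Hypothetical minutes spent on math homework by students scoring high on growth mindset.
homework_minutes = [35, 42, 28, 40, 38, 45, 33, 37, 41, 31]

# Assumed population values (not taken from the study).
population_mean = 30
population_sd = 10

z = one_sample_z(homework_minutes, population_mean, population_sd)
# Directional research hypothesis (sample mean > population mean): use the right tail.
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < 0.05
print(f"sample mean = {mean(homework_minutes)}, z = {z:.3f}, p = {p_value:.4f}")
print("reject H0" if reject_null else "do not reject H0")
```

With a non-directional research hypothesis the same statistic would be compared against both tails instead of one.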

Contributors and Attributions

Foster et al.  (University of Missouri-St. Louis, Rice University, & University of Houston, Downtown Campus)

Dr. MO (Taft College)


What is a Research Hypothesis: How to Write it, Types, and Examples


Any research begins with a research question and a research hypothesis. A research question alone may not suffice to design the experiment(s) needed to answer it. A hypothesis is central to the scientific method. But what is a hypothesis? A hypothesis is a testable statement that proposes a possible explanation for a phenomenon, and it may include a prediction. Next, you may ask: what is a research hypothesis? Simply put, a research hypothesis is a prediction or educated guess about the relationship between the variables that you want to investigate.

It is important to be thorough when developing your research hypothesis. Shortcomings in the framing of a hypothesis can affect the study design and the results. A better understanding of the research hypothesis definition and characteristics of a good hypothesis will make it easier for you to develop your own hypothesis for your research. Let’s dive in to learn more about the types of research hypothesis, how to write a research hypothesis, and some research hypothesis examples.


What is a hypothesis?

A hypothesis is based on the existing body of knowledge in a study area. Framed before the data are collected, a hypothesis states the tentative relationship between independent and dependent variables, along with a prediction of the outcome.  

What is a research hypothesis?

Young researchers starting out their journey are usually brimming with questions like “What is a hypothesis?” “What is a research hypothesis?” “How can I write a good research hypothesis?”

A research hypothesis is a statement that proposes a possible explanation for an observable phenomenon or pattern. It guides the direction of a study and predicts the outcome of the investigation. A research hypothesis is testable, i.e., it can be supported or disproven through experimentation or observation.     


Characteristics of a good hypothesis

Here are the characteristics of a good hypothesis:

  • Clearly formulated and free of language errors and ambiguity  
  • Concise and not unnecessarily verbose  
  • Has clearly defined variables  
  • Testable and stated in a way that allows for it to be disproven  
  • Can be tested using a research design that is feasible, ethical, and practical   
  • Specific and relevant to the research problem  
  • Rooted in a thorough literature search  
  • Can generate new knowledge or understanding.  

How to create an effective research hypothesis  

A study begins with the formulation of a research question. A researcher then performs background research. This background information forms the basis for building a good research hypothesis. The researcher then performs experiments, collects and analyzes the data, interprets the findings, and ultimately determines whether the findings support or negate the original hypothesis.

Let’s look at each step for creating an effective, testable, and good research hypothesis:

  • Identify a research problem or question: Start by identifying a specific research problem.   
  • Review the literature: Conduct an in-depth review of the existing literature related to the research problem to grasp the current knowledge and gaps in the field.   
  • Formulate a clear and testable hypothesis: Based on the research question, use existing knowledge to form a clear and testable hypothesis. The hypothesis should state a predicted relationship between two or more variables that can be measured and manipulated. Improve the original draft till it is clear and meaningful.
  • State the null hypothesis: The null hypothesis is a statement that there is no relationship between the variables you are studying.   
  • Define the population and sample: Clearly define the population you are studying and the sample you will be using for your research.  
  • Select appropriate methods for testing the hypothesis: Select appropriate research methods, such as experiments, surveys, or observational studies, which will allow you to test your research hypothesis .  

Remember that creating a research hypothesis is an iterative process, i.e., you might have to revise it based on the data you collect. You may need to test and reject several hypotheses before answering the research problem.  

How to write a research hypothesis  

When you start writing a research hypothesis , you use an “if–then” statement format, which states the predicted relationship between two or more variables. Clearly identify the independent variables (the variables being changed) and the dependent variables (the variables being measured), as well as the population you are studying. Review and revise your hypothesis as needed.  

An example of a research hypothesis in this format is as follows:  

“If [athletes] take [cold water showers daily], then their [endurance] increases.”

Population: athletes  

Independent variable: daily cold water showers  

Dependent variable: endurance  
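The three labeled parts above can be kept together in a small structure that produces both the if–then statement and the matching null hypothesis. This is only an illustrative sketch: the `Hypothesis` class, its field names, and the sentence templates are assumptions made for this example, not a standard tool.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical container for the parts of an if-then hypothesis."""
    population: str
    independent_variable: str
    dependent_variable: str
    direction: str  # predicted change, e.g. "increases" or "decreases"

    def research_statement(self) -> str:
        # If-then form: predicted effect of the independent variable.
        return (f"If {self.population} are exposed to {self.independent_variable}, "
                f"then their {self.dependent_variable} {self.direction}.")

    def null_statement(self) -> str:
        # Null form: no relationship between the two variables.
        return (f"There is no relationship between {self.independent_variable} "
                f"and {self.dependent_variable} in {self.population}.")


h = Hypothesis("athletes", "daily cold water showers", "endurance", "increases")
research = h.research_statement()
null = h.null_statement()
```

Writing the null statement next to the research hypothesis makes it easier to check that both refer to the same population and variables.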

You may have understood the characteristics of a good hypothesis . But note that a research hypothesis is not always confirmed; a researcher should be prepared to accept or reject the hypothesis based on the study findings.  


Research hypothesis checklist  

Following from above, here is a 10-point checklist for a good research hypothesis :  

  • Testable: A research hypothesis should be able to be tested via experimentation or observation.  
  • Specific: A research hypothesis should clearly state the relationship between the variables being studied.  
  • Based on prior research: A research hypothesis should be based on existing knowledge and previous research in the field.  
  • Falsifiable: A research hypothesis should be able to be disproven through testing.  
  • Clear and concise: A research hypothesis should be stated in a clear and concise manner.  
  • Logical: A research hypothesis should be logical and consistent with current understanding of the subject.  
  • Relevant: A research hypothesis should be relevant to the research question and objectives.  
  • Feasible: A research hypothesis should be feasible to test within the scope of the study.  
  • Reflects the population: A research hypothesis should consider the population or sample being studied.  
  • Uncomplicated: A good research hypothesis is written in a way that is easy for the target audience to understand.  

By following this research hypothesis checklist , you will be able to create a research hypothesis that is strong, well-constructed, and more likely to yield meaningful results.  

Research hypothesis: What it is, how to write it, types, and examples

Types of research hypothesis  

Different types of research hypothesis are used in scientific research:  

1. Null hypothesis:

A null hypothesis states that there is no change in the dependent variable due to changes to the independent variable. This means that the results are due to chance and are not significant. A null hypothesis is denoted as H0 and is stated as the opposite of what the alternative hypothesis states.   

Example: “ The newly identified virus is not zoonotic .”  

2. Alternative hypothesis:

This states that there is a significant difference or relationship between the variables being studied. It is denoted as H1 or Ha and is the hypothesis that is supported when the null hypothesis is rejected.

Example: “ The newly identified virus is zoonotic .”  

3. Directional hypothesis:

This specifies the direction of the relationship or difference between variables; therefore, it tends to use terms like increase, decrease, positive, negative, more, or less.   

Example: “ The inclusion of intervention X decreases infant mortality compared to the original treatment .”   

4. Non-directional hypothesis:

While it does not predict the exact direction or nature of the relationship between the two variables, a non-directional hypothesis states the existence of a relationship or difference between variables but not the direction, nature, or magnitude of the relationship. A non-directional hypothesis may be used when there is no underlying theory or when findings contradict previous research.  

Example: “ Cats and dogs differ in the amount of affection they express .”

5. Simple hypothesis:

A simple hypothesis predicts the relationship between one independent variable and one dependent variable.

Example: “ Applying sunscreen every day slows skin aging .”  

6. Complex hypothesis:

A complex hypothesis states the relationship or difference between two or more independent and dependent variables.   

Example: “ Applying sunscreen every day slows skin aging, reduces sun burn, and reduces the chances of skin cancer .” (Here, the three dependent variables are slowing skin aging, reducing sun burn, and reducing the chances of skin cancer.)  

7. Associative hypothesis:  

An associative hypothesis states that a change in one variable results in the change of the other variable. The associative hypothesis defines interdependency between variables.  

Example: “ There is a positive association between physical activity levels and overall health .”  

8. Causal hypothesis:

A causal hypothesis proposes a cause-and-effect interaction between variables.  

Example: “ Long-term alcohol use causes liver damage .”  

Note that some of the types of research hypothesis mentioned above might overlap. The types of hypothesis chosen will depend on the research question and the objective of the study.  


Research hypothesis examples  

Here are some good research hypothesis examples :  

“The use of a specific type of therapy will lead to a reduction in symptoms of depression in individuals with a history of major depressive disorder.”  

“Providing educational interventions on healthy eating habits will result in weight loss in overweight individuals.”  

“Plants that are exposed to certain types of music will grow taller than those that are not exposed to music.”  

“The use of the plant growth regulator X will lead to an increase in the number of flowers produced by plants.”  

Characteristics that make a research hypothesis weak are unclear variables, unoriginality, being too general or too vague, and being untestable. A weak hypothesis leads to weak research and improper methods.   

Some bad research hypothesis examples (and the reasons why they are “bad”) are as follows:  

“This study will show that treatment X is better than any other treatment.” (This statement is not testable, too broad, and does not consider other treatments that may be effective.)

“This study will prove that this type of therapy is effective for all mental disorders.” (This statement is too broad and not testable as mental disorders are complex and different disorders may respond differently to different types of therapy.)

“Plants can communicate with each other through telepathy.” (This statement is not testable and lacks a scientific basis.)

Importance of testable hypothesis  

If a research hypothesis is not testable, the results will not prove or disprove anything meaningful. The conclusions will be vague at best. A testable hypothesis helps a researcher focus on the study outcome and understand the implication of the question and the different variables involved. A testable hypothesis helps a researcher make precise predictions based on prior research.  

For a hypothesis to be considered testable, there must be a way to show that it is true or false; further, the results of testing it must be reproducible.


Frequently Asked Questions (FAQs) on research hypothesis  

1. What is the difference between research question and research hypothesis ?  

A research question defines the problem and helps outline the study objective(s). It is an open-ended statement that is exploratory or probing in nature. Therefore, it does not make predictions or assumptions. It helps a researcher identify what information to collect. A research hypothesis , however, is a specific, testable prediction about the relationship between variables. Accordingly, it guides the study design and data analysis approach.

2. When to reject null hypothesis ?

A null hypothesis should be rejected when the evidence from a statistical test shows that the observed data would be unlikely if the null hypothesis were true. This happens when the p-value is less than the defined significance level (e.g., 0.05). Rejecting the null hypothesis does not necessarily mean that the alternative hypothesis is true; it simply means that the evidence found is not compatible with the null hypothesis.
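As a rough illustration of this decision rule, the sketch below estimates a p-value with a permutation test for a difference in group means and compares it with a significance level of 0.05. The permutation test is one possible choice made for this example, not a method prescribed here, and the function names and the two small data sets are invented purely for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


def decide(p_value, alpha=0.05):
    """Apply the rejection rule: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Made-up measurements for two groups (illustration only).
treated = [12.1, 13.4, 14.0, 13.8, 12.9, 14.5, 13.3, 14.1]
control = [10.2, 11.0, 10.8, 11.5, 10.1, 11.2, 10.9, 10.6]

p = permutation_p_value(treated, control)
decision = decide(p)
```

The wording of the result is deliberate: a large p-value leads to "fail to reject H0", not to accepting the null hypothesis.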

3. How can I be sure my hypothesis is testable?  

A testable hypothesis should be specific and measurable, and it should state a clear relationship between variables that can be tested with data. To ensure that your hypothesis is testable, consider the following:  

  • Clearly define the key variables in your hypothesis. You should be able to measure and manipulate these variables in a way that allows you to test the hypothesis.  
  • The hypothesis should predict a specific outcome or relationship between variables that can be measured or quantified.   
  • You should be able to collect the necessary data within the constraints of your study.  
  • It should be possible for other researchers to replicate your study, using the same methods and variables.   
  • Your hypothesis should be testable by using appropriate statistical analysis techniques, so you can draw conclusions, and make inferences about the population from the sample data.  
  • The hypothesis should be able to be disproven or rejected through the collection of data.  

4. How do I revise my research hypothesis if my data does not support it?  

If your data does not support your research hypothesis , you will need to revise it or develop a new one. You should examine your data carefully and identify any patterns or anomalies, re-examine your research question, and/or revisit your theory to look for any alternative explanations for your results. Based on your review of the data, literature, and theories, modify your research hypothesis to better align it with the results you obtained. Use your revised hypothesis to guide your research design and data collection. It is important to remain objective throughout the process.  

5. I am performing exploratory research. Do I need to formulate a research hypothesis?  

As opposed to “confirmatory” research, where a researcher has some idea about the relationship between the variables under investigation, exploratory research (or hypothesis-generating research) looks into a completely new topic about which limited information is available. Therefore, the researcher will not have any prior hypotheses. In such cases, a researcher will need to develop a post-hoc hypothesis. A post-hoc research hypothesis is generated after these results are known.  

6. How is a research hypothesis different from a research question?

A research question is an inquiry about a specific topic or phenomenon, typically expressed as a question. It seeks to explore and understand a particular aspect of the research subject. In contrast, a research hypothesis is a specific statement or prediction that suggests an expected relationship between variables. It is formulated based on existing knowledge or theories and guides the research design and data analysis.

7. Can a research hypothesis change during the research process?

Yes, research hypotheses can change during the research process. As researchers collect and analyze data, new insights and information may emerge that require modification or refinement of the initial hypotheses. This can be due to unexpected findings, limitations in the original hypotheses, or the need to explore additional dimensions of the research topic. Flexibility is crucial in research, allowing for adaptation and adjustment of hypotheses to align with the evolving understanding of the subject matter.

8. How many hypotheses should be included in a research study?

The number of research hypotheses in a research study varies depending on the nature and scope of the research. It is not necessary to have multiple hypotheses in every study. Some studies may have only one primary hypothesis, while others may have several related hypotheses. The number of hypotheses should be determined based on the research objectives, research questions, and the complexity of the research topic. It is important to ensure that the hypotheses are focused, testable, and directly related to the research aims.

9. Can research hypotheses be used in qualitative research?

Yes, research hypotheses can be used in qualitative research, although they are more commonly associated with quantitative research. In qualitative research, hypotheses may be formulated as tentative or exploratory statements that guide the investigation. Instead of testing hypotheses through statistical analysis, qualitative researchers may use the hypotheses to guide data collection and analysis, seeking to uncover patterns, themes, or relationships within the qualitative data. The emphasis in qualitative research is often on generating insights and understanding rather than confirming or rejecting specific research hypotheses through statistical testing.


Statology

Understanding the Null Hypothesis for Linear Regression

Linear regression is a technique we can use to understand the relationship between one or more predictor variables and a response variable .

If we only have one predictor variable and one response variable, we can use simple linear regression , which uses the following formula to estimate the relationship between the variables:

ŷ = β 0 + β 1 x

  • ŷ: The estimated response value.
  • β 0 : The average value of y when x is zero.
  • β 1 : The average change in y associated with a one unit increase in x.
  • x: The value of the predictor variable.

Simple linear regression uses the following null and alternative hypotheses:

  • H 0 : β 1 = 0
  • H A : β 1 ≠ 0

The null hypothesis states that the coefficient β 1 is equal to zero. In other words, there is no statistically significant relationship between the predictor variable, x, and the response variable, y.

The alternative hypothesis states that β 1 is not equal to zero. In other words, there is a statistically significant relationship between x and y.
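One common way to test H 0 : β 1 = 0 is to divide the estimated slope by its standard error and compare the resulting t statistic with a critical value from the t distribution with n − 2 degrees of freedom. The sketch below does this with plain Python. The function name and the six data points are made up for illustration, and 2.776 is the two-sided 5% critical value for 4 degrees of freedom.

```python
from math import sqrt


def slope_t_statistic(x, y):
    """Return (b0, b1, t) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return b0, b1, b1 / se_b1


# Invented data: hours studied and exam scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [68, 74, 77, 85, 88, 95]

b0, b1, t = slope_t_statistic(hours, scores)

# Two-sided 5% critical value of the t distribution with n - 2 = 4 df.
T_CRITICAL = 2.776
reject_h0 = abs(t) > T_CRITICAL
```

In simple linear regression this t test and the overall F test give the same decision, because the F statistic equals the square of the slope's t statistic.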

If we have multiple predictor variables and one response variable, we can use multiple linear regression , which uses the following formula to estimate the relationship between the variables:

ŷ = β 0 + β 1 x 1 + β 2 x 2 + … + β k x k

  • β 0 : The average value of y when all predictor variables are equal to zero.
  • β i : The average change in y associated with a one unit increase in x i .
  • x i : The value of the predictor variable x i .

Multiple linear regression uses the following null and alternative hypotheses:

  • H 0 : β 1 = β 2 = … = β k = 0
  • H A : at least one β i ≠ 0

The null hypothesis states that all coefficients in the model are equal to zero. In other words, none of the predictor variables have a statistically significant relationship with the response variable, y.

The alternative hypothesis states that not every coefficient is simultaneously equal to zero.
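For reference, these hypotheses are usually tested with the overall F statistic. The formula below is the standard textbook form, assuming a model with an intercept, n observations, and k predictors. The symbols SSR, SSE, and ȳ are introduced here and do not appear in the text above.

```latex
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)},
\qquad
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\qquad
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

Under H 0 and the usual regression assumptions, F follows an F distribution with k and n − k − 1 degrees of freedom, so a large F value (a small p-value) is evidence against the null hypothesis.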

The following examples show how to decide to reject or fail to reject the null hypothesis in both simple linear regression and multiple linear regression models.

Example 1: Simple Linear Regression

Suppose a professor would like to use the number of hours studied to predict the exam score that students will receive in his class. He collects data for 20 students and fits a simple linear regression model.

The following screenshot shows the output of the regression model:

Output of simple linear regression in Excel

The fitted simple linear regression model is:

Exam Score = 67.1617 + 5.2503*(hours studied)

To determine if there is a statistically significant relationship between hours studied and exam score, we need to analyze the overall F value of the model and the corresponding p-value:

  • Overall F-Value:  47.9952
  • P-value:  0.000

Since this p-value is less than .05, we can reject the null hypothesis. In other words, there is a statistically significant relationship between hours studied and exam score received.

Example 2: Multiple Linear Regression

Suppose a professor would like to use the number of hours studied and the number of prep exams taken to predict the exam score that students will receive in his class. He collects data for 20 students and fits a multiple linear regression model.

Multiple linear regression output in Excel

The fitted multiple linear regression model is:

Exam Score = 67.67 + 5.56*(hours studied) – 0.60*(prep exams taken)

To determine if there is a jointly statistically significant relationship between the two predictor variables and the response variable, we need to analyze the overall F value of the model and the corresponding p-value:

  • Overall F-Value:  23.46
  • P-value:  0.00

Since this p-value is less than .05, we can reject the null hypothesis. In other words, hours studied and prep exams taken have a jointly statistically significant relationship with exam score.

Note: Although the p-value for prep exams taken (p = 0.52) is not significant, prep exams combined with hours studied has a significant relationship with exam score.
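The two fitted equations can be turned into small prediction functions, and the reported F values can be converted into the R² they imply. This is a sketch under stated assumptions: the coefficients, F values, and n = 20 come from the examples above, while the function names and the input values (3 hours studied, 2 prep exams) are chosen only for illustration. The R² conversion assumes each model includes an intercept.

```python
def predict_simple(hours_studied):
    # Coefficients copied from Example 1's fitted equation.
    return 67.1617 + 5.2503 * hours_studied


def predict_multiple(hours_studied, prep_exams_taken):
    # Coefficients copied from Example 2's fitted equation.
    return 67.67 + 5.56 * hours_studied - 0.60 * prep_exams_taken


def r_squared_from_f(f_value, n_obs, n_predictors):
    """R^2 implied by the overall F statistic of a model with an intercept."""
    df_resid = n_obs - n_predictors - 1
    return n_predictors * f_value / (n_predictors * f_value + df_resid)


score_simple = predict_simple(3)          # about 82.91
score_multiple = predict_multiple(3, 2)   # about 83.15

r2_simple = r_squared_from_f(47.9952, n_obs=20, n_predictors=1)
r2_multiple = r_squared_from_f(23.46, n_obs=20, n_predictors=2)
```

Both implied R² values come out near 0.73, which fits the note above: adding prep exams taken contributes little beyond hours studied.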

Additional Resources

  • Understanding the F-Test of Overall Significance in Regression
  • How to Read and Interpret a Regression Table
  • How to Report Regression Results
  • How to Perform Simple Linear Regression in Excel
  • How to Perform Multiple Linear Regression in Excel


Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.


Neag School of Education

Educational Research Basics by Del Siegle

Null and alternative hypotheses.

Converting research questions to hypotheses is a simple task. Take the question and turn it into a positive statement that says a relationship exists (correlation studies) or a difference exists between the groups (experimental studies), and you have the alternative hypothesis. Write the statement such that a relationship does not exist or a difference does not exist, and you have the null hypothesis. You can reverse the process if you have a hypothesis and wish to write a research question.

When you are comparing two groups, the groups are the independent variable. When you are testing whether something affects something else, the cause is the independent variable. The independent variable is the one you manipulate.

Consider this hypothesis: “Teachers given higher pay will have more positive attitudes toward children than teachers given lower pay.” The first step is to ask yourself “Are there two or more groups being compared?” The answer is “Yes.” What are the groups? Teachers who are given higher pay and teachers who are given lower pay. The independent variable is teacher pay. The dependent variable (the outcome) is attitude toward children.

You could also approach it another way. “Is something causing something else?” The answer is “Yes.” What is causing what? Teacher pay is causing attitude toward children. Therefore, teacher pay is the independent variable (cause) and attitude toward children is the dependent variable (outcome).

By tradition, we try to disprove (reject) the null hypothesis. We can never prove a null hypothesis, because it is impossible to prove something does not exist. We can disprove something does not exist by finding an example of it. Therefore, in research we try to disprove the null hypothesis. When we do find that a relationship (or difference) exists then we reject the null and accept the alternative. If we do not find that a relationship (or difference) exists, we fail to reject the null hypothesis (and go with it). We never say we accept the null hypothesis because it is never possible to prove something does not exist. That is why we say that we failed to reject the null hypothesis, rather than we accepted it.

Del Siegle, Ph.D. Neag School of Education – University of Connecticut www.delsiegle.com

  • J Korean Med Sci
  • v.37(16); 2022 Apr 25


A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

Edward Barroga

1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.

Glafera Janet Matanguihan

2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.

The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.

INTRODUCTION

Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6

It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked; if not overlooked, then framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4

There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought out, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article therefore aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.

DEFINITIONS AND RELATIONSHIP OF RESEARCH QUESTIONS AND HYPOTHESES

A research question is what a study aims to answer after data analysis and interpretation. The answer is written at length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5

On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4

Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8

Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12

CHARACTERISTICS OF GOOD RESEARCH QUESTIONS AND HYPOTHESES

Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13

There are several characteristics of well-developed hypotheses. Good hypotheses 1) are empirically testable 7 , 10 , 11 , 13 ; 2) are backed by preliminary evidence 9 ; 3) are testable by ethical research 7 , 9 ; 4) are based on original ideas 9 ; 5) have evidence-based logical reasoning 10 ; and 6) can be predicted. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory on which to base the hypotheses, inductive reasoning based on specific observations or findings forms more general hypotheses. 10

TYPES OF RESEARCH QUESTIONS AND HYPOTHESES

Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .

Table 1

Quantitative research questions | Quantitative research hypotheses
Descriptive research questions | Simple hypothesis
Comparative research questions | Complex hypothesis
Relationship research questions | Directional hypothesis
 | Non-directional hypothesis
 | Associative hypothesis
 | Causal hypothesis
 | Null hypothesis
 | Alternative hypothesis
 | Working hypothesis
 | Statistical hypothesis
 | Logical hypothesis
 | Hypothesis-testing

Qualitative research questions | Qualitative research hypotheses
Contextual research questions | Hypothesis-generating
Descriptive research questions |
Evaluation research questions |
Explanatory research questions |
Exploratory research questions |
Generative research questions |
Ideological research questions |
Ethnographic research questions |
Phenomenological research questions |
Grounded theory questions |
Qualitative case study questions |

Research questions in quantitative research

In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .

Quantitative research questions
Descriptive research question
- Measures responses of subjects to variables
- Presents variables to measure, analyze, or assess
What is the proportion of resident doctors in the hospital who have mastered ultrasonography (response of subjects to a variable) as a diagnostic technique in their clinical training?
Comparative research question
- Clarifies difference between one group with outcome variable and another group without outcome variable
Is there a difference in the reduction of lung metastasis in osteosarcoma patients who received the vitamin D adjunctive therapy (group with outcome variable) compared with osteosarcoma patients who did not receive the vitamin D adjunctive therapy (group without outcome variable)?
- Compares the effects of variables
How does the vitamin D analogue 22-Oxacalcitriol (variable 1) mimic the antiproliferative activity of 1,25-Dihydroxyvitamin D (variable 2) in osteosarcoma cells?
Relationship research question
- Defines trends, association, relationships, or interactions between dependent variable and independent variable
Is there a relationship between the number of medical student suicides (dependent variable) and the level of medical student stress (independent variable) in Japan during the first wave of the COVID-19 pandemic?

Hypotheses in quantitative research

In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships that can be predicted include those 1) between a single dependent variable and a single independent variable ( simple hypothesis ) or 2) between two or more independent and dependent variables ( complex hypothesis ). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome ( directional hypothesis ). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies ( non-directional hypothesis ). 4 In addition, hypotheses can 1) define interdependency between variables ( associative hypothesis ), 4 2) propose an effect on the dependent variable from manipulation of the independent variable ( causal hypothesis ), 4 3) state that there is no relationship between two variables ( null hypothesis ), 4 , 11 , 15 4) replace the working hypothesis if rejected ( alternative hypothesis ), 15 5) explain the relationship of phenomena to possibly generate a theory ( working hypothesis ), 11 6) involve quantifiable variables that can be tested statistically ( statistical hypothesis ), 11 or 7) express a relationship whose interlinks can be verified logically ( logical hypothesis ). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research in Table 3 .

Quantitative research hypotheses
Simple hypothesis
- Predicts relationship between single dependent variable and single independent variable
If the dose of the new medication (single independent variable) is high, blood pressure (single dependent variable) is lowered.
Complex hypothesis
- Foretells relationship between two or more independent and dependent variables
The higher the use of anticancer drugs, radiation therapy, and adjunctive agents (3 independent variables), the higher would be the survival rate (1 dependent variable).
Directional hypothesis
- Identifies study direction based on theory towards particular outcome to clarify relationship between variables
Privately funded research projects will have a larger international scope (study direction) than publicly funded research projects.
Non-directional hypothesis
- Nature of relationship between two variables or exact study direction is not identified
- Does not involve a theory
Women and men are different in terms of helpfulness. (Exact study direction is not identified)
Associative hypothesis
- Describes variable interdependency
- Change in one variable causes change in another variable
A larger number of people vaccinated against COVID-19 in the region (change in independent variable) will reduce the region’s incidence of COVID-19 infection (change in dependent variable).
Causal hypothesis
- An effect on dependent variable is predicted from manipulation of independent variable
A change into a high-fiber diet (independent variable) will reduce the blood sugar level (dependent variable) of the patient.
Null hypothesis
- A negative statement indicating no relationship or difference between 2 variables
There is no significant difference in the severity of pulmonary metastases between the new drug (variable 1) and the current drug (variable 2).
Alternative hypothesis
- Following a null hypothesis, an alternative hypothesis predicts a relationship between 2 study variables
The new drug (variable 1) is better on average in reducing the level of pain from pulmonary metastasis than the current drug (variable 2).
Working hypothesis
- A hypothesis that is initially accepted for further research to produce a feasible theory
Dairy cows fed with concentrates of different formulations will produce different amounts of milk.
Statistical hypothesis
- Assumption about the value of population parameter or relationship among several population characteristics
- Validity tested by a statistical experiment or analysis
The mean recovery rate from COVID-19 infection (value of population parameter) is not significantly different between population 1 and population 2.
There is a positive correlation between the level of stress at the workplace and the number of suicides (population characteristics) among working people in Japan.
Logical hypothesis
- Offers or proposes an explanation with limited or no extensive evidence
If healthcare workers provide more educational programs about contraception methods, the number of adolescent pregnancies will be less.
Hypothesis-testing (Quantitative hypothesis-testing research)
- Quantitative research uses deductive reasoning.
- This involves the formation of a hypothesis, collection of data in the investigation of the problem, analysis and use of the data from the investigation, and drawing of conclusions to validate or nullify the hypotheses.
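
The null and alternative hypotheses in Table 3 can be made concrete with a short statistical sketch. The Python example below is only an illustration under stated assumptions: the pain scores are invented, the function name `permutation_test` is hypothetical, and a permutation test is used here as one simple way to compare two groups, not as the method of any study cited in this article.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    H0 (null): group labels are exchangeable, i.e. there is no difference.
    H1 (alternative, non-directional): the group means differ.
    Returns an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical pain scores (lower is better); not data from any study.
new_drug = [3.1, 2.8, 3.4, 2.9, 3.0, 2.7, 3.2, 2.6]
current_drug = [4.0, 3.9, 4.3, 3.8, 4.1, 4.4, 3.7, 4.2]

p_value = permutation_test(new_drug, current_drug)
alpha = 0.05  # conventional significance level, assumed for illustration
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

A small p-value leads to rejecting the null hypothesis in favor of the alternative; a large one means only that the data do not contradict the null hypothesis, not that it has been proven.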

Research questions in qualitative research

Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more than the hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15

There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions ( contextual research questions ); 2) describe a phenomenon ( descriptive research questions ); 3) assess the effectiveness of existing methods, protocols, theories, or procedures ( evaluation research questions ); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena ( explanatory research questions ); or 5) focus on unknown aspects of a particular topic ( exploratory research questions ). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions ( generative research questions ) or advance specific ideologies of a position ( ideological research questions ). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines ( ethnographic research questions ). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions ( phenomenological research questions ), may be directed towards generating a theory of some process ( grounded theory questions ), or may address a description of the case and the emerging themes ( qualitative case study questions ). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .

Qualitative research questions
Contextual research question
- Asks about the nature of what already exists
- Clarifies how individuals or groups function, to better understand the natural context of real-world problems
What are the experiences of nurses working night shifts in healthcare during the COVID-19 pandemic? (natural context of real-world problems)
Descriptive research question
- Aims to describe a phenomenon
What are the different forms of disrespect and abuse (phenomenon) experienced by Tanzanian women when giving birth in healthcare facilities?
Evaluation research question
- Examines the effectiveness of existing practice or accepted frameworks
How effective are decision aids (effectiveness of existing practice) in helping decide whether to give birth at home or in a healthcare facility?
Explanatory research question
- Clarifies a previously studied phenomenon and explains why it occurs
Why is there an increase in teenage pregnancy (phenomenon) in Tanzania?
Exploratory research question
- Explores areas that have not been fully investigated to have a deeper understanding of the research problem
What factors affect the mental health of medical students (areas that have not yet been fully investigated) during the COVID-19 pandemic?
Generative research question
- Develops an in-depth understanding of people’s behavior by asking ‘how would’ or ‘what if’ to identify problems and find solutions
How would the extensive research experience of the behavior of new staff impact the success of the novel drug initiative?
Ideological research question
- Aims to advance specific ideas or ideologies of a position
Are Japanese nurses who volunteer in remote African hospitals able to promote humanized care of patients (specific ideas or ideologies) in the areas of safe patient environment, respect of patient privacy, and provision of accurate information related to health and care?
Ethnographic research question
- Clarifies peoples’ nature, activities, their interactions, and the outcomes of their actions in specific settings
What are the demographic characteristics, rehabilitative treatments, community interactions, and disease outcomes (nature, activities, their interactions, and the outcomes) of people in China who are suffering from pneumoconiosis?
Phenomenological research question
- Seeks to learn more about the phenomena that have impacted an individual
What are the lived experiences of parents who have been living with and caring for children with a diagnosis of autism? (phenomena that have impacted an individual)
Grounded theory question
- Focuses on social processes asking about what happens and how people interact, or uncovering social relationships and behaviors of groups
What are the problems that pregnant adolescents face in terms of social and cultural norms (social processes), and how can these be addressed?
Qualitative case study question
- Assesses a phenomenon using different sources of data to answer “why” and “how” questions
- Considers how the phenomenon is influenced by its contextual situation.
How does quitting work and assuming the role of a full-time mother (phenomenon assessed) change the lives of women in Japan?
Qualitative research hypotheses
Hypothesis-generating (Qualitative hypothesis-generating research)
- Qualitative research uses inductive reasoning.
- This involves data collection from study participants or the literature regarding a phenomenon of interest, using the collected data to develop a formal hypothesis, and using the formal hypothesis as a framework for testing the hypothesis.
- Qualitative exploratory studies explore areas deeper, clarifying subjective experience and allowing formulation of a formal hypothesis potentially testable in a future quantitative approach.

Qualitative studies usually pose at least one central research question and several subquestions starting with How or What . These research questions use exploratory verbs such as explore or describe . These also focus on one central phenomenon of interest, and may mention the participants and research site. 15

Hypotheses in qualitative research

Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1

FRAMEWORKS FOR DEVELOPING RESEARCH QUESTIONS AND HYPOTHESES

Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14

The PICOT and PEO frameworks are also used when developing research questions. 1 These frameworks address the following elements. PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study; PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if they meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14
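
As a rough illustration of how the PICOT elements fit together, the following Python sketch stores each element as a field and assembles them into one question. The class name, the method names, the sentence template, and the example values are all assumptions made for illustration; they are not part of the PICOT framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PICOT:
    population: str    # P: population/patients/problem
    intervention: str  # I: intervention or indicator being studied
    comparison: str    # C: comparison group
    outcome: str       # O: outcome of interest
    timeframe: str     # T: timeframe of the study

    def missing_elements(self):
        """Names of elements left blank; a complete question has none."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def as_question(self):
        """Assemble the elements into one comparative research question."""
        return (
            f"In {self.population}, does {self.intervention}, "
            f"compared with {self.comparison}, change {self.outcome} "
            f"within {self.timeframe}?"
        )


# Hypothetical example values, invented for this sketch.
question = PICOT(
    population="adults with hypertension",
    intervention="a new medication",
    comparison="the current medication",
    outcome="systolic blood pressure",
    timeframe="12 weeks",
)
print(question.as_question())
print(question.missing_elements())
```

Listing the blank elements before drafting the question is one simple way to notice that a comparison group or timeframe has not yet been specified.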

As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research questions and hypotheses that result in unclear and weak research objectives in quantitative research ( Table 6 ) 16 and qualitative research ( Table 7 ) 17 , and show how to transform these ambiguous research questions and hypotheses into clear and good statements.

| Variables | Unclear and weak statement (Statement 1) | Clear and good statement (Statement 2) | Points to avoid |
| --- | --- | --- | --- |
| Research question | Which is more effective between smoke moxibustion and smokeless moxibustion? | “Moreover, regarding smoke moxibustion versus smokeless moxibustion, it remains unclear which is more effective, safe, and acceptable to pregnant women, and whether there is any difference in the amount of heat generated.” | 1) Vague and unfocused questions |
| | | | 2) Closed questions simply answerable by yes or no |
| | | | 3) Questions requiring a simple choice |
| Hypothesis | The smoke moxibustion group will have higher cephalic presentation. | “Hypothesis 1. The smoke moxibustion stick group (SM group) and smokeless moxibustion stick group (SLM group) will have higher rates of cephalic presentation after treatment than the control group. | 1) Unverifiable hypotheses |
| | | Hypothesis 2. The SM group and SLM group will have higher rates of cephalic presentation at birth than the control group. | 2) Incompletely stated groups of comparison |
| | | Hypothesis 3. There will be no significant differences in the well-being of the mother and child among the three groups in terms of the following outcomes: premature birth, premature rupture of membranes (PROM) at < 37 weeks, Apgar score < 7 at 5 min, umbilical cord blood pH < 7.1, admission to neonatal intensive care unit (NICU), and intrauterine fetal death.” | 3) Insufficiently described variables or outcomes |
| Research objective | To determine which is more effective between smoke moxibustion and smokeless moxibustion. | “The specific aims of this pilot study were (a) to compare the effects of smoke moxibustion and smokeless moxibustion treatments with the control group as a possible supplement to ECV for converting breech presentation to cephalic presentation and increasing adherence to the newly obtained cephalic position, and (b) to assess the effects of these treatments on the well-being of the mother and child.” | 1) Poor understanding of the research question and hypotheses |
| | | | 2) Insufficient description of population, variables, or study outcomes |

a These statements were composed for comparison and illustrative purposes only.

b These statements are direct quotes from Higashihara and Horiuchi. 16

| Variables | Unclear and weak statement (Statement 1) | Clear and good statement (Statement 2) | Points to avoid |
| --- | --- | --- | --- |
| Research question | Does disrespect and abuse (D&A) occur in childbirth in Tanzania? | How does disrespect and abuse (D&A) occur and what are the types of physical and psychological abuses observed in midwives’ actual care during facility-based childbirth in urban Tanzania? | 1) Ambiguous or oversimplistic questions |
| | | | 2) Questions unverifiable by data collection and analysis |
| Hypothesis | Disrespect and abuse (D&A) occur in childbirth in Tanzania. | Hypothesis 1: Several types of physical and psychological abuse by midwives in actual care occur during facility-based childbirth in urban Tanzania. | 1) Statements simply expressing facts |
| | | Hypothesis 2: Weak nursing and midwifery management contribute to the D&A of women during facility-based childbirth in urban Tanzania. | 2) Insufficiently described concepts or variables |
| Research objective | To describe disrespect and abuse (D&A) in childbirth in Tanzania. | “This study aimed to describe from actual observations the respectful and disrespectful care received by women from midwives during their labor period in two hospitals in urban Tanzania.” | 1) Statements unrelated to the research question and hypotheses |
| | | | 2) Unattainable or unexplorable objectives |

a This statement is a direct quote from Shimoda et al. 17

The other statements were composed for comparison and illustrative purposes only.

CONSTRUCTING RESEARCH QUESTIONS AND HYPOTHESES

To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be assessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims . This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1 .

Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, research questions are used more frequently in survey projects, whereas hypotheses are used more frequently in experiments, to compare variables and their relationships.

Hypotheses are constructed based on the variables identified and as an if-then statement, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves deducing a testable proposition from theory, and separating the independent and dependent variables so that each can be measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12
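
The if-then template can be sketched in a few lines of Python. This is a minimal illustration only: the class and field names are hypothetical, the wording of the generated sentences is an assumption, and the example values are adapted from the causal hypothesis on a high-fiber diet and blood sugar level in Table 3.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    group: str                 # specific group being studied
    independent_variable: str  # variable to be manipulated
    dependent_variable: str    # variable expected to be influenced
    action: str                # what is done to the independent variable
    predicted_outcome: str     # expected change in the dependent variable

    def statement(self):
        """'If a specific action is taken, then a certain outcome is expected.'"""
        return f"If {self.action}, then {self.predicted_outcome} in {self.group}."

    def null_statement(self):
        """Matching null hypothesis: no relationship between the variables."""
        return (
            f"There is no relationship between {self.independent_variable} "
            f"and {self.dependent_variable} in {self.group}."
        )


# Values adapted from the causal hypothesis example in Table 3.
example = Hypothesis(
    group="the patient",
    independent_variable="diet",
    dependent_variable="blood sugar level",
    action="the diet is changed to a high-fiber diet",
    predicted_outcome="the blood sugar level will decrease",
)
print(example.statement())
print(example.null_statement())
```

Writing the variables, the group, and the predicted outcome as separate fields makes it easy to check that a hypothesis names all of them before it is stated and refined.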

In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.

EXAMPLES OF RESEARCH QUESTIONS FROM PUBLISHED ARTICLES

  • EXAMPLE 1. Descriptive research question (quantitative research)
  • - Presents research variables to be assessed (distinct phenotypes and subphenotypes)
  • “BACKGROUND: Since COVID-19 was identified, its clinical and biological heterogeneity has been recognized. Identifying COVID-19 phenotypes might help guide basic, clinical, and translational research efforts.
  • RESEARCH QUESTION: Does the clinical spectrum of patients with COVID-19 contain distinct phenotypes and subphenotypes? ” 19
  • EXAMPLE 2. Relationship research question (quantitative research)
  • - Shows interactions between dependent variable (static postural control) and independent variable (peripheral visual field loss)
  • “Background: Integration of visual, vestibular, and proprioceptive sensations contributes to postural control. People with peripheral visual field loss have serious postural instability. However, the directional specificity of postural stability and sensory reweighting caused by gradual peripheral visual field loss remain unclear.
  • Research question: What are the effects of peripheral visual field loss on static postural control ?” 20
  • EXAMPLE 3. Comparative research question (quantitative research)
  • - Clarifies the difference among groups with an outcome variable (patients enrolled in COMPERA with moderate PH or severe PH in COPD) and another group without the outcome variable (patients with idiopathic pulmonary arterial hypertension (IPAH))
  • “BACKGROUND: Pulmonary hypertension (PH) in COPD is a poorly investigated clinical condition.
  • RESEARCH QUESTION: Which factors determine the outcome of PH in COPD?
  • STUDY DESIGN AND METHODS: We analyzed the characteristics and outcome of patients enrolled in the Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) with moderate or severe PH in COPD as defined during the 6th PH World Symposium who received medical therapy for PH and compared them with patients with idiopathic pulmonary arterial hypertension (IPAH) .” 21
  • EXAMPLE 4. Exploratory research question (qualitative research)
  • - Explores areas that have not been fully investigated (perspectives of families and children who receive care in clinic-based child obesity treatment) to have a deeper understanding of the research problem
  • “Problem: Interventions for children with obesity lead to only modest improvements in BMI and long-term outcomes, and data are limited on the perspectives of families of children with obesity in clinic-based treatment. This scoping review seeks to answer the question: What is known about the perspectives of families and children who receive care in clinic-based child obesity treatment? This review aims to explore the scope of perspectives reported by families of children with obesity who have received individualized outpatient clinic-based obesity treatment.” 22
  • EXAMPLE 5. Relationship research question (quantitative research)
  • - Defines interactions between dependent variable (use of ankle strategies) and independent variable (changes in muscle tone)
  • “Background: To maintain an upright standing posture against external disturbances, the human body mainly employs two types of postural control strategies: “ankle strategy” and “hip strategy.” While it has been reported that the magnitude of the disturbance alters the use of postural control strategies, it has not been elucidated how the level of muscle tone, one of the crucial parameters of bodily function, determines the use of each strategy. We have previously confirmed using forward dynamics simulations of human musculoskeletal models that an increased muscle tone promotes the use of ankle strategies. The objective of the present study was to experimentally evaluate a hypothesis: an increased muscle tone promotes the use of ankle strategies. Research question: Do changes in the muscle tone affect the use of ankle strategies ?” 23

EXAMPLES OF HYPOTHESES IN PUBLISHED ARTICLES

  • EXAMPLE 1. Working hypothesis (quantitative research)
  • - A hypothesis that is initially accepted for further research to produce a feasible theory
  • “As fever may have benefit in shortening the duration of viral illness, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response when taken during the early stages of COVID-19 illness .” 24
  • “In conclusion, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response . The difference in perceived safety of these agents in COVID-19 illness could be related to the more potent efficacy to reduce fever with ibuprofen compared to acetaminophen. Compelling data on the benefit of fever warrant further research and review to determine when to treat or withhold ibuprofen for early stage fever for COVID-19 and other related viral illnesses .” 24
  • EXAMPLE 2. Exploratory hypothesis (qualitative research)
  • - Explores particular areas deeper to clarify subjective experience and develop a formal hypothesis potentially testable in a future quantitative approach
  • “We hypothesized that when thinking about a past experience of help-seeking, a self distancing prompt would cause increased help-seeking intentions and more favorable help-seeking outcome expectations .” 25
  • “Conclusion
  • Although a priori hypotheses were not supported, further research is warranted as results indicate the potential for using self-distancing approaches to increasing help-seeking among some people with depressive symptomatology.” 25
  • EXAMPLE 3. Hypothesis-generating research to establish a framework for hypothesis testing (qualitative research)
  • “We hypothesize that compassionate care is beneficial for patients (better outcomes), healthcare systems and payers (lower costs), and healthcare providers (lower burnout). ” 26
  • “Compassionomics is the branch of knowledge and scientific study of the effects of compassionate healthcare. Our main hypotheses are that compassionate healthcare is beneficial for (1) patients, by improving clinical outcomes, (2) healthcare systems and payers, by supporting financial sustainability, and (3) HCPs, by lowering burnout and promoting resilience and well-being. The purpose of this paper is to establish a scientific framework for testing the hypotheses above . If these hypotheses are confirmed through rigorous research, compassionomics will belong in the science of evidence-based medicine, with major implications for all healthcare domains.” 26
  • EXAMPLE 4. Statistical hypothesis (quantitative research)
  • - An assumption is made about the relationship among several population characteristics ( gender differences in sociodemographic and clinical characteristics of adults with ADHD ). Validity is tested by statistical experiment or analysis ( chi-square test, Student's t-test, and logistic regression analysis )
  • “Our research investigated gender differences in sociodemographic and clinical characteristics of adults with ADHD in a Japanese clinical sample. Due to unique Japanese cultural ideals and expectations of women's behavior that are in opposition to ADHD symptoms, we hypothesized that women with ADHD experience more difficulties and present more dysfunctions than men . We tested the following hypotheses: first, women with ADHD have more comorbidities than men with ADHD; second, women with ADHD experience more social hardships than men, such as having less full-time employment and being more likely to be divorced.” 27
  • “Statistical Analysis
  • ( text omitted ) Between-gender comparisons were made using the chi-squared test for categorical variables and Students t-test for continuous variables…( text omitted ). A logistic regression analysis was performed for employment status, marital status, and comorbidity to evaluate the independent effects of gender on these dependent variables.” 27
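
To show what a chi-squared comparison of a categorical variable between two groups involves, the following Python sketch computes the Pearson chi-squared statistic for a 2×2 table. The counts are invented for illustration and are not data from the quoted study, and the function name `chi_squared_2x2` is hypothetical. No continuity correction is applied.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table.

    table = [[a, b], [c, d]] holds observed counts.
    Returns (statistic, p_value) for 1 degree of freedom,
    without continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and category were independent (H0).
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are two groups, columns are
# full-time employed (yes, no). Not data from any cited study.
counts = [[30, 70], [55, 45]]
statistic, p_value = chi_squared_2x2(counts)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.5f}")
```

The null hypothesis here is that the two groups do not differ in the categorical outcome; a small p-value would count as evidence against it.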

EXAMPLES OF HYPOTHESIS AS WRITTEN IN PUBLISHED ARTICLES IN RELATION TO OTHER PARTS

  • EXAMPLE 1. Background, hypotheses, and aims are provided
  • “Pregnant women need skilled care during pregnancy and childbirth, but that skilled care is often delayed in some countries …( text omitted ). The focused antenatal care (FANC) model of WHO recommends that nurses provide information or counseling to all pregnant women …( text omitted ). Job aids are visual support materials that provide the right kind of information using graphics and words in a simple and yet effective manner. When nurses are not highly trained or have many work details to attend to, these job aids can serve as a content reminder for the nurses and can be used for educating their patients (Jennings, Yebadokpo, Affo, & Agbogbe, 2010) ( text omitted ). Importantly, additional evidence is needed to confirm how job aids can further improve the quality of ANC counseling by health workers in maternal care …( text omitted )” 28
  • “ This has led us to hypothesize that the quality of ANC counseling would be better if supported by job aids. Consequently, a better quality of ANC counseling is expected to produce higher levels of awareness concerning the danger signs of pregnancy and a more favorable impression of the caring behavior of nurses .” 28
  • “This study aimed to examine the differences in the responses of pregnant women to a job aid-supported intervention during ANC visit in terms of 1) their understanding of the danger signs of pregnancy and 2) their impression of the caring behaviors of nurses to pregnant women in rural Tanzania.” 28
  • EXAMPLE 2. Background, hypotheses, and aims are provided
  • “We conducted a two-arm randomized controlled trial (RCT) to evaluate and compare changes in salivary cortisol and oxytocin levels of first-time pregnant women between experimental and control groups. The women in the experimental group touched and held an infant for 30 min (experimental intervention protocol), whereas those in the control group watched a DVD movie of an infant (control intervention protocol). The primary outcome was salivary cortisol level and the secondary outcome was salivary oxytocin level.” 29
  • “We hypothesize that at 30 min after touching and holding an infant, the salivary cortisol level will significantly decrease and the salivary oxytocin level will increase in the experimental group compared with the control group.” 29
  • EXAMPLE 3. Background, aim, and hypothesis are provided
  • “In countries where the maternal mortality ratio remains high, antenatal education to increase Birth Preparedness and Complication Readiness (BPCR) is considered one of the top priorities [1]. BPCR includes birth plans during the antenatal period, such as the birthplace, birth attendant, transportation, health facility for complications, expenses, and birth materials, as well as family coordination to achieve such birth plans. In Tanzania, although increasing, only about half of all pregnant women attend an antenatal clinic more than four times [4]. Moreover, the information provided during antenatal care (ANC) is insufficient. In the resource-poor settings, antenatal group education is a potential approach because of the limited time for individual counseling at antenatal clinics.” 30
  • “This study aimed to evaluate an antenatal group education program among pregnant women and their families with respect to birth-preparedness and maternal and infant outcomes in rural villages of Tanzania.” 30
  • “The study hypothesis was if Tanzanian pregnant women and their families received a family-oriented antenatal group education, they would (1) have a higher level of BPCR, (2) attend antenatal clinic four or more times, (3) give birth in a health facility, (4) have less complications of women at birth, and (5) have less complications and deaths of infants than those who did not receive the education.” 30

Research questions and hypotheses are crucial components of any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and an insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses, which serve as formal predictions about the research outcomes. Because they are so central, research questions and hypotheses should be carefully thought out and constructed when planning research. Doing so helps avoid unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Conceptualization: Barroga E, Matanguihan GJ.
  • Methodology: Barroga E, Matanguihan GJ.
  • Writing - original draft: Barroga E, Matanguihan GJ.
  • Writing - review & editing: Barroga E, Matanguihan GJ.

  • Open access
  • Published: 09 September 2024

Long-term longitudinal analysis of 4,187 participants reveals insights into determinants of clonal hematopoiesis

  • Md Mesbah Uddin   ORCID: orcid.org/0000-0003-1846-0411 1 , 2   na1 ,
  • Seyedmohammad Saadatagah 3 , 4   na1 ,
  • Abhishek Niroula 5 , 6 , 7 ,
  • Bing Yu   ORCID: orcid.org/0000-0003-4818-1077 8 ,
  • Whitney E. Hornsby 1 , 2 ,
  • Shriienidhie Ganesh   ORCID: orcid.org/0000-0002-4226-7526 1 , 2 ,
  • Kim Lannery 1 , 2 ,
  • Art Schuermans 1 , 2 , 9 ,
  • Michael C. Honigberg   ORCID: orcid.org/0000-0001-8630-5021 1 , 2 , 10 ,
  • Alexander G. Bick   ORCID: orcid.org/0000-0001-5824-9595 11 ,
  • Peter Libby   ORCID: orcid.org/0000-0002-1502-502X 10 , 12 ,
  • Benjamin L. Ebert   ORCID: orcid.org/0000-0003-0197-5451 5 , 6 , 13 ,
  • Christie M. Ballantyne   ORCID: orcid.org/0000-0002-6432-1730 3   na2 &
  • Pradeep Natarajan   ORCID: orcid.org/0000-0001-8402-7435 1 , 2 , 10   na2  

Nature Communications volume 15, Article number: 7858 (2024)

  • Genetic variation
  • Personalized medicine
  • Risk factors

Clonal hematopoiesis of indeterminate potential (CHIP) is linked to diverse aging-related diseases, including hematologic malignancy and atherosclerotic cardiovascular disease (ASCVD). While CHIP is common among older adults, the underlying factors driving its development are largely unknown. To address this, we performed whole-exome sequencing on 8,374 blood DNA samples collected from 4,187 Atherosclerosis Risk in Communities Study (ARIC) participants over a median follow-up of 21 years. During this period, 735 participants developed incident CHIP. CHIP clones with mutations in splicing factor genes (SF3B1, SRSF2, U2AF1, and ZRSR2) and in TET2 grow significantly faster than DNMT3A non-R882 clones. We find that age at baseline and sex significantly influence the incidence of CHIP, while ASCVD and other traditional ASCVD risk factors do not exhibit such associations. Additionally, baseline synonymous passenger mutations are strongly associated with CHIP status and are predictive of new CHIP clone acquisition and clonal growth over extended follow-up, providing valuable insights into clonal dynamics of aging hematopoietic stem and progenitor cells. This study also reveals associations between germline genetic variants and incident CHIP. Our comprehensive longitudinal assessment yields insights into cell-intrinsic and -extrinsic factors contributing to the development and progression of CHIP clones in older adults.

Introduction

Clonal hematopoiesis (CH) is a common aging-related phenomenon whereby blood cells are predominantly derived from a few hematopoietic stem and progenitor cells (HSPC) with acquired somatic mutation(s) in known leukemia driver genes that foster clonal expansion. CH with a variant allele fraction (VAF) ≥ 2% is termed clonal hematopoiesis of indeterminate potential (CHIP) 1 , 2 . The most frequently mutated genes in CHIP include epigenetic regulators DNMT3A , TET2 , and ASXL1 , splicing factor genes SF3B1 , SRSF2 , and U2AF1 , and DNA damage response genes TP53 and PPM1D , and JAK2 3 . CHIP is associated with many age-related conditions, including hematologic malignancy 4 , 5 , atherosclerotic cardiovascular disease (ASCVD) 6 , stroke 7 , chronic liver disease 8 , and heart failure 9 .

Clinical consequences of CHIP differ depending on driver genes, types of mutations, growth rate, and the size of the clones 10 , 11 , 12 , 13 . Cell-intrinsic factors such as driver gene, mutation type, and genetic background, and cell-extrinsic factors such as chronological age and several putative exposures and disease states across the life course, may influence clonal expansion. Additionally, genetic 3 , 14 , 15 , 16 , 17 and environmental factors 18 , 19 , 20 , 21 , 22 , 23 are associated with increased odds of CHIP in cross-sectional analyses 24 , but their roles in the emergence or expansion of CHIP clones remain elusive. Although small studies have evaluated determinants of incident CHIP or progression of CHIP clones (defined as the emergence or expansion of clones at VAF ≥ 2%) through short-term serial sequencing 12 , 25 , 26 , a comprehensive understanding of the risk factors has yet to be firmly established.

In this work, we profile incident CHIP in 4187 middle-aged participants from the Atherosclerosis Risk in Communities Study (ARIC) over a median follow-up of 21 years via serial blood-based whole-exome sequencing to identify the determinants of incidence and progression of CHIP clones in older adults. We show that both cell-intrinsic factors, such as driver gene and genetic background, and cell-extrinsic factors, such as chronological age and (self-reported) sex, contribute to the incidence and progression of CHIP clones in older adults, providing valuable insight into clonal dynamics of aging HSPC.

CHIP at baseline visit

We investigated CHIP in a cohort of 10,871 participants from the ARIC Study baseline visits using whole-exome sequencing (WES) with the HiSeq 2000 platform (Supplementary Table 1). After excluding individuals with prevalent hematologic malignancy and those without WES data at a follow-up visit (Supplementary Fig. 1), our analysis focused on 4187 study participants. Table 1 and Supplementary Table 1 present the characteristics of these participants, among whom 2478 (59.2%) were female, 951 (22.7%) were African American, and the mean (SD) age was 55.5 (5.5) years at the time of enrollment. A total of 576 CHIP clones with mutations in 51 driver genes were detected among 457 participants (Supplementary Data 1). Baseline CHIP prevalence was 10.9% (457/4187) at VAF ≥ 2% and 3.8% (161/4187) at VAF ≥ 10% (Fig. 1a). Notably, in all age categories, most of the participants with CHIP (383/457; 83.8%) carried a single mutated clone (Fig. 1b). The frequently mutated genes included DNMT3A, TET2, and ASXL1, with median VAF ranging from 6.6 to 15.3% (Fig. 1c, d). Our study provides comprehensive insights into the prevalence and characteristics of CHIP in middle-aged participants.

figure 1

Distributions of clonal hematopoiesis of indeterminate potential (CHIP) at baseline ( a – d ) and follow-up ( e – h ) visits. CHIP prevalence increases with age, reaching approximately 30% when individuals reach 80 years ( a , e ). Error bands in ( a , e ) show a 95% confidence interval around the fitted logistic regression line. Most individuals with CHIP typically carry a single clone ( b , f ). In the earlier decades of life (under 70 years), many individuals carry mutations in DNMT3A or other less commonly mutated CHIP genes ( c , g ). However, individuals aged 70 and older show a higher incidence of CHIP involving genes such as TET2 , ASXL1 , and splicing factors ( c , g ), and the size of these clones tends to be relatively larger in later years ( d , h ). Box plots display median (center line), 25th and 75th percentiles (box edges), and whiskers extend to ±1.5 * interquartile range. VAF variant allele fraction.

CHIP at the follow-up visit

We further ascertained CHIP at a later visit, with a median follow-up duration of 21 years (range = 5–27; mean = 20.3; SD = 2.03). The mean (SD) age of participants at follow-up blood draw was 75.8 (5.2) years. For this analysis, we performed WES on 4187 samples using the NovaSeq 6000 platform and identified 1302 CHIP clones at VAF ≥ 2% in 50 driver genes among 1047 participants (Supplementary Data 2). The prevalence of CHIP showed an age-dependent increase, reaching an overall prevalence of 25.0% (1047/4187) and 11.8% (496/4187) at VAF ≥ 2% and ≥ 10%, respectively (Fig. 1e, f). With advancing age, we observed a shift in the proportions of individuals carrying specific CHIP subtypes. While the prevalence of DNMT3A CHIP and mutations in other less-frequently mutated genes decreased, mutations in TET2, ASXL1, and splicing factor genes increased (Fig. 1g). Additionally, clone sizes tended to increase with advancing age, with a median VAF ranging from 7.7 to 10.2% (Fig. 1h). The shifting patterns and increasing clone sizes of CHIP subtypes in older adults reveal the dynamic nature of CHIP over time. Not all clones persisted: a total of 269 mutations detected at baseline at VAF ≥ 2% were lost during follow-up (Supplementary Fig. 2), whereas 972 clones newly emerged at the follow-up visit (Supplementary Fig. 3). We detected 352 clones at both visits at a VAF ≥ 2%. Among these clones, 233 (66%) were growing, 33 (9%) were shrinking, and 86 (24%) remained static during the follow-up period (Fig. 2).

figure 2

Here, 233 clones are growing, 33 are shrinking, and 86 are static. CHIP clonal hematopoiesis of indeterminate potential; VAF variant allele fraction.

Concordance of CHIP calls from the HiSeq vs NovaSeq sequencing platforms

Here, baseline and follow-up visits were sequenced using two different sequencing platforms, HiSeq 2000 and NovaSeq 6000. We re-sequenced 786 samples from the same baseline visit using NovaSeq to assess the systematic difference in estimated VAF and the concordance of CHIP ascertainment between the two sequencing platforms. Concordance estimates for CHIP clones detected “yes/no” by the two sequencing platforms were 83% (654/786) and 93% (731/786) at VAF ≥ 2% and VAF ≥ 10%, respectively. We also observed a strong correlation (Pearson’s r = 0.80) between the VAF estimates from these two platforms (Supplementary Fig.  4 ). As the VAF did not correlate perfectly, our primary analysis was on determinants of incident CHIP, defined as a CHIP clone at VAF ≥ 2% detected at the follow-up visit only without any prevalent CHIP clone at baseline (i.e. VAF < 2% at baseline). In a secondary analysis, we considered factors affecting clonal growth rate.

Incident CHIP in the ARIC Study

We identified 3730 participants without prevalent CHIP, of whom 59.7% (2226/3730) were female, 23.2% (865/3730) were African American, and 53.7% (2004/3730) had a history of smoking (Table 1). A total of 735 (19.7%) participants developed incident CHIP (VAF ≥ 2%) during the follow-up, of whom 37% (272/735) had large clones (VAF ≥ 10%). Individuals with incident CHIP were relatively older at the baseline visit (median age of 56 vs 54 years; Wilcoxon rank sum test P = 2.4E−6). Here, 876 incident clones were detected in 735 participants, where the majority (615/735; ~84%) acquired a single clone during follow-up (Fig. 3a). Most incident CHIP mutations occurred in DTA (DNMT3A, TET2, or ASXL1), followed by SF (SF3B1, SRSF2, U2AF1, or ZRSR2) and DDR (PPM1D or TP53) genes, and large clones (VAF ≥ 10%) occurred in ASXL1, SF3B1, JAK2, ZNF318, U2AF1, and ZRSR2 (Fig. 3b, c). CHIP incidence increased with advancing age, where >23% of participants older than 75 years acquired incident CHIP (Fig. 3d).
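The incident-CHIP definition used here (no clone at VAF ≥ 2% at baseline, at least one clone at VAF ≥ 2% at follow-up, with VAF ≥ 10% marking a large clone) can be sketched as a small participant-level classifier. This is an illustrative sketch only: the function name, the label strings, and the convention that an empty list means "no clone detected" are assumptions, not the authors' pipeline.

```python
from typing import Sequence

CHIP_VAF = 0.02         # CHIP threshold stated in the text (VAF >= 2%)
LARGE_CLONE_VAF = 0.10  # large-clone threshold stated in the text (VAF >= 10%)


def classify_participant(baseline_vafs: Sequence[float],
                         followup_vafs: Sequence[float]) -> str:
    """Label one participant from the VAFs of their detected clones.

    Hypothetical helper: each sequence holds one VAF per detected clone,
    and an empty sequence means no clone was detected at that visit.
    """
    # Any baseline clone at or above the CHIP threshold is prevalent CHIP.
    if any(v >= CHIP_VAF for v in baseline_vafs):
        return "prevalent CHIP"
    # Otherwise the largest follow-up clone decides the label.
    followup_max = max(followup_vafs, default=0.0)
    if followup_max >= LARGE_CLONE_VAF:
        return "incident CHIP (large clone)"
    if followup_max >= CHIP_VAF:
        return "incident CHIP"
    return "no CHIP"
```

For example, `classify_participant([0.01], [0.12])` returns `"incident CHIP (large clone)"`, because the baseline clone is below 2% and the follow-up clone exceeds 10%.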

figure 3

a The vast majority (over 83%) of individuals with incident CHIP carry a single clone, b with approximately 37% of the clones exhibiting an expanded state (VAF ≥ 10%). b DNMT3A and TET2 show higher proportions of smaller incident clones (VAF between 2% and 10%), while ASXL1, JAK2, U2AF1, and ZRSR2 display expanded clones. c The median clone size exceeds 10% for ASXL1, SF3B1, U2AF1, and ZRSR2 CHIP. Box plots display median (center line), 25th and 75th percentiles (box edges), and whiskers extend to ±1.5 * interquartile range. d Like CHIP prevalence (Fig. 1), CHIP incidence increases with age. Error bands show a 95% confidence interval around the fitted logistic regression line. VAF variant allele fraction.

Clinical predictors of incident CHIP

First, we performed univariable logistic regression to examine the associations of baseline risk factors such as age, sex, race, body mass index (BMI), high-density lipoprotein cholesterol (HDL-C), non-HDL-C, history of smoking, hypertension, ASCVD, and T2D with incident CHIP vs. non-CHIP (binary outcome) (Supplementary Fig. 5). In univariable analyses, we observed significant associations (P < 0.0025, considering 20 independent tests at a 5% level of significance) of age at baseline, male sex, and European American race with incident CHIP categories (Supplementary Fig. 5). Age was significantly associated with a higher incidence of overall CHIP, TET2, and SF mutations (1.03 ≤ odds ratio (OR) ≤ 1.09; 5.4E−06 ≤ P ≤ 3.2E−4), and nominally associated (0.0025 < P < 0.05) with incidence of ASXL1 and DDR mutations (Supplementary Fig. 5). However, no association was found between age and incident DNMT3A mutations. Male sex was significantly associated with a higher incidence of overall CHIP, ASXL1, and SF mutations (1.32 ≤ OR ≤ 2.82; 6.9E−5 ≤ P ≤ 9E−4) and nominally associated with a higher incidence of DDR mutations (Supplementary Fig. 5). European American race was significantly associated with a lower incidence of DNMT3A mutations (OR = 0.67; 95% confidence interval (CI) = 0.52–0.86; P = 1.8E−3), and nominally associated with a higher incidence of SF mutations (Supplementary Fig. 5). Additionally, there were nominal non-significant (0.0025 < P < 0.05) associations between smoking status (never vs. ever) and incident SF, BMI (inverse rank normalized) and incident DNMT3A, history of ASCVD and incident DNMT3A, and non-HDL-C level (inverse rank normalized) and incident TET2 (Supplementary Fig. 5).
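The significance tiers used throughout these analyses follow from a Bonferroni correction: 0.05 divided by 20 independent tests gives the 0.0025 threshold, and P-values between 0.0025 and 0.05 are called nominal. A minimal sketch of that bookkeeping follows; the function name and label strings are assumptions chosen for illustration.

```python
ALPHA = 0.05      # family-wise significance level stated in the text
N_TESTS = 20      # number of independent tests stated in the text
BONFERRONI_THRESHOLD = ALPHA / N_TESTS  # 0.05 / 20 = 0.0025


def significance_label(p_value: float) -> str:
    """Map an uncorrected P-value onto the tiers used in the text."""
    if p_value < BONFERRONI_THRESHOLD:
        return "significant"
    if p_value < ALPHA:
        return "nominal"
    return "not significant"
```

With this rule, the P = 1.8E−3 reported for European American race and incident DNMT3A falls in the significant tier.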

Next, we performed multivariable-adjusted logistic regression analyses of incident CHIP categories (CHIP vs. non-CHIP) vs. baseline risk factors, including age, sex, race, BMI, HDL-C, non-HDL-C, history of smoking, hypertension, ASCVD, and T2D (Fig.  4 , and Supplementary Fig.  6 ). Fully adjusted regression models included baseline risk factors and covariates such as age, sex, race, BMI, HDL-C, non-HDL-C, cholesterol-lowering medication use, history of smoking, hypertension, ASCVD, T2D, baseline visits, and visit center. Age was independently associated with a higher incidence of overall CHIP, as well as gene-specific CHIP subtypes (1.04 ≤ OR ≤ 1.10; 2.0E−7 ≤ P ≤ 7.4E−5), with higher odds for splicing factor genes (Fig.  4 and Supplementary Fig.  6 ). Male sex was nominally associated with a higher incidence of overall and DDR CHIP, and significantly associated with a higher incidence of ASXL1 and SF CHIP (1.30 ≤ OR ≤ 2.79; 7.5E−4 ≤ P < 0.05). European ancestry was nominally associated with a lower incidence of CHIP in DNMT3A and a higher incidence of CHIP in DDR. Interestingly, no significant association was observed between BMI, HDL-C, non-HDL-C, history of smoking, hypertension, T2D, and ASCVD, and incident CHIP categories (Fig.  4 and Supplementary Fig.  6 ). However, we observed a nominal non-significant association of higher BMI with reduced incident DNMT3A , and history of ASCVD with increased incident DNMT3A but reduced incident TET2 (Fig.  4 ). We performed sensitivity analyses with stringent incident CHIP case-control definition; findings were robust to the case-control status (“Methods” section; Supplementary Data  3 and 4 ).

figure 4

Forest plot showing odds ratio (OR) with a 95% confidence interval (CI) from multivariable logistic regression analyses examining the association between incident overall CHIP (N = 3614; 709 cases and 2905 controls), incident DNMT3A CHIP (N = 3614; 302 cases and 3312 controls), and incident TET2 CHIP (N = 3614; 160 cases and 3454 controls), and baseline cardiovascular risk factors. Uncorrected P-values are from two-sided Z-tests. The adjusted model accounted for age, sex, race, body mass index (BMI), high-density lipoprotein cholesterol (HDL-C), non-HDL-C, cholesterol medication usage, smoking history, hypertension, atherosclerotic cardiovascular disease (ASCVD, including coronary heart disease and/or ischemic stroke), type 2 diabetes (T2D), and batch effects. Inverse rank normalization was performed before the analysis to account for potential variations in BMI, HDL-C, and non-HDL-C distribution. The results indicate that age is independently and significantly associated with a higher incidence of CHIP. Additionally, nominal non-significant associations are observed for male sex, European American race, BMI, and history of ASCVD with incident CHIP categories. ***P < 0.0025 (0.05/20); **P < 0.01; *P < 0.05.
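The quantities shown in a forest plot of this kind (OR, 95% CI, and a two-sided Z-test P-value) all derive from a logistic regression coefficient and its standard error. The sketch below shows that standard relationship; the function name and the example inputs are hypothetical and do not come from the study.

```python
import math
from typing import Tuple


def odds_ratio_summary(beta: float, se: float,
                       z_crit: float = 1.959964) -> Tuple[float, float, float, float]:
    """Return (OR, CI lower, CI upper, two-sided P) for a log-odds coefficient.

    beta is the coefficient on the log-odds scale, se is its standard error,
    and z_crit is the normal quantile for the CI (1.959964 gives 95%).
    """
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    odds_ratio = math.exp(beta)
    ci_lower = math.exp(beta - z_crit * se)
    ci_upper = math.exp(beta + z_crit * se)
    return odds_ratio, ci_lower, ci_upper, p_value
```

Calling `odds_ratio_summary(0.0, 0.1)` returns an OR of 1.0 with a CI that straddles 1 and a P-value of 1, which is the null case of no association.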

In secondary analyses, we tested the associations of smoking categories (never vs. former or current smokers), BMI categories (BMI < 25 vs. 25–30 or >30 kg/m²), triglyceride to HDL-C (TG/HDL-C) ratio, dyslipidemia, and the interaction of male sex with smoking status (never vs. ever) with incident CHIP categories in fully adjusted models (Supplementary Figs. 7 and 8). There were no significant associations between smoking status (never vs. former or current smoker) or BMI categories and incident CHIP categories (Supplementary Fig. 7). However, we observed a nominal non-significant association between current smoking status and incident CHIP in splicing factor genes, and between BMI > 30 kg/m² and incident ASXL1 (Supplementary Fig. 7).

No significant associations were observed between TG/HDL-C ratio and incident CHIP (Supplementary Fig.  8a ), although dyslipidemia did significantly associate with increased odds of incident TET2 (OR = 2.51; P = 2.5E−3; Supplementary Fig.  8b ). A nominal association was observed between dyslipidemia and increased odds of incident ASXL1 (Supplementary Fig.  8b ). We also tested interactions between sex and smoking history vs. incident CHIP categories in exploratory analyses. Fully adjusted models did not reveal significant interactions between sex and smoking history on incident CHIP categories. Nevertheless, there was a nominal interaction between sex and smoking history: we observed lower odds of incident DNMT3A in the male sex by ever-smoker status (OR = 0.48; P = 0.0036) (Supplementary Fig.  8c ).

Shared genetic predisposition in prevalent and incident CHIP

We separately assessed the association of an independently derived prevalent CHIP polygenic risk score (PRS), consisting of 21 conditionally independent and genome-wide significant (P < 5E−8; Supplementary Data 5) variants 15 , with incident CHIP categories in African American (AA) and European American (EA) participants, followed by inverse-variance weighted fixed-effect meta-analysis (Fig. 5a, b). We found that each SD increase in genetic liability to prevalent CHIP was significantly associated with incident CHIP (OR = 1.23; 95% CI = 1.12–1.35; P = 8.9E−6). Furthermore, genetic liability to prevalent CHIP was associated with incident DNMT3A (OR = 1.28; 95% CI = 1.12–1.46), TET2 (OR = 1.30; 95% CI = 1.09–1.55), and ASXL1 (OR = 1.66; 95% CI = 1.26–2.17) CHIP (Fig. 5b).
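An inverse-variance weighted fixed-effect meta-analysis, as used to combine the ancestry-specific estimates, weights each stratum's log-odds estimate by the reciprocal of its squared standard error. The sketch below shows the standard formula; the function name and the example numbers are hypothetical and are not the study's estimates.

```python
import math
from typing import Iterable, Tuple


def ivw_fixed_effect(estimates: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Pool (beta, se) pairs with inverse-variance weights.

    Returns the pooled beta and its standard error, both on the
    log-odds scale; exponentiate the pooled beta to obtain an OR.
    """
    weighted = [(1.0 / se ** 2, beta) for beta, se in estimates]
    total_weight = sum(weight for weight, _ in weighted)
    pooled_beta = sum(weight * beta for weight, beta in weighted) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-ancestry estimates (beta, se), for illustration only.
pooled_beta, pooled_se = ivw_fixed_effect([(0.25, 0.10), (0.18, 0.06)])
pooled_or = math.exp(pooled_beta)
```

Because weights scale with precision, the stratum with the smaller standard error pulls the pooled estimate toward its own value.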

figure 5

Association of prevalent clonal hematopoiesis of indeterminate potential (CHIP) with incident CHIP: a, b polygenic risk score (PRS) and c germline variants analysis. a Distribution of PRS in ARIC AA and EA participants. b Ancestry-stratified PRS was calculated using 21 independent variants (P < 5E−8; Supplementary Data 5) from Kessler et al. 15 . Logistic regression was performed, adjusting for age, sex, smoking status, top five principal components of ancestry, and batch effect. Data are presented as odds ratio (OR) with a 95% confidence interval (CI) and uncorrected P-value from a two-sided Z-test. The results show a strong association between prevalent CHIP PRS and incident CHIP. c Forest plot of odds ratio with 95% CI from inverse-variance weighted fixed-effect meta-analyses of single-variant associations for incident overall CHIP, incident DNMT3A CHIP, and incident TET2 CHIP in ARIC AA (n = 637) and ARIC EA (n = 2378) participants. Uncorrected P-values are from two-sided Z-tests. Single-variant association adjusted for age, age², sex, top ten principal components of ancestry, and batch effect, followed by multi-ancestry meta-analysis, was conducted to investigate the association of genome-wide significant (P < 5E−8) prevalent CHIP-associated variants 15 and incident CHIP at a significance level of P < 0.05. This figure presents the lead variant(s) from each locus, and the full list is available in Supplementary Data 6–8. Several germline variants previously associated with prevalent CHIP were found to be associated with incident CHIP, indicating a shared genetic basis between prevalent and incident CHIP. ARIC AA the Atherosclerosis Risk in Communities Study African American; ARIC EA European American; EA effect allele; EAF effect allele frequency; SNV single nucleotide variant.

Next, to test the associations of genome-wide significant (P < 5E−8) variants for prevalent CHIP with risk of incident CHIP, we performed targeted single-variant associations in AA and EA participants from ARIC. We tested known loci associated with prevalent CHIP derived in the UK Biobank by Kessler et al. 15 . For matching variants with minor allele frequency ≥1%, we performed ancestry-stratified single-variant associations for incident CHIP, DNMT3A, and TET2 in the ARIC study, followed by a multi-ancestry inverse-variance weighted fixed-effect meta-analysis. A P threshold of <0.05 was considered statistically significant. At this threshold, we identified several prevalent CHIP loci associated with incident CHIP (Fig. 5c and Supplementary Data 6–8). Notably, the risk alleles in SMC4, TERT, HBS1L, ZNF318, GSDMC, ATM, TCL1A, and SETBP1 loci were associated with a higher incidence of overall CHIP (1.16 ≤ OR ≤ 1.41; 0.0011 ≤ P ≤ 0.044; Fig. 5c and Supplementary Data 6). Risk alleles in SMC4, TERT, CD164, HBS1L, GSDMC, SETBP1, and BCL2L1 loci were associated with a higher incidence of DNMT3A CHIP (1.24 ≤ OR ≤ 1.61; 0.0017 ≤ P ≤ 0.043; Fig. 5c and Supplementary Data 7). The risk alleles in SMC4, TERT, CD164, HBS1L, H2AFV, GSDMC, TCL1A, DNAH2, SETBP1, and RUNX1 loci were associated with a higher incidence of TET2 CHIP (1.35 ≤ OR ≤ 1.71; 0.0013 ≤ P ≤ 0.048; Fig. 5c and Supplementary Data 8). Here, rs2887399-T, the protective variant in the TCL1A promoter associated with non-DNMT3A CHIP 8 , was nominally associated with incident overall CHIP and incident TET2 CHIP (P < 0.2), with a similar effect estimate for TET2 (Supplementary Table 2). Importantly, variants in the TCL1A locus at discovery P < 0.05 were in strong linkage disequilibrium with rs2887399-T (Supplementary Table 3).

Determinants of clonal growth

To identify determinants of clonal growth rate, calculated as $\log(\mathrm{VAF}_{\text{follow-up}} / \mathrm{VAF}_{\text{baseline}}) / (\mathrm{Age}_{\text{follow-up}} - \mathrm{Age}_{\text{baseline}})$ when dVAF > 0, we examined cell-intrinsic factors such as type of mutation and CHIP driver gene, and cell-extrinsic factors such as age at baseline, self-reported sex (female vs. male), self-reported race (African American vs. European American), smoking status (never vs. ever), prevalent disease status, HDL-C, non-HDL-C, and BMI. In aggregate, clonal growth rate was strongly associated with driver gene, but not with mutation type (Table 2 and Supplementary Table 4). Compared to DNMT3A non-R882 clones, clones in splicing factor genes (SF3B1, SRSF2, U2AF1, and ZRSR2) and TET2 clones grew significantly (P < 0.0025) faster, followed by nominal associations for DNA damage response genes (PPM1D and TP53), the other-gene category, and ASXL1 clones. However, we did not observe a significant difference in the growth rate of DNMT3A non-R882 and DNMT3A R882 clones. Age was positively associated with clonal growth rate, though there was no other significant association between growth rate and traditional ASCVD risk factors, including sex, race, smoking, BMI, hypertension, HDL-C, or non-HDL-C level (Table 2 and Supplementary Table 4).
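The growth-rate definition above can be written as a short function. This is a sketch under stated assumptions: the text does not give the base of the logarithm, so the natural log is assumed here, and the function name and the choice to return `None` when the clone did not grow (dVAF ≤ 0) are illustrative conventions.

```python
import math
from typing import Optional


def clonal_growth_rate(vaf_baseline: float, vaf_followup: float,
                       age_baseline: float, age_followup: float) -> Optional[float]:
    """Log fold-change in VAF per year, defined only for growing clones.

    Assumes the natural logarithm; returns None when dVAF <= 0 or when
    the inputs cannot yield a valid rate.
    """
    d_vaf = vaf_followup - vaf_baseline
    if d_vaf <= 0 or vaf_baseline <= 0 or age_followup <= age_baseline:
        return None
    return math.log(vaf_followup / vaf_baseline) / (age_followup - age_baseline)


# Hypothetical clone: VAF grows from 2% to 8% between ages 55 and 75.
rate = clonal_growth_rate(0.02, 0.08, 55.0, 75.0)  # ln(4) / 20 per year
```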

Inferring CHIP occurrences from synonymous passenger mutations

HSPCs acquire somatic mutations with aging, the majority of which are neutral and remain below the detection threshold or stochastically disappear from the cell population 27 . Neutral somatic mutations cooccurring with a (detected or undetected) driver can be positively selected and reach a higher VAF simply by hitchhiking—these are known as passenger mutations 28 . It is known from previous studies that passenger mutations can be used to identify new driver genes 29 , 30 , 31 and estimate the fitness effect of known drivers 8 . However, the utility of synonymous passenger mutations detected in the WES of healthy individuals for predicting CHIP occurrences is not known. Here, we analyzed 4187 WES from ARIC baseline visits with longitudinal CHIP ascertainment and identified 6789 synonymous passenger mutations (1% ≤ VAF ≤ 25%) in 1018 ARIC participants (Supplementary Data  9 and Supplementary Fig.  9 ).

Approximately half of the detected clones are within the VAF range of 5–10%, and 81% of clones are within the range of 1% ≤ VAF ≤ 10%. The mutational spectrum of nonsynonymous CHIP mutations and synonymous passenger mutations is presented in Supplementary Fig. 10a–d. C>T transitions are the most frequent nonsynonymous mutation in CHIP, followed by T>C transitions (Supplementary Fig. 10a, b), whereas T>C transitions are the most frequent synonymous mutation, followed by C>T transitions (Supplementary Fig. 10c, d). To assess the utility of synonymous passenger counts, we performed multivariable regression analyses between CHIP outcomes and passenger counts (“Methods” section) and found that passenger counts were strongly associated with the presence of CHIP (1.1E−30 ≤ P < 0.05) and were predictive of both prevalent and incident CHIP (Fig. 6a). Additionally, passenger counts were strongly associated with the number of CHIP clones in a multivariable multinomial logistic regression (P < 5.6E−9; Fig. 6b). Furthermore, passenger counts were significantly associated with clonal growth (dVAF > 0) in a multivariable logistic regression (P = 5.0E−4; Fig. 6c). Associations between CHIP categories and synonymous passenger counts in Fig. 6a–c were independent of demographic and other clinical exposures including enrollment age, age², sex, race, smoking status, BMI, HDL-C, non-HDL-C, and prevalent disease status for hypertension, ASCVD, and T2D. Additionally, the clone size of synonymous passenger mutations (i.e., VAF) was positively correlated (Pearson’s r > 0.10, P < 0.05; Supplementary Table 5) and strongly associated with the growth rate of CHIP clones (P < 0.05; Supplementary Table 6).

figure 6

Association of synonymous passenger mutation counts with a presence of CHIP, b number of CHIP clones, and c growth of CHIP clones. Forest plot of odds ratio (OR) with a 95% confidence interval (CI) showing the effects of (inverse rank normalized) synonymous passenger counts on CHIP-related outcomes (listed below in a–c). Models accounted for age at baseline, age², self-reported sex, self-reported race, body mass index (BMI), high-density lipoprotein cholesterol (HDL-C), non-HDL-C, cholesterol medication usage, smoking history, hypertension, atherosclerotic cardiovascular disease (ASCVD, including coronary heart disease and/or ischemic stroke), type 2 diabetes (T2D), and batch effects (including ARIC baseline visit and visit center). Uncorrected P-values are from two-sided Z-tests. a Model 1: four separate multivariable logistic regression analyses were performed for outcomes (i) no CHIP (n = 2905) vs. presence of CHIP at either visit (n = 1147), (ii) no CHIP at baseline (n = 3614) vs. CHIP detected at baseline (i.e. prevalent CHIP, n = 438), (iii) no CHIP at follow-up visit (n = 3043) vs. CHIP detected at the follow-up visit (n = 1009), and (iv) no CHIP (n = 2905) vs. incident CHIP (n = 709). b Model 2: multivariable multinomial logistic regression was performed for the number of CHIP clones as outcomes, i.e., no CHIP clone (n = 2905) vs. one clone (n = 877), 2 clones (n = 206), 3 clones (n = 43), or ≥4 clones (n = 21). c Multivariable logistic regression was performed for no CHIP clone (or no change in clone size, dVAF = 0; n = 2909) vs. growing CHIP clone (i.e. dVAF > 0; n = 936). BMI, HDL-C, and non-HDL-C values were inverse rank normalized. CHIP: clonal hematopoiesis of indeterminate potential; $\mathrm{dVAF} = \mathrm{VAF}_{\text{follow-up}} - \mathrm{VAF}_{\text{baseline}}$.
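Inverse rank normalization, applied here to passenger counts, BMI, HDL-C, and non-HDL-C, replaces each value with the standard normal quantile of its rank. The text does not specify the rank offset, so the sketch below assumes the common Blom offset (c = 3/8) and average ranks for ties; the function name is illustrative.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_rank_normalize(values: Sequence[float], c: float = 3.0 / 8.0) -> List[float]:
    """Rank-based inverse normal transform.

    Assumptions: Blom offset c = 3/8 and average ranks for tied values.
    Output order matches the input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j across the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # average of the 1-based ranks in the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

The transform keeps the ordering of the inputs while forcing an approximately normal distribution, which limits the influence of skewed measurements or outliers on the regression.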

We conducted one of the largest long-term longitudinal whole-exome sequencing studies on clonal hematopoiesis involving 4,187 healthy participants from the ARIC study with a median follow-up of 21 years. We identified CHIP (VAF ≥ 2%) in 11% of the participants at baseline and 25% at the follow-up visits. Consistent with previous studies 3 , 6 , we showed that the prevalence of CHIP increases with advancing age.

We observed incident CHIP in approximately 17% of participants below 70 years, and this proportion increased to around 30% in individuals above 85 years of age among those without CHIP in middle age. These results reinforce the substantial impact of age on the acquisition of CHIP. Furthermore, our study revealed an interesting pattern of decreasing diversity in mutated CHIP genes with advancing age. Specifically, we noted an increase in incident clones in genes such as TET2 , ASXL1 , and splicing factor genes ( SF3B1 , U2AF1 , SRSF2 , and ZRSR2 ), as well as PPM1D and TP53 . In multivariable linear regression, we further show that splicing factor genes and TET2 clones expanded significantly faster than DNMT3A non-R882 clones over a median 21-year follow-up. The clonal growth rate after CHIP was manifest was not associated with traditional ASCVD risk factors. These findings corroborate and extend recent reports highlighting reduced clonal diversity (VAF ≥ 2%), faster expansion, and the acquisition of new mutations in these genes as individuals age 12 , 32 , 33 .

Previous cross-sectional studies reported associations of age, smoking, T2D, and BMI with prevalence of CHIP 4 , 5 , 18 , 34 . Interestingly, the impact of age on CHIP subtypes varied, with stronger associations observed for splicing factor genes, TET2 , and ASXL1 CHIP, but not for DNMT3A CHIP. Surprisingly, besides age, we did not find any significant association between certain traditional risk factors for atherosclerosis, such as a history of smoking, hypertension, T2D, BMI, and LDL-C, and the risk for incident CHIP, contrary to previous cross-sectional observations related to prevalent CHIP. Our findings aligned with a recent longitudinal study demonstrating no association between smoking, being overweight, and acquiring new CHIP mutations 33 . Another study reported an inverse association between HDL-C and clonal expansion 35 . While we observed a nominal association between non-HDL-C and incident TET2 CHIP in univariable analysis, this association did not hold in a fully adjusted logistic regression model.

Our genetic analyses provide compelling evidence for a shared genetic basis between prevalent and incident CHIP. For the first time, we demonstrated strong associations between prevalent CHIP PRS and increased odds of incident CHIP, underscoring the predictive value of CHIP PRS in identifying individuals at risk of developing CHIP. Furthermore, the genetic variants previously linked to prevalent CHIP 15 also showed associations with incident CHIP in our study using the ARIC dataset. Specifically, we observed significant associations for SMC4 , TERT , ATM , TCL1A , and SETBP1 loci for incident CHIP. These findings further support the notion of a shared genetic underpinning between prevalent and incident CHIP, emphasizing the relevance of these specific loci in the development and progression of clonal hematopoiesis. Importantly, these results provide evidence for the causality of key genes in the development of CHIP itself.

However, it is worth noting that only a quarter of the prevalent CHIP-associated loci were associated with incident CHIP. Notably, certain loci, including PARP1 , LY75 , SENP7 , TET2 , CD164 , and ITPR2 , showed no association with incident CHIP. This could be due to lower statistical power, poor imputation quality, or true biological differences between prevalent and incident CHIP. The CD164 locus is an example of the latter, where this locus is strongly associated with prevalent CHIP 15 , 16 , 17 but lacks any association with incident CHIP. The opposite effect directions observed at the CD164 locus for incident DNMT3A and TET2 CHIP could explain this null association in overall CHIP. Previously, we and others reported opposite associations at the TCL1A locus with prevalent DNMT3A and TET2 CHIP, leading to a null association with overall CHIP 3 , 15 , 16 , 17 . This study found an association at the TCL1A locus with incident overall CHIP and TET2 CHIP but not with incident DNMT3A CHIP. While our study was insufficiently powered to identify the lead variant TCL1A rs2887399-T, reported in Weinstock et al. 8 , we observed an identical effect estimate for incident TET2 CHIP, reaching statistical significance ( P ~0.05) (Supplementary Table  2 ). Notably, the association of incident DNMT3A tended toward null, with P approaching 1.0. Furthermore, it is crucial to highlight that incident TET2 and overall CHIP-associated variants identified in this study exhibit strong linkage disequilibrium with TCL1A rs2887399-T (Supplementary Table  3 ). Our findings align with recent research demonstrating the involvement of TCL1A in the expansion of various non- DNMT3A CHIP subtypes, including TET2 8 .

The ongoing debate about the relationship between CHIP and ASCVD has prompted investigations into whether CHIP triggers inflammation and contributes to the development of ASCVD 6 , 36 , whether ASCVD itself or clinical risk factors for ASCVD promote the development of CHIP 37 , or whether there is a bidirectional link between CHIP and inflammation 38 . While baseline ASCVD status showed a nominal association with incident DNMT3A and TET2 CHIP, the associations were not statistically significant and were in opposite directions. Secondary analyses revealed a strong link between dyslipidemia and incident TET2 and potentially ASXL1 CHIP, while no such associations were found for DNMT3A or overall CHIP. These findings partially support the hypothesis that dyslipidemia may influence CHIP development 37 .

Nonetheless, explanations based solely on ASCVD (or associated risk factors) are insufficient in understanding the development of CHIP among older adults. The results from our clinical and genetic analyses point to intricate relationships between environmental and genetic risk factors in the development of clonal hematopoiesis, providing strong evidence for a bidirectional association between CHIP and inflammation. Notably, studies in mice and zebrafish support a positive feedback loop between CHIP expansion and inflammation, where CHIP promotes inflammation, and the relative fitness and resistance to inflammation of CHIP clones further fuel clonal expansion, creating a vicious cycle of inflammation and expansion 38 , 39 , 40 . These findings highlight the complexity of the interactions between exposome, inherited, and somatic genomes, shedding light on the pathogenesis of CHIP in the context of (inflamm-)aging 41 , 42 , 43 . Further research is crucial to fully unravel these intricate relationships’ underlying mechanisms and potential therapeutic implications.

Finally, we assessed the utility of synonymous passenger mutations in understanding clonal dynamics in aging HSPCs. We show that baseline synonymous passenger counts are strongly associated with CHIP status and are predictive of acquiring new CHIP clones (VAF ≥ 2%) and clonal growth that were ascertained after a long median follow-up interval of 21 years. Furthermore, passenger clone size was moderately correlated with the growth rate of CHIP clones. Our findings suggest that synonymous passengers detected in WES are a good proxy for clonal growth and are predictive of CHIP occurrences, providing valuable insights into the dynamics of somatic mutations in aging HSPCs.

Although our study demonstrates associations between environmental and genetic determinants, important limitations should be considered. First, variability in CHIP ascertainment: CHIP was determined using two different sequencing platforms. The technical differences, albeit with good correlation, precluded the ability to compare clonal trajectories and growth rates and investigate factors associated with VAF changes but were suitable for incident CHIP analyses. Second, homogeneity in follow-up time: the study was constrained by a lack of heterogeneity in the follow-up time among participants as well as a single follow-up sampling. This limited the ability to perform time-to-event analyses between the baseline exposures and the incidence of CHIP. Longer follow-up durations and varying time intervals could provide a more comprehensive understanding of the relationship between exposures and the development of CHIP. Third, although the long duration of almost two decades between time points was a strength in regards to examining the impact of age, a substantial number of individuals died during this time period. Participants without follow-up WES had more clinical ASCVD, T2D, hypertension, cholesterol medication usage, increased BMI, and unfavorable lipid profiles (as shown in Supplementary Table  1 ). Thus, individuals with the most severe ASCVD and perhaps the greatest baseline inflammation were more likely to die before the follow-up visit (ARIC visit-5), limiting our power and biasing our results towards the null. Fourth, the study may have suffered from reduced statistical power, which could have influenced the observed associations between exposures and outcomes. Larger sample sizes would enhance the study’s statistical power and provide more reliable conclusions. Fifth, WES data limited our passenger mutation search to only coding regions; focusing on only synonymous mutations further reduced the pool of putative passenger mutations in the population. 
This could limit the generalizability of the findings from this study to studies that include a more diverse set of mutation types detected across the genome for passenger mutation. Therefore, these limitations should be considered when interpreting the findings, and future studies should aim to address these issues to further advance our understanding of the relationships between environmental and genetic determinants and their impact on clonal hematopoiesis.

Our comprehensive long-term longitudinal assessment provides new insights into the factors that promote incident CHIP and clonal growth rate in older adults. We find that age at baseline, sex, and dyslipidemia are significant predictors of incident CHIP, while ASCVD and traditional risk factors for ASCVD are not. We also find that the factors driving clinically relevant clonal expansion (VAF ≥ 2%) may vary, to some extent, among different CHIP driver genes. Additionally, our research demonstrates a shared genetic basis between prevalent and incident CHIP, further supporting the notion that CHIP can evolve independently of prevalent ASCVD and traditional ASCVD risk factors. Taken together, these results support a bidirectional relationship between CHIP and inflammation. Early identification of incident CHIP presents an opportunity for timely intervention and preventive measures. These findings underscore the importance of further research to understand the mechanisms underlying incident CHIP better and to develop strategies for its early detection and effective management.

Informed consent was obtained from all study participants, and the study design and methods were approved by the respective institutional review boards at each of the collaborating institutions: University of Mississippi Medical Center Institutional Review Board (ARIC: Jackson Field Center); Wake Forest University Health Sciences Institutional Review Board (ARIC: Forsyth County Field Center); University of Minnesota Institutional Review Board (ARIC: Minnesota Field Center); and Johns Hopkins University School of Public Health Institutional Review Board (ARIC: Washington County Field Center). Each study received institutional certification before depositing sequencing data into dbGaP, ensuring approval by all relevant institutional ethics committees and compliance with relevant ethical regulations. The secondary use of genomic data was approved by the Mass General Brigham Institutional Review Board (protocol 2016P002395 and 2016P001308).

Study samples

There were 10,881 ARIC participants with WES data at baseline sequenced using the HiSeq 2000 platform (Illumina, Inc., CA) (ARIC sub-study phs000668 CHARGE-S 44 https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000668.v6.p2 ). Baseline samples were from five visits with age ranges 44–84 years (Mean = 57.3; Median = 57.0; SD = 6.06). WES was also generated using the NovaSeq 6000 platform (Illumina, Inc., CA) from 4233 longitudinal visit samples (ARIC Visit 05). Among the longitudinal visit samples, 4,187 without hematologic malignancy had baseline WES data (detailed study design in Supplementary Fig.  1 ).

Detection of clonal hematopoiesis of indeterminate potential (CHIP)

Somatic mutations were identified from WES using Mutect2 software 45 in the Terra platform ( https://portal.firecloud.org/?return=terra#methods/gatk/mutect2-gatk4/21 ), annotated using ANNOVAR software 46 , and CHIP was detected using a publicly available pipeline ( https://app.terra.bio/#workspaces/terra-outreach/CHIP-Detection-Mutect2/ ). To minimize potential artifacts in the Mutect2 calls, a panel-of-normal (PON) was created from 100 random HiSeq WES from the youngest participants, while 1000 Genomes PON was used for the NovaSeq WES. Besides, the Genome Aggregation Database (gnomAD) was used to limit germline variants in the somatic mutation call. Mutect2 calls were further filtered, and variants were kept if (i) total depth of coverage (DP) ≥ 20, (ii) number of reads supporting the alternate allele (AD_Alt) ≥ 3, (iii) ≥1 read in both forward and reverse direction supporting the alternate allele (F1R2_Alt and F2R1_Alt), (iv) variant allele fraction ≥2%, (v) gnomAD allele frequency ≤0.001 (not hotspot mutations). CHIP variants that passed sequence-based filtering underwent additional curation, wherein Mutect2 annotation criteria encompassed “PASS,” “weak_evidence,” “multiallelic,” or “germline.” Variants with frequencies exceeding those of DNMT3A R882 were excluded. Moreover, variants were eliminated if their VAF was less than 10% in the majority of clones, specifically those annotated with “weak_evidence,” and exhibited AD_Alt < 5. To remove potential oxoG artifacts, G>T and C>A substitutions with F1R2_Alt = 1 or F2R1_Alt = 1 were excluded from the analysis. Indel variants positioned at the ends of sequence reads, denoted by Mutect2 annotation “MPOS” ≤10 or >45, were also excluded. Additionally, indels in proximity to homopolymer regions were excluded if AD_Alt < 5 and the median VAF was less than 10%. A subset of the identified CHIP variants underwent manual inspection using the Integrative Genomics Viewer. 
Where Mutect2 detected CHIP at one visit but not at the other, the sequence read pileup for the corresponding genomic coordinates was extracted from the BAM or CRAM file. Pathogenic variants were queried in 69 genes known to drive clonal hematopoiesis and myeloid malignancies 6 , 47 to identify CH. The detailed CHIP calling pipeline was previously reported by Bick et al. 3 and Uddin et al. 17 ( https://app.terra.bio/#workspaces/terra-outreach/CHIP-Detection-Mutect ).
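The sequence-based filters (i) to (v) and the oxoG rule above can be sketched as a small predicate. This is an illustrative sketch only, not the published pipeline: the `VariantCall` class and its field names are hypothetical, and the reading that the gnomAD frequency cutoff is waived for hotspot mutations is an assumption based on the parenthetical "(not hotspot mutations)".

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical container for the per-variant values used by the filters."""
    depth: int                # total depth of coverage (DP)
    alt_reads: int            # reads supporting the alternate allele (AD_Alt)
    alt_f1r2: int             # alternate-allele reads in F1R2 orientation
    alt_f2r1: int             # alternate-allele reads in F2R1 orientation
    vaf: float                # variant allele fraction, on a 0-1 scale
    gnomad_af: float          # gnomAD allele frequency (0.0 if absent)
    is_hotspot: bool = False  # known hotspot mutation (assumed exempt from the gnomAD cutoff)


def passes_sequence_filters(v: VariantCall) -> bool:
    """Apply criteria (i)-(v) as described in the text."""
    return (
        v.depth >= 20                                # (i) DP >= 20
        and v.alt_reads >= 3                         # (ii) AD_Alt >= 3
        and v.alt_f1r2 >= 1 and v.alt_f2r1 >= 1      # (iii) support on both strands
        and v.vaf >= 0.02                            # (iv) VAF >= 2%
        and (v.is_hotspot or v.gnomad_af <= 0.001)   # (v) rare in gnomAD (assumption: hotspots exempt)
    )


def is_possible_oxog_artifact(ref: str, alt: str, alt_f1r2: int, alt_f2r1: int) -> bool:
    """Flag G>T and C>A substitutions with exactly one supporting read in either orientation."""
    is_oxog_change = (ref, alt) in {("G", "T"), ("C", "A")}
    return is_oxog_change and (alt_f1r2 == 1 or alt_f2r1 == 1)
```

The additional curation steps (Mutect2 annotation checks, indel position and homopolymer rules, manual IGV review) are not represented here.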

Hotspot mutations in U2AF1

A special approach was used to identify somatic variants in U2AF1 since an erroneous segmental duplication in the hg38 reference genome resulted in a mapping score of zero for this gene during the sequence alignment from FASTQ to BAM/CRAM. We used a custom script ( https://github.com/MMesbahU/U2AF1_pileup ) to recover hotspot mutations: S34F, S34Y, R156H, Q157P, and Q157R. A minimum of 5 supporting reads for alternate alleles was required to include a somatic mutation in U2AF1 .

HiSeq vs. NovaSeq WES

To analyze the concordance between the two platforms used for WES, we re-sequenced 786 baseline samples using the NovaSeq 6000 platform (Illumina, Inc., CA). CHIP was detected using the same pipeline described above. We compared CHIP status “yes/no” at VAF ≥ 2% and VAF ≥ 10%. We also performed Pearson’s correlation between the VAF estimates when the same clone was detected from WES generated by the two platforms.

Associations between baseline risk factors and incident CHIP

Incident CHIP was defined as the presence of a clone with a VAF of ≥2% at the follow-up visit in individuals who did not have a prevalent CHIP clone (or CHIP at VAF < 2%) at baseline. The CHIP mutations were considered if they met the following criteria: a read depth ≥20, a minimum of 3 supporting reads for the alternative allele, and at least one read from both the forward and reverse directions.
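As a minimal illustration of this definition, the sketch below labels a participant from the largest CHIP clone VAF at each visit. The function name, the labels, and the convention of passing 0.0 when no clone is detected are assumptions made for the example.

```python
VAF_THRESHOLD = 0.02  # CHIP is called at VAF >= 2%


def chip_status(baseline_max_vaf: float, followup_max_vaf: float) -> str:
    """Classify a participant from the largest clone VAF at each visit.

    Pass 0.0 for a visit at which no clone was detected (assumed convention).
    """
    if baseline_max_vaf >= VAF_THRESHOLD:
        return "prevalent"   # clone already at or above 2% at baseline
    if followup_max_vaf >= VAF_THRESHOLD:
        return "incident"    # below 2% (or absent) at baseline, at or above 2% at follow-up
    return "no_chip"
```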

Univariable and multivariable analyses were performed using a logistic regression model to examine the associations between incident overall CHIP vs. non-CHIP and driver gene-specific incident CHIP categories (incident DNMT3A , incident TET2 , incident ASXL1 , incident SF, and incident DDR) vs. non-CHIP or other CHIP categories, and baseline risk factors. The tested risk factors included age, self-reported sex, self-reported race, smoking status, body mass index (BMI), high-density lipoprotein cholesterol (HDL-C), non-HDL-C, disease status for type 2 diabetes (T2D), atherosclerotic cardiovascular disease (ASCVD, including ischemic stroke and coronary heart disease), and hypertension. Additional covariates considered in the analysis included cholesterol medication usage, baseline visits, and visit centers.

Secondary analyses were performed to test the association between incident CHIP and exposures such as smoking categories (never vs. former or current smoker), BMI categories (BMI < 25 vs. 25 ≤ BMI ≤ 30 or BMI > 30 kg/m²), triglyceride to HDL-C (TG/HDL-C) ratio, and dyslipidemia. Interactions between sex and smoking status were also tested in a fully adjusted model for association with incident CHIP.

Here, dyslipidemia was defined as a binary variable (“yes/no”) based on the following criteria: individuals with total cholesterol ≥240, triglyceride ≥200, LDL-C ≥ 160, HDL-C < 40 in men or HDL-C < 50 mg/dl in women, and/or individuals on statin therapy. Inverse rank normalization was performed before the analysis to account for potential variations in the distribution of BMI, HDL-C, non-HDL-C, and TG/HDL-C values. To account for multiple comparisons in the analysis, a significance threshold of <0.0025 was employed, considering 20 independent tests at a 5% significance level.
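The dyslipidemia definition and the multiple-testing threshold can be written out as follows. This sketch assumes that meeting any one criterion is sufficient (the "and/or" reading) and that all lipid values are in mg/dl; the function and argument names are hypothetical.

```python
def has_dyslipidemia(total_chol: float, triglycerides: float, ldl_c: float,
                     hdl_c: float, is_male: bool, on_statin: bool) -> bool:
    """Binary dyslipidemia flag; lipid values assumed to be in mg/dl."""
    low_hdl = hdl_c < (40 if is_male else 50)  # sex-specific HDL-C cutoff
    return (
        total_chol >= 240
        or triglycerides >= 200
        or ldl_c >= 160
        or low_hdl
        or on_statin
    )


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 20 independent tests at a 5% significance level give 0.05 / 20 = 0.0025.
threshold = bonferroni_threshold(0.05, 20)
```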

We also performed a sensitivity analysis restricted to CHIP detected at the follow-up visit for which baseline sequence coverage at the corresponding positions was ≥20 and the baseline reads carried only nonmutant alleles or the mutation at VAF < 0.1%. Here, we applied stringent filtering in which all variants newly detected at the follow-up visit required ≥5 supporting reads, with ≥2 reads each from the forward and reverse directions. The findings were consistent with our primary analysis (Supplementary Data  3 and 4 ).

Single-variant association analysis

Imputed genotype data from the ARIC sub-study GENEVA (dbGaP accession phs000090 “phg000248” https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000090.v8.p2 ) was used for genetic analyses. From dbGaP, we downloaded genotype data (Affymetrix 6.0 SNP array) imputed to the whole-genome sequence using the 1000 Genomes reference panel ( www.1000genomes.org , June 2011 release) 48 using IMPUTE2 software 49 (details in dbGaP phs000090/phg000248/). Imputed GWAS data and WES were available for 637 AA participants and 2378 EA participants. Variants with a minor allele frequency >1%, imputation accuracy (INFO score) ≥0.30, and significant association ( P  < 5.0E−8) with prevalent CHIP in the UK Biobank 15 were considered for association analysis. We performed ancestry-stratified single-variant associations, adjusted for age, age², sex, top ten principal components of ancestry, and batch effect, followed by multi-ancestry inverse-variance weighted fixed-effect meta-analysis for incident overall CHIP, DNMT3A , and TET2 CHIP. REGENIE software 50 was used for single-variant associations, and GWAMA software 51 for multi-ancestry meta-analysis (scripts available in the “Code Availability” section). Variants with an association P  < 0.05 were considered significant for these analyses.
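For readers unfamiliar with inverse-variance weighted fixed-effect meta-analysis, the sketch below shows the standard calculation for combining per-ancestry effect estimates. It is a generic illustration, not the GWAMA implementation, and the function name is hypothetical.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Combine per-stratum log-odds estimates with inverse-variance weights.

    betas: effect estimates (e.g. log odds ratios), one per ancestry stratum
    ses:   their standard errors
    Returns (pooled beta, pooled SE, Z statistic, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal P value
    return beta, se, z, p
```

With two strata that have the same estimate and standard error, the pooled estimate is unchanged and the pooled standard error shrinks by a factor of √2, which is the expected behavior of a fixed-effect model.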

Prevalent CHIP polygenic risk score (PRS)

Prevalent CHIP PRS were derived for 637 AA and 2376 EA participants using weights of 21 single nucleotide variants that were conditionally independent and significantly associated with the prevalence of CHIP in the UK Biobank at P  < 5E−8 15 . Effects for the risk-increasing allele were considered and the PLINK (version v2.00a3LM) “score” function was used to derive prevalent CHIP PRS. PRS was standardized to have a mean of zero and SD of 1. We performed associations for prevalent CHIP PRS with incident CHIP categories in ARIC AA and EA participants using multivariable logistic regressions adjusted for age, sex, smoking status (never vs. ever), top five principal components of ancestry, and batch effect, followed by inverse-variance weighted fixed-effect multi-ancestry meta-analyses (scripts available in the “Code Availability” section).
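Conceptually, a PRS of this kind is a weighted sum of risk-allele dosages that is then standardized across participants. The sketch below illustrates that idea in plain Python; it does not reproduce PLINK's scoring output, and the use of the sample standard deviation is an assumption.

```python
import statistics


def raw_prs(dosages, weights):
    """Weighted sum of risk-increasing allele dosages (0-2) for one participant."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 (sample SD assumed)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]
```

After standardization, a regression coefficient for the PRS is interpreted per one standard deviation increase in the score.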

Clonal trajectories

We classified CHIP clones observed at both visits (VAF ≥ 2% in one of the visits) in three different trajectories: “growing” when growth rate > 0, with percent VAF change ≥ 10% and dVAF ≥ 0.02 (red); “shrinking” when growth rate < 0, percent VAF change ≤ −10%, and dVAF ≤ −0.02 (blue); otherwise “static” (black). Here, dVAF = VAF_followup − VAF_baseline.
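The three-way classification can be sketched as below. Two assumptions are made: percent VAF change is taken as 100 × dVAF / VAF_baseline, which the text does not define explicitly, and the sign of the growth rate is taken to equal the sign of dVAF, which holds whenever both VAFs are positive and follow-up is later than baseline.

```python
def classify_trajectory(vaf_baseline: float, vaf_followup: float) -> str:
    """Label a clone seen at both visits as growing, shrinking, or static.

    VAFs are on a 0-1 scale and vaf_baseline must be > 0.
    """
    dvaf = vaf_followup - vaf_baseline
    pct_change = 100.0 * dvaf / vaf_baseline  # assumed definition of percent VAF change
    if dvaf >= 0.02 and pct_change >= 10.0:
        return "growing"
    if dvaf <= -0.02 and pct_change <= -10.0:
        return "shrinking"
    return "static"
```

Requiring both an absolute and a relative change means a large clone that moves from 30% to 32% VAF is still labeled static, since the relative change is below 10%.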

Determinants of clonal growth rate

A single clone per individual at dVAF > 0 was considered for growth rate analysis. Where multiple clones were detected at the follow-up visit at VAF > 2%, only the clone with the largest dVAF (> 0) was considered. If no clone was detected at baseline with coverage ≥20, we imputed baseline VAF = 0.1%. The growth rate was calculated as log(VAF_followup / VAF_baseline) / (Age_followup − Age_baseline). Multivariable linear regression was performed to model inverse rank normalized growth rate (outcome) with age, driver gene, mutation type, self-reported sex, self-reported race, BMI, smoking, HDL-C, non-HDL-C, hypertension, T2D, and ASCVD as exposures of interest, with additional covariates such as batch effect that included log(baseline sequencing coverage), baseline visit (i.e. ARIC visit 02 vs. others [visit 01, visit 03, visit 04 or MRI visit]), baseline visit center, baseline mutation detection method [MUTECT vs. manually detected in IGV or from read pileup], and estimated baseline VAF vs. imputed VAF. The fully adjusted multivariable model included all these variables. Sensitivity analysis was performed where only variants at VAF ≥ 2% detected at follow-up with ≥5 supporting reads with ≥2 forward and reverse reads for the mutant allele were considered.
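The growth rate formula, including the baseline imputation rule, can be written as a short function. This sketch assumes the natural logarithm (the text says only "log") and uses `None` as a hypothetical marker for a clone that was not detected at baseline.

```python
import math

IMPUTED_BASELINE_VAF = 0.001  # 0.1%, used when no clone was detected at baseline


def growth_rate(vaf_baseline, vaf_followup, age_baseline, age_followup):
    """Annual log-fold change in VAF between the two visits.

    vaf_baseline may be None when the clone was undetected at baseline
    despite adequate coverage; the imputed value is then used.
    """
    if vaf_baseline is None:
        vaf_baseline = IMPUTED_BASELINE_VAF
    return math.log(vaf_followup / vaf_baseline) / (age_followup - age_baseline)
```

For example, a clone that doubles from 2% to 4% VAF over 20 years has a growth rate of ln(2)/20, roughly 0.035 per year under the natural-log assumption.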

Detection of synonymous passenger mutations and association analysis

We analyzed ANNOVAR annotated Mutect2 variant call format files from 4187 baseline WES with longitudinal visit samples for synonymous passenger mutations (Supplementary Fig.  1 ). We identified synonymous single nucleotide variants (synonymousSNV) from autosomes in the VAF range 1%–25% with sequence coverage 20–400× with a minimum of three supporting reads for the alternative allele with ≥1 read each from forward and reverse directions. Variants were excluded if they were present in gnomAD, had a Mutect2 filter annotation other than “PASS”, median base quality (MBQ) < 30, median mapping quality (MMQ) < 60, a median distance from the end of supporting reads near the read ends (MPOS ≤ 10 or MPOS ≥ 45), or number of events in the haplotype (ECNT) > 1, or if they were observed in more than one sample (i.e. only singletons were considered for analysis). synonymousSNV that remained after filtering were considered for analysis and per-sample passenger counts were calculated. Individuals without a synonymousSNV detected in baseline WES were assigned a “0” for passenger count. Inverse rank normalized synonymous passenger counts (INTsynonymousSNV) were used as an exposure of interest for regression analyses.
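Inverse rank normalization maps the ranks of the observed values onto quantiles of the standard normal distribution. The sketch below is one common variant and is offered only as an illustration: the Blom offset (c = 3/8) and the use of average ranks for ties are assumptions, since the text does not state which convention was used, and tie handling matters for count data with many zeros.

```python
from statistics import NormalDist


def inverse_rank_normalize(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform with average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    # Convert each rank to a quantile in (0, 1), then to a standard normal score.
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]
```

Tied values, such as participants with a passenger count of 0, receive the same transformed score under this convention.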

To assess the association between the presence of CHIP (outcome) and (inverse rank normalized) synonymous passenger counts (exposure, INTsynonymousSNV), four separate multivariable logistic regressions were performed with varying outcome definitions, such as (i) no CHIP vs. presence of CHIP at either baseline and/or follow-up visit, (ii) no CHIP at baseline vs. prevalent CHIP (i.e. CHIP at VAF ≥ 2% detected at baseline visit), (iii) no CHIP at follow-up visit vs. CHIP detected at the follow-up visit, and (iv) no CHIP vs. incident CHIP (i.e. CHIP at VAF ≥ 2% only detected at the follow-up visit in individuals without a prevalent clone). Using a multivariable multinomial logistic regression, we further assessed the association between the number of CHIP clones (no CHIP clone vs. one clone/ 2 clones/ 3 clones or ≥4 clones) and INTsynonymousSNV. Finally, we used multivariable logistic regression to assess the association between clonal growth (no = 0/yes = 1) and INTsynonymousSNV. Here, binary outcome clonal growth was defined as no CHIP clone or no change in clone size (i.e., dVAF = 0) as “0” vs. growing CHIP clone (i.e., dVAF > 0) as “1”. All multivariable analyses accounted for additional covariates such as age at baseline, age², self-reported sex, self-reported race, BMI, HDL-C, non-HDL-C, cholesterol medication usage, smoking history, hypertension, ASCVD, T2D, and batch effects (including ARIC baseline visit and visit center).

Use of large language models

Advanced language models like Grammarly, Inc., Bard ( https://bard.google.com/ ), and ChatGPT (version May 24; https://chat.openai.com/ ) were used to enhance grammatical accuracy and improve sentence structure and clarity.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

To protect the privacy of research participants and the confidentiality of their data while ensuring that these data are available for appropriate use by researchers, all raw data used in this study are available via controlled access. Individual-level phenotypes and whole-exome sequencing data from the ARIC baseline visits participants (used in this study) are available in dbGaP ( https://www.ncbi.nlm.nih.gov/gap/ ) accession code phs000668 ( https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000668.v6.p2 ). Affymetrix 6.0 genome-wide association array dataset is available in dbGaP accession phs000557.v7.p2 ( https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000557.v7.p2 ). Imputed genotype data from the ARIC sub-study GENEVA is available in dbGaP (accession phs000090 “phg000248” https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000090.v8.p2 ). ARIC Visit-5 WES (generated in this study) and individual-level data are available via controlled access ancillary study proposal ( https://aric.cscc.unc.edu/aric9/researchers/Obtain_Submit_Data ). The timeline for the approval process ranges from 3–6 weeks for ARIC ancillary studies, with specific criteria and proposal forms available at https://sites.cscc.unc.edu/aric/ancillary-studies-pfg . All other data supporting the findings described in this manuscript are available in the article and its Supplementary Information files.

Code availability

The complete scripts used in this study are available at https://github.com/MMesbahU/longitudinal-profiling-of-clonal-hematopoiesis 52 ; CHIP calling pipeline is available at https://app.terra.bio/#workspaces/terra-outreach/CHIP-Detection-Mutect , and the custom script for mutations in U2AF1 is available at https://github.com/MMesbahU/U2AF1_pileup .

Khoury, J. D. et al. The 5th edition of the World Health Organization classification of haematolymphoid tumours: myeloid and histiocytic/dendritic neoplasms. Leukemia 36 , 1703–1719 (2022).

Arber, D. A. et al. International consensus classification of myeloid neoplasms and acute leukemias: integrating morphologic, clinical, and genomic data. Blood 140 , 1200–1228 (2022).

Bick, A. G. et al. Inherited causes of clonal haematopoiesis in 97,691 whole genomes. Nature 586 , 763–768 (2020).

Genovese, G. et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N. Engl. J. Med. 371 , 2477–2487 (2014).

Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371 , 2488–2498 (2014).

Jaiswal, S. et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 377 , 111–121 (2017).

Bhattacharya, R. et al. Clonal hematopoiesis is associated with higher risk of stroke. Stroke 53 , 788–797 (2022).

Weinstock, J. S. et al. Aberrant activation of TCL1A promotes stem cell expansion in clonal haematopoiesis. Nature 616 , 755–763 (2023).

Yu, B. et al. Association of clonal hematopoiesis with incident heart failure. J. Am. Coll. Cardiol. 78 , 42–52 (2021).

Niroula, A. et al. Distinction of lymphoid and myeloid clonal hematopoiesis. Nat. Med. 27 , 1921–1927 (2021).

Desai, P. et al. Somatic mutations precede acute myeloid leukemia years before diagnosis. Nat. Med. 24 , 1015–1023 (2018).

Fabre, M. A. et al. The longitudinal dynamics and natural history of clonal haematopoiesis. Nature 606 , 335–342 (2022).

van der Werf, I. et al. Splicing factor gene mutations in acute myeloid leukemia offer additive value if incorporated in current risk classification. Blood Adv. 5 , 3254–3265 (2021).

Zink, F. et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood 130 , 742–752 (2017).

Kessler, M. D. et al. Common and rare variant associations with clonal haematopoiesis phenotypes. Nature 612 , 301–309 (2022).

Kar, S. P. et al. Genome-wide analyses of 200,453 individuals yield new insights into the causes and consequences of clonal hematopoiesis. Nat. Genet. 54 , 1155–1166 (2022).

Uddin, M. M. et al. Germline genomic and phenomic landscape of clonal hematopoiesis in 323,112 individuals. Preprint at medRxiv https://doi.org/10.1101/2022.07.29.22278015 (2022).

Coombs, C. C. et al. Therapy-related clonal hematopoiesis in patients with non-hematologic cancers is common and associated with adverse clinical outcomes. Cell Stem Cell 21 , 374–382.e374 (2017).

Bolton, K. L. et al. Cancer therapy shapes the fitness landscape of clonal hematopoiesis. Nat. Genet. 52 , 1219–1226 (2020).

Dawoud, A. A. Z., Tapper, W. J. & Cross, N. C. P. Clonal myelopoiesis in the UK Biobank cohort: ASXL1 mutations are strongly associated with smoking. Leukemia 34 , 2660–2672 (2020).

Bhattacharya, R. et al. Association of diet quality with prevalence of clonal hematopoiesis and adverse cardiovascular events. JAMA Cardiol. 6 , 1069–1077 (2021).

Jasra, S. et al. High burden of clonal hematopoiesis in first responders exposed to the World Trade Center disaster. Nat. Med. 28 , 468–471 (2022).

Mencia-Trinchant, N. et al. Clonal hematopoiesis before, during, and after human spaceflight. Cell Rep. 33 , 108458 (2020).

Jakubek, Y. A., Reiner, A. P. & Honigberg, M. C. Risk factors for clonal hematopoiesis of indeterminate potential and mosaic chromosomal alterations. Transl. Res. 255 , 171–180 (2023).

Robertson, N. A. et al. Longitudinal dynamics of clonal hematopoiesis identifies gene-specific fitness effects. Nat. Med. 28 , 1439–1446 (2022).

Uddin, M. M. et al. Longitudinal profiling of clonal hematopoiesis provides insight into clonal dynamics. Immun. Ageing 19 , 23 (2022).

Fabre, M. A. & Vassiliou, G. S. The lifelong natural history of clonal hematopoiesis and its links to myeloid malignancy. Blood 143 , 573–581 (2023).

Poon, G. Y. P., Watson, C. J., Fisher, D. S. & Blundell, J. R. Synonymous mutations reveal genome-wide levels of positive selection in healthy tissues. Nat. Genet. 53 , 1597–1605 (2021).

Dietlein, F. et al. Identification of cancer driver genes based on nucleotide context. Nat. Genet. 52 , 208–218 (2020).

Stacey,,S. N. et al. Genetics and epidemiology of mutational barcode-defined clonal hematopoiesis. Nat. Genet. 55 , 2149–2159 (2023).

Hess, J. M. et al. Passenger hotspot mutations in cancer. Cancer Cell 36 , 288–301.e214 (2019).

Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606 , 343–350 (2022).

van Zeventer, I. A. et al. Evolutionary landscape of clonal hematopoiesis in 3,359 individuals from the general population. Cancer Cell 41 , 1017–1031.e4 (2023).

Haring, B. et al. Healthy lifestyle and clonal hematopoiesis of indeterminate potential: results from the women’s health initiative. J. Am. Heart Assoc. 10 , e018789 (2021).

Andersson-Assarsson, J. C. et al. Evolution of age-related mutation-driven clonal haematopoiesis over 20 years is associated with metabolic dysfunction in obesity. EBioMedicine 92 , 104621 (2023).

Fuster, J. J. et al. Clonal hematopoiesis associated with TET2 deficiency accelerates atherosclerosis development in mice. Science 355 , 842–847 (2017).

Heyde, A. et al. Increased stem cell proliferation in atherosclerosis accelerates clonal hematopoiesis. Cell 184 , 1348–1361 e1322 (2021).

Rauh, M. J. Breaking the CH inflammation-expansion cycle. Blood 141 , 815–816 (2023).

Avagyan, S. et al. Resistance to inflammation underlies enhanced fitness in clonal hematopoiesis. Science 374 , 768–772 (2021).

Article   ADS   PubMed   Google Scholar  

Caiado, F. et al. Aging drives Tet2+/- clonal hematopoiesis via IL-1 signaling. Blood 141 , 886–903 (2023).

Ferrucci, L. & Fabbri, E. Inflammageing: chronic inflammation in ageing, cardiovascular disease, and frailty. Nat. Rev. Cardiol. 15 , 505–522 (2018).

Belizaire, R., Wong, W. J., Robinette, M. L. & Ebert, B. L. Clonal haematopoiesis and dysregulation of the immune system. Nat. Rev. Immunol. 23 , 595–610 (2023).

Liberale, L., Montecucco, F., Tardif, J. C., Libby, P. & Camici, G. G. Inflamm-ageing: the role of inflammation in age-dependent cardiovascular disease. Eur. Heart J. 41 , 2974–2982 (2020).

Psaty, B. M. et al. Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium: design of prospective meta-analyses of genome-wide association studies from 5 cohorts. Circ. Cardiovasc. Genet. 2 , 73–80 (2009).

Benjamin, D. et al. Calling Somatic SNVs and Indels with Mutect2. Preprint at bioRxiv https://doi.org/10.1101/861054 (2019).

Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38 , e164 (2010).

Gibson, C. J. et al. Clonal hematopoiesis associated with adverse outcomes after autologous stem-cell transplantation for lymphoma. J. Clin. Oncol. 35 , 1598–1605 (2017).

Genomes Project Consortium. et al. A map of human genome variation from population-scale sequencing. Nature 467 , 1061–1073 (2010). 1000.

Article   ADS   Google Scholar  

Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5 , e1000529 (2009).

Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53 , 1097–1103 (2021).

Magi, R. & Morris, A. P. GWAMA: software for genome-wide association meta-analysis. BMC Bioinform. 11 , 288 (2010).

Uddin, M. M. et al. Long-term longitudinal analysis of 4,187 participants reveals insights into determinants of clonal hematopoiesis. Zenodo (2024). Determinants of longitudinal CHIP v1.0.0 .

Download references

Acknowledgements

The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under Contract nos. (HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, HHSN268201700005I). Funding was also supported by R01HL087641, R01HL059367, and R01HL086694; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. Funding support for “Building on GWAS for NHLBI-diseases: the U.S. CHARGE consortium” was provided by the NIH through the American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419). CHARGE sequencing was carried out at the Baylor College of Medicine Human Genome Sequencing Center (U54 HG003273 and R01HL086694). Funding for GO ESP was provided by NHLBI grants RC2 HL-103010 (HeartGO) and exome sequencing was performed through NHLBI grants RC2 HL-102925 (BroadGO) and RC2 HL-102926 (SeattleGO). The authors thank the staff and participants of the ARIC study for their important contributions. The authors also thank Mrs. Leslie Gaffney from the Broad Research Communication Lab for her valuable assistance in improving the display items. S.S. is supported by NIH/NHLBI T32 HL139425. A.N. was supported by funds from the Knut and Alice Wallenberg Foundation (KAW2017.0436). M.C.H. is supported by the U.S. National Heart, Lung, and Blood Institute (K08HL166687) and the American Heart Association (940166, 979465). A.G.B. is supported by a Burroughs Wellcome Foundation Career Award for Medical Scientists, the NIH Director’s Early Independence Award (DP5-OD029586), and the Pew-Stewart Scholar for Cancer Research Award, supported by the Pew Charitable Trusts and the Alexander and Margaret Stewart Trust. P.L. receives funding support from the National Heart, Lung, and Blood Institute (1R01HL134892, 1R01HL163099-01, R01AG063839, R01HL151627, R01HL157073, R01HL166538), the RRM Charitable Fund, and the Simard Fund. P.N. is supported by a Hassenfeld Scholar Award and the Paul & Phyllis Fireman Endowed Chair in Vascular Medicine from the Massachusetts General Hospital, and grants from the National Heart, Lung, and Blood Institute (R01HL148050). P.N. and B.L.E. are supported by a grant from the Fondation Leducq (TNE-18CVD04). B.L.E. is also supported by the NIH (R01HL082945, P01CA108631, and P50CA206963) and the Howard Hughes Medical Institute.

Author information

These authors contributed equally: Md Mesbah Uddin, Seyedmohammad Saadatagah.

These authors jointly supervised this work: Christie M. Ballantyne, Pradeep Natarajan.

Authors and Affiliations

Program in Medical and Population Genetics, Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA

Md Mesbah Uddin, Whitney E. Hornsby, Shriienidhie Ganesh, Kim Lannery, Art Schuermans, Michael C. Honigberg & Pradeep Natarajan

Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA

Department of Medicine, Baylor College of Medicine, Houston, TX, USA

Seyedmohammad Saadatagah & Christie M. Ballantyne

Center for Translational Research on Inflammatory Diseases, Baylor College of Medicine, Houston, TX, USA

Seyedmohammad Saadatagah

Broad Institute of MIT and Harvard, Cambridge, MA, USA

Abhishek Niroula & Benjamin L. Ebert

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA

Institute of Biomedicine, SciLifeLab, University of Gothenburg, Gothenburg, Sweden

Abhishek Niroula

Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA

Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium

Art Schuermans

Department of Medicine, Harvard Medical School, Boston, MA, USA

Michael C. Honigberg, Peter Libby & Pradeep Natarajan

Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA

Alexander G. Bick

Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA

Peter Libby

Howard Hughes Medical Institute, Boston, MA, USA

Benjamin L. Ebert

Contributions

M.M.U., S.S., C.M.B., and P.N. conceived and designed the study. M.M.U., A.N., A.B., B.L.E., and P.N. generated somatic mutation calls. S.S., B.Y., and C.M.B. prepared ARIC phenotypes. M.M.U. and S.S. performed bioinformatic and statistical analysis with inputs from W.E.H., S.G., K.L., A.S., M.C.H., and P.L. P.N. and C.M.B. supervised this study. M.M.U. drafted the manuscript with critical input from P.N. All authors read and provided critical revision of the manuscript.

Corresponding authors

Correspondence to Christie M. Ballantyne or Pradeep Natarajan.

Ethics declarations

Competing interests.

M.C.H. reports research grants from Genentech, advisory board service for Miga Health, and consulting fees from Comanche Biopharma, all unrelated to the present work. P.L. is an unpaid consultant to, or involved in clinical trials for, Amgen, Baim Institute, Beren Therapeutics, Esperion Therapeutics, Genentech, Kancera, Kowa Pharmaceuticals, Novo Nordisk, Novartis, and Sanofi-Regeneron. P.L. is a member of the scientific advisory board for Amgen, Caristo Diagnostics, CSL Behring, DalCor Pharmaceuticals, Dewpoint Therapeutics, Eulicid Bioimaging, Kancera, Kowa Pharmaceuticals, Olatec Therapeutics, MedImmune, Novartis, PlaqueTec, Polygon Therapeutics, TenSixteen Bio, Soley Therapeutics, and XBiotech, Inc. P.L.’s laboratory has received research funding in the last 2 years from Novartis, Novo Nordisk, and Genentech. P.L. is on the Board of Directors of XBiotech, Inc. P.L. has a financial interest in XBiotech, a company developing therapeutic human antibodies, in TenSixteen Bio, a company targeting somatic mosaicism and clonal hematopoiesis of indeterminate potential (CHIP) to discover and develop novel therapeutics to treat age-related diseases, and in Soley Therapeutics, a biotechnology company that is combining artificial intelligence with molecular and cellular response detection for discovering and developing new drugs, currently focusing on cancer therapeutics. P.L.’s interests were reviewed and are managed by Brigham and Women’s Hospital and Mass General Brigham in accordance with their conflict-of-interest policies. B.L.E. has received research funding from Novartis and Calico. He has received consulting fees from Abbvie. He is a member of the scientific advisory board and shareholder for Neomorph Inc., TenSixteen Bio, Skyhawk Therapeutics, and Exo Therapeutics, all distinct from the present work. P.N., A.G.B., and B.L.E. are scientific co-founders of TenSixteen Bio. P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Genentech / Roche, and Novartis, personal fees from Allelica, Apple, Astra Zeneca, Blackstone Life Sciences, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Labs, Genentech / Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, TenSixteen Bio, and Tourmaline Bio, equity in MyOme, Preciseli, and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. C.M.B. reports grant/research support from Abbott Diagnostic, Akcea, Amgen, Arrowhead, Esperion, Ionis, Merck, New Amsterdam, Novartis, Novo Nordisk, Regeneron, Roche Diagnostic, NIH, AHA, and ADA, and consultation fees from Abbott Diagnostics, Alnylam Pharmaceuticals, Althera, Amarin, Amgen, Arrowhead, Astra Zeneca, Denka Seiken, Esperion, Genentech, Gilead, Illumina, Ionis, Matinas BioPharma Inc, Merck, New Amsterdam, Novartis, Novo Nordisk, Pfizer, Regeneron, Roche Diagnostic, and TenSixteen Bio. The remaining authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Tamir Chandra and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, peer review file, description of additional supplementary files, supplementary data 1–9, reporting summary, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Uddin, M.M., Saadatagah, S., Niroula, A. et al. Long-term longitudinal analysis of 4,187 participants reveals insights into determinants of clonal hematopoiesis. Nat Commun 15 , 7858 (2024). https://doi.org/10.1038/s41467-024-52302-9

Received: 06 December 2023

Accepted: 01 September 2024

Published: 09 September 2024

DOI: https://doi.org/10.1038/s41467-024-52302-9

how to formulate null hypothesis in research

COMMENTS

  1. How to Write a Null Hypothesis (5 Examples)

  2. Null & Alternative Hypotheses

    Null & Alternative Hypotheses | Definitions, Templates & ...

  3. How to Formulate a Null Hypothesis (With Examples)

  4. How to Write a Null Hypothesis (with Examples and Templates)

    Writing Null Hypotheses in Research and Statistics

  5. Crafting a Null Hypothesis: A Guide to Writing it Right

    The null hypothesis is a statement that assumes there is no significant effect or relationship between the variables being studied. It represents the status quo or the assumption of no effect until proven otherwise. It's the hypothesis that researchers typically aim to test against and is denoted as H0.

  6. Null Hypothesis: Definition, Rejecting & Examples

  7. Null Hypothesis Definition and Examples, How to State

  8. Formulating a Null Hypothesis: Key Elements to Consider

    To formulate a null hypothesis, first identify the research question or problem. Then, state the null hypothesis in a way that it asserts no effect or no difference between groups or variables. It should be clear, specific, and testable, often structured as H0: parameter = value (e.g., H0: μ1 = μ2).

  9. Null and Alternative Hypotheses

    Null and Alternative Hypotheses | Definitions & Examples

  10. How to Write a Strong Hypothesis

    How to Write a Strong Hypothesis | Steps & Examples

  11. 5.2

    5.2 - Writing Hypotheses | STAT 200

  12. Hypothesis Testing

    Hypothesis Testing | A Step-by-Step Guide with Easy ...

  13. What Is The Null Hypothesis & When To Reject It

  14. How to Write a Strong Hypothesis

    How to Write a Strong Hypothesis | Guide & Examples - Scribbr

  15. Formulating a Null Hypothesis: The Foundation of Your Research

    Formulating a null hypothesis requires a clear understanding of the research question and a precise statement that can be empirically tested. Testing the null hypothesis involves collecting data and using statistical methods to determine whether to reject or fail to reject the null hypothesis.

  16. Research Hypothesis: Definition, Types, Examples and Quick Tips

    Research Hypothesis: Definition, Types, Examples and ...

  17. Null hypothesis

    The null is like the defendant in a criminal trial. Formulating null hypotheses and subjecting them to statistical testing is one of the workhorses of the scientific method. Scientists in all fields make conjectures about the phenomena they study, translate them into null hypotheses and gather data to test them.

  18. 7.3: The Research Hypothesis and the Null Hypothesis

    This null hypothesis can be written as: $H_0: \bar{X} = \mu$. For most of this textbook, the null hypothesis is that the means of the two groups are similar. Much later, the null hypothesis will be that there is no relationship between the two groups. Either way, remember that a null hypothesis is always saying that nothing is different.

  19. What is a Research Hypothesis: How to Write it, Types, and Examples

    What is a research hypothesis: How to write it, types, and ...

  20. What is a Null Hypothesis?

    Overview of null hypothesis, examples of null and alternate hypotheses, and how to write a null hypothesis statement.

  21. Understanding the Null Hypothesis for Linear Regression

  22. Null and Alternative Hypotheses

    Write the statement such that a relationship does not exist or a difference does not exist and you have the null hypothesis. You can reverse the process if you have a hypothesis and wish to write a research question. When you are comparing two groups, the groups are the independent variable. When you are testing whether something affects ...

  23. A Practical Guide to Writing Quantitative and Qualitative Research

    A Practical Guide to Writing Quantitative and Qualitative ...

  24. Long-term longitudinal analysis of 4,187 participants reveals ...

    These findings underscore the importance of further research to understand the mechanisms underlying incident CHIP better and to develop strategies for its early detection and effective management.
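
Several of the entries above describe the same workflow: state a null hypothesis of no difference (for example H0: μ1 = μ2), collect data, and then either reject or fail to reject H0. The following is a minimal Python sketch of that workflow, not a method taken from any of the listed sources. It assumes an exact permutation test as the significance test (one of several reasonable choices), a conventional 0.05 threshold, and made-up measurements; the names `permutation_test` and `decide` are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test of H0: mu_a = mu_b.

    Returns the observed difference in means and the p-value, i.e. the
    fraction of all relabelings of the pooled data whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = mean(group_a) - mean(group_b)
    total = sum(pooled)

    extreme = 0
    count = 0
    # Under H0 the group labels are interchangeable, so every way of
    # choosing n_a values for "group A" is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # Small tolerance so floating-point noise does not drop ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        count += 1
    return observed, extreme / count


def decide(p_value, alpha=0.05):
    """Reject H0 when p < alpha; otherwise fail to reject it."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Made-up illustrative measurements, not data from any study.
treated = [12.1, 13.4, 12.9, 14.2, 13.8]
control = [10.2, 11.1, 10.8, 11.5, 10.9]

diff, p = permutation_test(treated, control)
print(f"difference in means = {diff:.2f}, p = {p:.4f} -> {decide(p)}")
```

Note that the decision function only ever returns "reject" or "fail to reject": a large p-value is a lack of evidence against H0, not proof that H0 is true. Enumerating every relabeling is practical only for small samples; for larger ones a random subset of relabelings is commonly used instead.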