Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons

Margin Size

  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

16.3: The Process of Null Hypothesis Testing

  • Last updated
  • Save as PDF
  • Page ID 8804

  • Russell A. Poldrack
  • Stanford University

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}}      % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}}      % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

We can break the process of null hypothesis testing down into a number of steps:

  • Formulate a hypothesis that embodies our prediction ( before seeing the data )
  • Collect some data relevant to the hypothesis
  • Specify null and alternative hypotheses
  • Fit a model to the data that represents the alternative hypothesis and compute a test statistic
  • Compute the probability of the observed value of that statistic assuming that the null hypothesis is true
  • Assess the “statistical significance” of the result

For a hands-on example, let’s use the NHANES data to ask the following question: Is physical activity related to body mass index? In the NHANES dataset, participants were asked whether they engage regularly in moderate or vigorous-intensity sports, fitness or recreational activities (stored in the variable P h y s A c t i v e PhysActive ). The researchers also measured height and weight and used them to compute the Body Mass Index (BMI):

B M I = w e i g h t ( k g ) h e i g h t ( m ) 2 BMI = \frac{weight(kg)}{height(m)^2}

16.3.1 Step 1: Formulate a hypothesis of interest

For step 1, we hypothesize that BMI is greater for people who do not engage in physical activity, compared to those who do.

16.3.2 Step 2: Collect some data

For step 2, we collect some data. In this case, we will sample 250 individuals from the NHANES dataset. Figure 16.1 shows an example of such a sample, with BMI shown separately for active and inactive individuals.

Box plot of BMI data from a sample of adults from the NHANES dataset, split by whether they reported engaging in regular physical activity.

16.3.3 Step 3: Specify the null and alternative hypotheses

For step 3, we need to specify our null hypothesis (which we call H 0 H _0 ) and our alternative hypothesis (which we call H A H_A ). H 0 H _0 is the baseline against which we test our hypothesis of interest: that is, what would we expect the data to look like if there was no effect? The null hypothesis always involves some kind of equality (=, ≤ \le , or ≥ \ge ). H A H_A describes what we expect if there actually is an effect. The alternative hypothesis always involves some kind of inequality ( ≠ \ne , >, or <). Importantly, null hypothesis testing operates under the assumption that the null hypothesis is true unless the evidence shows otherwise.

We also have to decide whether to use directional or non-directional hypotheses. A non-directional hypothesis simply predicts that there will be a difference, without predicting which direction it will go. For the BMI/activity example, a non-directional null hypothesis would be:

H 0 : B M I a c t i v e = B M I i n a c t i v e H0 : BMI_{active} = BMI_{inactive}

and the corresponding non-directional alternative hypothesis would be:

H A : B M I a c t i v e ≠ B M I i n a c t i v e HA: BMI_{active} \neq BMI_{inactive}

A directional hypothesis, on the other hand, predicts which direction the difference would go. For example, we have strong prior knowledge to predict that people who engage in physical activity should weigh less than those who do not, so we would propose the following directional null hypothesis:

H 0 : B M I a c t i v e ≥ B M I i n a c t i v e H0: BMI_{active} \ge BMI_{inactive}

and directional alternative:

H A : B M I a c t i v e < B M I i n a c t i v e HA : BMI_{active} < BMI_{inactive}

As we will see later, testing a non-directional hypothesis is more conservative, so this is generally to be preferred unless there is a strong a priori reason to hypothesize an effect in a particular direction. Any direction hypotheses should be specified prior to looking at the data!

16.3.4 Step 4: Fit a model to the data and compute a test statistic

For step 4, we want to use the data to compute a statistic that will ultimately let us decide whether the null hypothesis is rejected or not. To do this, the model needs to quantify the amount of evidence in favor of the alternative hypothesis, relative to the variability in the data. Thus we can think of the test statistic as providing a measure of the size of the effect compared to the variability in the data. In general, this test statistic will have a probability distribution associated with it, because that allows us to determine how likely our observed value of the statistic is under the null hypothesis.

For the BMI example, we need a test statistic that allows us to test for a difference between two means, since the hypotheses are stated in terms of mean BMI for each group. One statistic that is often used to compare two means is the t-statistic , first developed by the statistician William Sealy Gossett, who worked for the Guiness Brewery in Dublin and wrote under the pen name “Student” - hence, it is often called “Student’s t-statistic”. The t-statistic is appropriate for comparing the means of two groups when the sample sizes are relatively small and the population standard deviation is unknown. The t-statistic for comparison of two independent groups is computed as:

t = X 1 ‾ − X 2 ‾ S 1 2 n 1 + S 2 2 n 2 t = \frac{\bar{X_1} - \bar{X_2}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}

where X ‾ 1 \bar{X}_1 and X ‾ 2 \bar{X}_2 are the means of the two groups, S 1 2 S ^2_1 and S 2 2 S ^2_2 are the estimated variances of the groups, and n 1 n _1 and n 2 n _2 are the sizes of the two groups. Note that the denominator is basically an average of the standard error of the mean for the two samples. Thus, one can view the the t-statistic as a way of quantifying how large the difference between groups is in relation to the sampling variability of the means that are being compared.

The t-statistic is distributed according to a probability distribution known as a t distribution. The t distribution looks quite similar to a normal distribution, but it differs depending on the number of degrees of freedom, which for this example is the number of observations minus 2, since we have computed two means and thus given up two degrees of freedom. When the degrees of freedom are large (say 1000), then the t distribution looks essentialy like the normal distribution, but when they are small then the t distribution has longer tails than the normal (see Figure 16.2).

Each panel shows the t distribution (in blue dashed line) overlaid on the normal distribution (in solid red line).  The left panel shows a t distribution with 4 degrees of freedom, in which case the distribution is similar but has slightly wider tails.  The right panel shows a t distribution with 1000 degrees of freedom, in which case it is virtually identical to the normal.

16.3.5 Step 5: Determine the probability of the data under the null hypothesis

This is the step where NHST starts to violate our intuition – rather than determining the likelihood that the null hypothesis is true given the data, we instead determine the likelihood of the data under the null hypothesis - because we started out by assuming that the null hypothesis is true! To do this, we need to know the probability distribution for the statistic under the null hypothesis, so that we can ask how likely the data are under that distribution. Before we move to our BMI data, let’s start with some simpler examples.

16.3.5.1 Randomization: A very simple example

Let’s say that we wish to determine whether a coin is fair. To collect data, we flip the coin 100 times, and we count 70 heads. In this example, H 0 : P ( h e a d s ) = 0.5 H_0: P(heads)=0.5 and H A : P ( h e a d s ) ≠ 0.5 H_A: P(heads) \neq 0.5 , and our test statistic is simply the number of heads that we counted. The question that we then want to ask is: How likely is it that we would observe 70 heads if the true probability of heads is 0.5. We can imagine that this might happen very occasionally just by chance, but doesn’t seem very likely. To quantify this probability, we can use the binomial distribution :

P ( X < k ) = ∑ i = 0 k ( N k ) p i ( 1 − p ) ( n − i ) P(X < k) = \sum_{i=0}^k \binom{N}{k} p^i (1-p)^{(n-i)} This equation will tell us the likelihood of a certain number of heads or fewer, given a particular probability of heads. However, what we really want to know is the probability of a certain number or more, which we can obtain by subtracting from one, based on the rules of probability:

P ( X ≥ k ) = 1 − P ( X < k ) P(X \ge k) = 1 - P(X < k)

Distribution of numbers of heads (out of 100 flips) across 100,000 simulated runsl with the observed value of 70 flips represented by the vertical line.

We can compute the probability for our example using the pbinom() function. The probability of 69 or fewer heads given P(heads)=0.5 is 0.999961, so the probability of 70 or more heads is simply one minus that value (0.000039) This computation shows us that the likelihood of getting 70 heads if the coin is indeed fair is very small.

Now, what if we didn’t have the pbinom() function to tell us the probability of that number of heads? We could instead determine it by simulation – we repeatedly flip a coin 100 times using a true probability of 0.5, and then compute the distribution of the number of heads across those simulation runs. Figure 16.3 shows the result from this simulation. Here we can see that the probability computed via simulation (0.000030) is very close to the theoretical probability (.00004).

Let’s do the analogous computation for our BMI example. First we compute the t statistic using the values from our sample that we calculated above, where we find that (t = 3.86). The question that we then want to ask is: What is the likelihood that we would find a t statistic of this size, if the true difference between groups is zero or less (i.e. the directional null hypothesis)?

We can use the t distribution to determine this probability. Our sample size is 250, so the appropriate t distribution has 248 degrees of freedom because lose one for each of the two means that we computed. We can use the pt() function in R to determine the probability of finding a value of the t-statistic greater than or equal to our observed value. Note that we want to know the probability of a value greater than our observed value, but by default pt() gives us the probability of a value less than the one that we provide it, so we have to tell it explicitly to provide us with the “upper tail” probability (by setting lower.tail = FALSE ). We find that (p(t > 3.86, df = 248) = 0.000), which tells us that our observed t-statistic value of 3.86 is relatively unlikely if the null hypothesis really is true.

In this case, we used a directional hypothesis, so we only had to look at one end of the null distribution. If we wanted to test a non-directional hypothesis, then we would need to be able to identify how unexpected the size of the effect is, regardless of its direction. In the context of the t-test, this means that we need to know how likely it is that the statistic would be as extreme in either the positive or negative direction. To do this, we multiply the observed t value by -1, since the t distribution is centered around zero, and then add together the two tail probabilities to get a two-tailed p-value: (p(t > 3.86 or t< -3.86, df = 248) = 0.000). Here we see that the p value for the two-tailed test is twice as large as that for the one-tailed test, which reflects the fact that an extreme value is less surprising since it could have occurred in either direction.

How do you choose whether to use a one-tailed versus a two-tailed test? The two-tailed test is always going to be more conservative, so it’s always a good bet to use that one, unless you had a very strong prior reason for using a one-tailed test. In that case, you should have written down the hypothesis before you ever looked at the data. In Chapter 32 we will discuss the idea of pre-registration of hypotheses, which formalizes the idea of writing down your hypotheses before you ever see the actual data. You should never make a decision about how to perform a hypothesis test once you have looked at the data, as this can introduce serious bias into the results.

16.3.5.2 Computing p-values using randomization

So far we have seen how we can use the t-distribution to compute the probability of the data under the null hypothesis, but we can also do this using simulation. The basic idea is that we generate simulated data like those that we would expect under the null hypothesis, and then ask how extreme the observed data are in comparison to those simulated data. The key question is: How can we generate data for which the null hypothesis is true? The general answer is that we can randomly rearrange the data in a particular way that makes the data look like they would if the null was really true. This is similar to the idea of bootstrapping, in the sense that it uses our own data to come up with an answer, but it does it in a different way.

16.3.5.3 Randomization: a simple example

Let’s start with a simple example. Let’s say that we want to compare the mean squatting ability of football players with cross-country runners, with H 0 : μ F B ≤ μ X C H_0: \mu_{FB} \le \mu_{XC} and H A : μ F B > μ X C H _A: \mu_{FB} > \mu_{XC} . We measure the maximum squatting ability of 5 football players and 5 cross-country runners (which we will generate randomly, assuming that μ F B = 300 \mu_{FB} = 300 , μ X C = 140 \mu_{XC} = 140 , and σ = 30 \sigma = 30 ).

Left: Box plots of simulated squatting ability for football players and cross-country runners.Right: Box plots for subjects assigned to each group after scrambling group labels.

From the plot in Figure 16.4 it’s clear that there is a large difference between the two groups. We can do a standard t-test to test our hypothesis, using the t.test() command in R, which gives the following result:

If we look at the p-value reported here, we see that the likelihood of such a difference under the null hypothesis is very small, using the t distribution to define the null.

Now let’s see how we could answer the same question using randomization. The basic idea is that if the null hypothesis of no difference between groups is true, then it shouldn’t matter which group one comes from (football players versus cross-country runners) – thus, to create data that are like our actual data but also conform to the null hypothesis, we can randomly reorder the group labels for the individuals in the dataset, and then recompute the difference between the groups. The results of such a shuffle are shown in Figure ?? .

Histogram of t-values for the difference in means between the football and cross-country groups after randomly shuffling group membership.  The vertical line denotes the actual difference observed between the two groups, and the dotted line shows the theoretical t distribution for this analysis.

After scrambling the labels, we see that the two groups are now much more similar, and in fact the cross-country group now has a slightly higher mean. Now let’s do that 10000 times and store the t statistic for each iteration; this may take a moment to complete. Figure 16.5 shows the histogram of the t-values across all of the random shuffles. As expected under the null hypothesis, this distribution is centered at zero (the mean of the distribution is -0.016. From the figure we can also see that the distribution of t values after shuffling roughly follows the theoretical t distribution under the null hypothesis (with mean=0), showing that randomization worked to generate null data. We can compute the p-value from the randomized data by measuring how many of the shuffled values are at least as extreme as the observed value: p(t > 5.14, df = 8) using randomization = 0.00380. This p-value is very similar to the p-value that we obtained using the t distribution, and both are quite extreme, suggesting that the observed data are very unlikely to have arisen if the null hypothesis is true - and in this case we know that it’s not true, because we generated the data.

16.3.5.3.1 Randomization: BMI/activity example

Now let’s use randomization to compute the p-value for the BMI/activity example. In this case, we will randomly shuffle the PhysActive variable and compute the difference between groups after each shuffle, and then compare our observed t statistic to the distribution of t statistics from the shuffled datasets. Figure 16.6 shows the distribution of t values from the shuffled samples, and we can also compute the probability of finding a value as large or larger than the observed value. The p-value obtained from randomization (0.0000) is very similar to the one obtained using the t distribution (0.0001). The advantage of the randomization test is that it doesn’t require that we assume that the data from each of the groups are normally distributed, though the t-test is generally quite robust to violations of that assumption. In addition, the randomization test can allow us to compute p-values for statistics when we don’t have a theoretical distribution like we do for the t-test.

Histogram of t statistics after shuffling of group labels, with the observed value of the t statistic shown in the vertical line, and values at least as extreme as the observed value shown in lighter gray

We do have to make one main assumption when we use the randomization test, which we refer to as exchangeability . This means that all of the observations are distributed in the same way, such that we can interchange them without changing the overall distribution. The main place where this can break down is when there are related observations in the data; for example, if we had data from individuals in 4 different families, then we couldn’t assume that individuals were exchangeable, because siblings would be closer to each other than they are to individuals from other families. In general, if the data were obtained by random sampling, then the assumption of exchangeability should hold.

16.3.6 Step 6: Assess the “statistical significance” of the result

The next step is to determine whether the p-value that results from the previous step is small enough that we are willing to reject the null hypothesis and conclude instead that the alternative is true. How much evidence do we require? This is one of the most controversial questions in statistics, in part because it requires a subjective judgment – there is no “correct” answer.

Historically, the most common answer to this question has been that we should reject the null hypothesis if the p-value is less than 0.05. This comes from the writings of Ronald Fisher, who has been referred to as “the single most important figure in 20th century statistics” (Efron 1998) :

“If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 … it is convenient to draw the line at about the level at which we can say: Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials” (Fisher 1925)

However, Fisher never intended p < 0.05 p < 0.05 to be a fixed rule:

“no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas” [fish:1956]

Instead, it is likely that it became a ritual due to the reliance upon tables of p-values that were used before computing made it easy to compute p values for arbitrary values of a statistic. All of the tables had an entry for 0.05, making it easy to determine whether one’s statistic exceeded the value needed to reach that level of significance.

The choice of statistical thresholds remains deeply controversial, and recently (Benjamin et al., 2018) it has been proposed that the standard threshold be changed from .05 to .005, making it substantially more stringent and thus more difficult to reject the null hypothesis. In large part this move is due to growing concerns that the evidence obtained from a significant result at p < . 05 32.

16.3.6.1 Hypothesis testing as decision-making: The Neyman-Pearson approach

Whereas Fisher thought that the p-value could provide evidence regarding a specific hypothesis, the statisticians Jerzy Neyman and Egon Pearson disagreed vehemently. Instead, they proposed that we think of hypothesis testing in terms of its error rate in the long run:

“no test based upon a theory of probability can by itself provide any valuable evidence of the truth or falsehood of a hypothesis. But we may look at the purpose of tests from another viewpoint. Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we insure that, in the long run of experience, we shall not often be wrong” (Neyman and Pearson 1933)

That is: We can’t know which specific decisions are right or wrong, but if we follow the rules, we can at least know how often our decisions will be wrong on average.

To understand the decision making framework that Neyman and Pearson developed, we first need to discuss statistical decision making in terms of the kinds of outcomes that can occur. There are two possible states of reality ( H 0 H _0 is true, or H 0 H _0 is false), and two possible decisions (reject H 0 H _0 , or fail to reject H 0 H _0 ). There are two ways in which we can make a correct decision:

  • We can decide to reject H 0 H _0 when it is false (in the language of decision theory, we call this a hit )
  • We can fail to reject H 0 H _0 when it is true (we call this a correct rejection )

There are also two kinds of errors we can make:

  • We can decide to reject H 0 H _0 when it is actually true (we call this a false alarm , or Type I error )
  • We can fail to reject H 0 H _0 when it is actually false (we call this a miss , or Type II error )

Neyman and Pearson coined two terms to describe the probability of these two types of errors in the long run:

  • P(Type I error) = α \alpha
  • P(Type II error) = β \beta

That is, if we set α 18.3, which is the complement of Type II error.

16.3.7 What does a significant result mean?

There is a great deal of confusion about what p-values actually mean (Gigerenzer, 2004). Let’s say that we do an experiment comparing the means between conditions, and we find a difference with a p-value of .01. There are a number of possible interpretations.

16.3.7.1 Does it mean that the probability of the null hypothesis being true is .01?

No. Remember that in null hypothesis testing, the p-value is the probability of the data given the null hypothesis ( P ( d a t a | H 0 ) P(data|H_0) ). It does not warrant conclusions about the probability of the null hypothesis given the data ( P ( H 0 | d a t a ) P(H_0|data) ). We will return to this question when we discuss Bayesian inference in a later chapter, as Bayes theorem lets us invert the conditional probability in a way that allows us to determine the latter probability.

16.3.7.2 Does it mean that the probability that you are making the wrong decision is .01?

No. This would be P ( H 0 | d a t a ) P(H_0|data) , but remember as above that p-values are probabilities of data under H 0 H _0 , not probabilities of hypotheses.

16.3.7.3 Does it mean that if you ran the study again, you would obtain the same result 99% of the time?

No. The p-value is a statement about the likelihood of a particular dataset under the null; it does not allow us to make inferences about the likelihood of future events such as replication.

16.3.7.4 Does it mean that you have found a meaningful effect?

No. There is an important distinction between statistical significance and practical significance . As an example, let’s say that we performed a randomized controlled trial to examine the effect of a particular diet on body weight, and we find a statistically significant effect at p<.05. What this doesn’t tell us is how much weight was actually lost, which we refer to as the effect size (to be discussed in more detail in Chapter 18). If we think about a study of weight loss, then we probably don’t think that the loss of ten ounces (i.e. the weight of a bag of potato chips) is practically significant. Let’s look at our ability to detect a significant difference of 1 ounce as the sample size increases.

Figure 16.7 shows how the proportion of significant results increases as the sample size increases, such that with a very large sample size (about 262,000 total subjects), we will find a significant result in more than 90% of studies when there is a 1 ounce weight loss. While these are statistically significant, most physicians would not consider a weight loss of one ounce to be practically or clinically significant. We will explore this relationship in more detail when we return to the concept of statistical power in Section 18.3, but it should already be clear from this example that statistical significance is not necessarily indicative of practical significance.

The proportion of signifcant results for a very small change (1 ounce, which is about .001 standard deviations) as a function of sample size.

  • Subscriber Services
  • For Authors
  • Publications
  • Archaeology
  • Art & Architecture
  • Bilingual dictionaries
  • Classical studies
  • Encyclopedias
  • English Dictionaries and Thesauri
  • Language reference
  • Linguistics
  • Media studies
  • Medicine and health
  • Names studies
  • Performing arts
  • Science and technology
  • Social sciences
  • Society and culture
  • Overview Pages
  • Subject Reference
  • English Dictionaries
  • Bilingual Dictionaries

Recently viewed (0)

  • Save Search
  • Share This Facebook LinkedIn Twitter

Related Content

Related overviews.

homoscedastic

heteroscedasticity

More Like This

Show all results sharing these subjects:

White's test

Quick reference.

A test of the null hypothesis of homoscedasticity against the alternative of heteroscedasticity, based on the fact that under heteroscedasticity the ordinary least squares estimator of the covariance matrix is inconsistent. The test is performed by regressing squared OLS residuals from the main regression on all explanatory variables, their squares and cross-products, and a constant. The test statistic is given by NR 2 , where N is the sample size and R 2 is the coefficient of determination from the test regression. Under the null it has an asymptotic chi-square distribution with the degrees of freedom equal to the number of regressors in the test regression, excluding the constant.

From:   White's test   in  A Dictionary of Economics »

Subjects: Social sciences — Economics

Related content in Oxford Reference

Reference entries.

View all related items in Oxford Reference »

Search for: 'White's test' in Oxford Reference »

  • Oxford University Press

PRINTED FROM OXFORD REFERENCE (www.oxfordreference.com). (c) Copyright Oxford University Press, 2023. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single entry from a reference work in OR for personal use (for details see Privacy Policy and Legal Notice ).

date: 02 June 2024

  • Cookie Policy
  • Privacy Policy
  • Legal Notice
  • Accessibility
  • [66.249.64.20|185.66.14.133]
  • 185.66.14.133

Character limit 500 /500

Logo for BCcampus Open Publishing

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

Chapter 13: Inferential Statistics

Understanding Null Hypothesis Testing

Learning Objectives

  • Explain the purpose of null hypothesis testing, including the role of sampling error.
  • Describe the basic logic of null hypothesis testing.
  • Describe the role of relationship strength and sample size in determining statistical significance and make reasonable judgments about statistical significance based on these two factors.

The Purpose of Null Hypothesis Testing

As we have seen, psychological research typically involves measuring one or more variables for a sample and computing descriptive statistics for that sample. In general, however, the researcher’s goal is not to draw conclusions about that sample but to draw conclusions about the population that the sample was selected from. Thus researchers must use sample statistics to draw conclusions about the corresponding values in the population. These corresponding values in the population are called  parameters . Imagine, for example, that a researcher measures the number of depressive symptoms exhibited by each of 50 clinically depressed adults and computes the mean number of symptoms. The researcher probably wants to use this sample statistic (the mean number of symptoms for the sample) to draw conclusions about the corresponding population parameter (the mean number of symptoms for clinically depressed adults).

Unfortunately, sample statistics are not perfect estimates of their corresponding population parameters. This is because there is a certain amount of random variability in any statistic from sample to sample. The mean number of depressive symptoms might be 8.73 in one sample of clinically depressed adults, 6.45 in a second sample, and 9.44 in a third—even though these samples are selected randomly from the same population. Similarly, the correlation (Pearson’s  r ) between two variables might be +.24 in one sample, −.04 in a second sample, and +.15 in a third—again, even though these samples are selected randomly from the same population. This random variability in a statistic from sample to sample is called  sampling error . (Note that the term error  here refers to random variability and does not imply that anyone has made a mistake. No one “commits a sampling error.”)

One implication of this is that when there is a statistical relationship in a sample, it is not always clear that there is a statistical relationship in the population. A small difference between two group means in a sample might indicate that there is a small difference between the two group means in the population. But it could also be that there is no difference between the means in the population and that the difference in the sample is just a matter of sampling error. Similarly, a Pearson’s  r  value of −.29 in a sample might mean that there is a negative relationship in the population. But it could also be that there is no relationship in the population and that the relationship in the sample is just a matter of sampling error.

In fact, any statistical relationship in a sample can be interpreted in two ways:

  • There is a relationship in the population, and the relationship in the sample reflects this.
  • There is no relationship in the population, and the relationship in the sample reflects only sampling error.

The purpose of null hypothesis testing is simply to help researchers decide between these two interpretations.

The Logic of Null Hypothesis Testing

Null hypothesis testing  is a formal approach to deciding between two interpretations of a statistical relationship in a sample. One interpretation is called the   null hypothesis  (often symbolized  H 0  and read as “H-naught”). This is the idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error. Informally, the null hypothesis is that the sample relationship “occurred by chance.” The other interpretation is called the  alternative hypothesis  (often symbolized as  H 1 ). This is the idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

Again, every statistical relationship in a sample can be interpreted in either of these two ways: It might have occurred by chance, or it might reflect a relationship in the population. So researchers need a way to decide between them. Although there are many specific null hypothesis testing techniques, they are all based on the same general logic. The steps are as follows:

  • Assume for the moment that the null hypothesis is true. There is no relationship between the variables in the population.
  • Determine how likely the sample relationship would be if the null hypothesis were true.
  • If the sample relationship would be extremely unlikely, then reject the null hypothesis  in favour of the alternative hypothesis. If it would not be extremely unlikely, then  retain the null hypothesis .

Following this logic, we can begin to understand why Mehl and his colleagues concluded that there is no difference in talkativeness between women and men in the population. In essence, they asked the following question: “If there were no difference in the population, how likely is it that we would find a small difference of  d  = 0.06 in our sample?” Their answer to this question was that this sample relationship would be fairly likely if the null hypothesis were true. Therefore, they retained the null hypothesis—concluding that there is no evidence of a sex difference in the population. We can also see why Kanner and his colleagues concluded that there is a correlation between hassles and symptoms in the population. They asked, “If the null hypothesis were true, how likely is it that we would find a strong correlation of +.60 in our sample?” Their answer to this question was that this sample relationship would be fairly unlikely if the null hypothesis were true. Therefore, they rejected the null hypothesis in favour of the alternative hypothesis—concluding that there is a positive correlation between these variables in the population.

A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the  p value . A low  p  value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high  p  value means that the sample result would be likely if the null hypothesis were true and leads to the retention of the null hypothesis. But how low must the  p  value be before the sample result is considered unlikely enough to reject the null hypothesis? In null hypothesis testing, this criterion is called  α (alpha)  and is almost always set to .05. If there is less than a 5% chance of a result as extreme as the sample result if the null hypothesis were true, then the null hypothesis is rejected. When this happens, the result is said to be  statistically significant . If there is greater than a 5% chance of a result as extreme as the sample result when the null hypothesis is true, then the null hypothesis is retained. This does not necessarily mean that the researcher accepts the null hypothesis as true—only that there is not currently enough evidence to conclude that it is true. Researchers often use the expression “fail to reject the null hypothesis” rather than “retain the null hypothesis,” but they never use the expression “accept the null hypothesis.”

The Misunderstood  p  Value

The  p  value is one of the most misunderstood quantities in psychological research (Cohen, 1994) [1] . Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks!

The most common misinterpretation is that the  p  value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the  p  value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect . The  p  value is really the probability of a result at least as extreme as the sample result  if  the null hypothesis  were  true. So a  p  value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time.

You can avoid this misunderstanding by remembering that the  p  value is not the probability that any particular  hypothesis  is true or false. Instead, it is the probability of obtaining the  sample result  if the null hypothesis were true.

Role of Sample Size and Relationship Strength

Recall that null hypothesis testing involves answering the question, “If the null hypothesis were true, what is the probability of a sample result as extreme as this one?” In other words, “What is the  p  value?” It can be helpful to see that the answer to this question depends on just two considerations: the strength of the relationship and the size of the sample. Specifically, the stronger the sample relationship and the larger the sample, the less likely the result would be if the null hypothesis were true. That is, the lower the  p  value. This should make sense. Imagine a study in which a sample of 500 women is compared with a sample of 500 men in terms of some psychological characteristic, and Cohen’s  d  is a strong 0.50. If there were really no sex difference in the population, then a result this strong based on such a large sample should seem highly unlikely. Now imagine a similar study in which a sample of three women is compared with a sample of three men, and Cohen’s  d  is a weak 0.10. If there were no sex difference in the population, then a relationship this weak based on such a small sample should seem likely. And this is precisely why the null hypothesis would be rejected in the first example and retained in the second.

Of course, sometimes the result can be weak and the sample large, or the result can be strong and the sample small. In these cases, the two considerations trade off against each other so that a weak result can be statistically significant if the sample is large enough and a strong relationship can be statistically significant even if the sample is small. Table 13.1 shows roughly how relationship strength and sample size combine to determine whether a sample result is statistically significant. The columns of the table represent the three levels of relationship strength: weak, medium, and strong. The rows represent four sample sizes that can be considered small, medium, large, and extra large in the context of psychological research. Thus each cell in the table represents a combination of relationship strength and sample size. If a cell contains the word  Yes , then this combination would be statistically significant for both Cohen’s  d  and Pearson’s  r . If it contains the word  No , then it would not be statistically significant for either. There is one cell where the decision for  d  and  r  would be different and another where it might be different depending on some additional considerations, which are discussed in Section 13.2 “Some Basic Null Hypothesis Tests”

Although Table 13.1 provides only a rough guideline, it shows very clearly that weak relationships based on medium or small samples are never statistically significant and that strong relationships based on medium or larger samples are always statistically significant. If you keep this lesson in mind, you will often know whether a result is statistically significant based on the descriptive statistics alone. It is extremely useful to be able to develop this kind of intuitive judgment. One reason is that it allows you to develop expectations about how your formal null hypothesis tests are going to come out, which in turn allows you to detect problems in your analyses. For example, if your sample relationship is strong and your sample is medium, then you would expect to reject the null hypothesis. If for some reason your formal null hypothesis test indicates otherwise, then you need to double-check your computations and interpretations. A second reason is that the ability to make this kind of intuitive judgment is an indication that you understand the basic logic of this approach in addition to being able to do the computations.

Statistical Significance Versus Practical Significance

Table 13.1 illustrates another extremely important point. A statistically significant result is not necessarily a strong one. Even a very weak result can be statistically significant if it is based on a large enough sample. This is closely related to Janet Shibley Hyde’s argument about sex differences (Hyde, 2007) [2] . The differences between women and men in mathematical problem solving and leadership ability are statistically significant. But the word  significant  can cause people to interpret these differences as strong and important—perhaps even important enough to influence the college courses they take or even who they vote for. As we have seen, however, these statistically significant differences are actually quite weak—perhaps even “trivial.”

This is why it is important to distinguish between the  statistical  significance of a result and the  practical  significance of that result.  Practical significance refers to the importance or usefulness of the result in some real-world context. Many sex differences are statistically significant—and may even be interesting for purely scientific reasons—but they are not practically significant. In clinical practice, this same concept is often referred to as “clinical significance.” For example, a study on a new treatment for social phobia might show that it produces a statistically significant positive effect. Yet this effect still might not be strong enough to justify the time, effort, and other costs of putting it into practice—especially if easier and cheaper treatments that work almost as well already exist. Although statistically significant, this result would be said to lack practical or clinical significance.

Key Takeaways

  • Null hypothesis testing is a formal approach to deciding whether a statistical relationship in a sample reflects a real relationship in the population or is just due to chance.
  • The logic of null hypothesis testing involves assuming that the null hypothesis is true, finding how likely the sample result would be if this assumption were correct, and then making a decision. If the sample result would be unlikely if the null hypothesis were true, then it is rejected in favour of the alternative hypothesis. If it would not be unlikely, then the null hypothesis is retained.
  • The probability of obtaining the sample result if the null hypothesis were true (the  p  value) is based on two considerations: relationship strength and sample size. Reasonable judgments about whether a sample relationship is statistically significant can often be made by quickly considering these two factors.
  • Statistical significance is not the same as relationship strength or importance. Even weak relationships can be statistically significant if the sample size is large enough. It is important to consider relationship strength and the practical significance of a result in addition to its statistical significance.
  • Discussion: Imagine a study showing that people who eat more broccoli tend to be happier. Explain for someone who knows nothing about statistics why the researchers would conduct a null hypothesis test.
  • The correlation between two variables is  r  = −.78 based on a sample size of 137.
  • The mean score on a psychological characteristic for women is 25 ( SD  = 5) and the mean score for men is 24 ( SD  = 5). There were 12 women and 10 men in this study.
  • In a memory experiment, the mean number of items recalled by the 40 participants in Condition A was 0.50 standard deviations greater than the mean number recalled by the 40 participants in Condition B.
  • In another memory experiment, the mean scores for participants in Condition A and Condition B came out exactly the same!
  • A student finds a correlation of  r  = .04 between the number of units the students in his research methods class are taking and the students’ level of stress.

Long Descriptions

“Null Hypothesis” long description: A comic depicting a man and a woman talking in the foreground. In the background is a child working at a desk. The man says to the woman, “I can’t believe schools are still teaching kids about the null hypothesis. I remember reading a big study that conclusively disproved it years ago.” [Return to “Null Hypothesis”]

“Conditional Risk” long description: A comic depicting two hikers beside a tree during a thunderstorm. A bolt of lightning goes “crack” in the dark sky as thunder booms. One of the hikers says, “Whoa! We should get inside!” The other hiker says, “It’s okay! Lightning only kills about 45 Americans a year, so the chances of dying are only one in 7,000,000. Let’s go on!” The comic’s caption says, “The annual death rate among people who know that statistic is one in six.” [Return to “Conditional Risk”]

Media Attributions

  • Null Hypothesis by XKCD  CC BY-NC (Attribution NonCommercial)
  • Conditional Risk by XKCD  CC BY-NC (Attribution NonCommercial)
  • Cohen, J. (1994). The world is round: p < .05. American Psychologist, 49 , 997–1003. ↵
  • Hyde, J. S. (2007). New directions in the study of gender similarities and differences. Current Directions in Psychological Science, 16 , 259–263. ↵

Values in a population that correspond to variables measured in a study.

The random variability in a statistic from sample to sample.

A formal approach to deciding between two interpretations of a statistical relationship in a sample.

The idea that there is no relationship in the population and that the relationship in the sample reflects only sampling error.

The idea that there is a relationship in the population and that the relationship in the sample reflects this relationship in the population.

When the relationship found in the sample would be extremely unlikely, the idea that the relationship occurred “by chance” is rejected.

When the relationship found in the sample is likely to have occurred by chance, the null hypothesis is not rejected.

The probability that, if the null hypothesis were true, the result found in the sample would occur.

How low the p value must be before the sample result is considered unlikely in null hypothesis testing.

When there is less than a 5% chance of a result as extreme as the sample result occurring and the null hypothesis is rejected.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

null hypothesis white test

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

5.2 - writing hypotheses.

The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)).

When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing (2) the direction of the test (non-directional, right-tailed or left-tailed), and (3) the value of the hypothesized parameter.

  • At this point we can write hypotheses for a single mean (\(\mu\)), paired means(\(\mu_d\)), a single proportion (\(p\)), the difference between two independent means (\(\mu_1-\mu_2\)), the difference between two proportions (\(p_1-p_2\)), a simple linear regression slope (\(\beta\)), and a correlation (\(\rho\)). 
  • The research question will give us the information necessary to determine if the test is two-tailed (e.g., "different from," "not equal to"), right-tailed (e.g., "greater than," "more than"), or left-tailed (e.g., "less than," "fewer than").
  • The research question will also give us the hypothesized parameter value. This is the number that goes in the hypothesis statements (i.e., \(\mu_0\) and \(p_0\)). For the difference between two groups, regression, and correlation, this value is typically 0.

Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)).  The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).

White's Test for Heteroskedasticity in a Linear Regression Model

Description.

This function implements the popular method of White (1980) for testing for heteroskedasticity in a linear regression model.

White's Test entails fitting an auxiliary regression model in which the response variable is the vector of squared residuals from the original model and the design matrix includes the original explanatory variables, their squares, and (optionally) their two-way interactions. The test statistic is the number of observations multiplied by the coefficient of determination from the auxiliary regression model:

T = n r_{\mathrm{aux}}^2

White's Test is thus a special case of the method of Breusch and Pagan (1979). Under the null hypothesis of homoskedasticity, the distribution of the test statistic is asymptotically chi-squared with parameter degrees of freedom. The test is right-tailed.

An object of class "htest" . If object is not assigned, its attributes are displayed in the console as a tibble using tidy .

Breusch TS, Pagan AR (1979). “A Simple Test for Heteroscedasticity and Random Coefficient Variation.” Econometrica , 47 (5), 1287–1294. White H (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica , 48 (4), 817–838.

This function should not be confused with tseries::white.test , which does not implement the method of White (1980) for testing for heteroskedasticity in a linear model.

  • Terms of Use
  • Privacy Policy
  • Cookies Policy
  • Definitions
  • R Programming Language
  • Python Programming Language

Heteroskedasticity: Breusch-Pagan and White Tests

Last Update: February 21, 2022

Heteroskedasticity is when linear regression errors have non-constant variance. This can be tested through Breusch-Pagan test [ 1 ] which evaluates whether model independent variables explain its errors variance. If model independent variables explain its errors variance, then model errors are assumed heteroskedastic or with non-constant variance. This can also be tested through White test [ 2 ] which evaluates whether model independent variables and squared independent variables explain its errors variance. If model independent variables and squared independent variables explain its errors variance, then model errors are assumed heteroskedastic or with non-constant variance.

\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}\;(1)

Below, we find examples of Breusch-Pagan, White (no cross terms) and White (cross terms) tests auxiliary regressions Lagrange Multiplier (LM) tests from original multiple linear regression of house price explained by its lot size and number of bedrooms [ 3 ].

null hypothesis white test

My online courses are hosted at Teachable website .

For more details on this concept, you can view my Linear Regression Courses .

[1] Breusch, T. S.; Pagan, A. R. (1979). “A Simple Test for Heteroskedasticity and Random Coefficient Variation”. Econometrica . 47 (5): 1287–1294.

[2] White H (1980). “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica , 48(4), 817–838.

[3] Data Description: Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

Original Source: Anglin, P., and Gencay, R. (1996). Semiparametric Estimation of a Hedonic Price Function. Journal of Applied Econometrics, 11, 633–648.

Source: AER R Package HousePrices Object. Christian Kleiber and Achim Zeileis. (2008). Applied Econometrics with R. Springer-Verlag, New York.

Privacy Overview

  • School Guide
  • Mathematics
  • Number System and Arithmetic
  • Trigonometry
  • Probability
  • Mensuration
  • Maths Formulas
  • Integration Formulas
  • Differentiation Formulas
  • Trigonometry Formulas
  • Algebra Formulas
  • Mensuration Formula
  • Statistics Formulas
  • Trigonometric Table

Null Hypothesis

  • Hypothesis Testing Formula
  • SQLite IS NULL
  • MySQL NOT NULL Constraint
  • Real-life Applications of Hypothesis Testing
  • Null in JavaScript
  • SQL NOT NULL Constraint
  • Null Cipher
  • Hypothesis in Machine Learning
  • Understanding Hypothesis Testing
  • Difference between Null and Alternate Hypothesis
  • Hypothesis Testing in R Programming
  • SQL | NULL functions
  • Kotlin Null Safety
  • PHP | is_null() Function
  • MySQL | NULLIF( ) Function
  • Null-Coalescing Operator in C#
  • Nullity of a Matrix

Null Hypothesis , often denoted as H 0, is a foundational concept in statistical hypothesis testing. It represents an assumption that no significant difference, effect, or relationship exists between variables within a population. It serves as a baseline assumption, positing no observed change or effect occurring. The null is t he truth or falsity of an idea in analysis.

In this article, we will discuss the null hypothesis in detail, along with some solved examples and questions on the null hypothesis.

Table of Content

What is Null Hypothesis?

Null hypothesis symbol, formula of null hypothesis, types of null hypothesis, null hypothesis examples, principle of null hypothesis, how do you find null hypothesis, null hypothesis in statistics, null hypothesis and alternative hypothesis, null hypothesis and alternative hypothesis examples, null hypothesis – practice problems.

Null Hypothesis in statistical analysis suggests the absence of statistical significance within a specific set of observed data. Hypothesis testing, using sample data, evaluates the validity of this hypothesis. Commonly denoted as H 0 or simply “null,” it plays an important role in quantitative analysis, examining theories related to markets, investment strategies, or economies to determine their validity.

Null Hypothesis Meaning

Null Hypothesis represents a default position, often suggesting no effect or difference, against which researchers compare their experimental results. The Null Hypothesis, often denoted as H 0 asserts a default assumption in statistical analysis. It posits no significant difference or effect, serving as a baseline for comparison in hypothesis testing.

The null Hypothesis is represented as H 0 , the Null Hypothesis symbolizes the absence of a measurable effect or difference in the variables under examination.

Certainly, a simple example would be asserting that the mean score of a group is equal to a specified value like stating that the average IQ of a population is 100.

The Null Hypothesis is typically formulated as a statement of equality or absence of a specific parameter in the population being studied. It provides a clear and testable prediction for comparison with the alternative hypothesis. The formulation of the Null Hypothesis typically follows a concise structure, stating the equality or absence of a specific parameter in the population.

Mean Comparison (Two-sample t-test)

H 0 : μ 1 = μ 2

This asserts that there is no significant difference between the means of two populations or groups.

Proportion Comparison

H 0 : p 1 − p 2 = 0

This suggests no significant difference in proportions between two populations or conditions.

Equality in Variance (F-test in ANOVA)

H 0 : σ 1 = σ 2

This states that there’s no significant difference in variances between groups or populations.

Independence (Chi-square Test of Independence):

H 0 : Variables are independent

This asserts that there’s no association or relationship between categorical variables.

Null Hypotheses vary including simple and composite forms, each tailored to the complexity of the research question. Understanding these types is pivotal for effective hypothesis testing.

Equality Null Hypothesis (Simple Null Hypothesis)

The Equality Null Hypothesis, also known as the Simple Null Hypothesis, is a fundamental concept in statistical hypothesis testing that assumes no difference, effect or relationship between groups, conditions or populations being compared.

Non-Inferiority Null Hypothesis

In some studies, the focus might be on demonstrating that a new treatment or method is not significantly worse than the standard or existing one.

Superiority Null Hypothesis

The concept of a superiority null hypothesis comes into play when a study aims to demonstrate that a new treatment, method, or intervention is significantly better than an existing or standard one.

Independence Null Hypothesis

In certain statistical tests, such as chi-square tests for independence, the null hypothesis assumes no association or independence between categorical variables.

Homogeneity Null Hypothesis

In tests like ANOVA (Analysis of Variance), the null hypothesis suggests that there’s no difference in population means across different groups.

  • Medicine: Null Hypothesis: “No significant difference exists in blood pressure levels between patients given the experimental drug versus those given a placebo.”
  • Education: Null Hypothesis: “There’s no significant variation in test scores between students using a new teaching method and those using traditional teaching.”
  • Economics: Null Hypothesis: “There’s no significant change in consumer spending pre- and post-implementation of a new taxation policy.”
  • Environmental Science: Null Hypothesis: “There’s no substantial difference in pollution levels before and after a water treatment plant’s establishment.”

The principle of the null hypothesis is a fundamental concept in statistical hypothesis testing. It involves making an assumption about the population parameter or the absence of an effect or relationship between variables.

In essence, the null hypothesis (H 0 ) proposes that there is no significant difference, effect, or relationship between variables. It serves as a starting point or a default assumption that there is no real change, no effect or no difference between groups or conditions.

The null hypothesis is usually formulated to be tested against an alternative hypothesis (H 1 or H [Tex]\alpha [/Tex] ) which suggests that there is an effect, difference or relationship present in the population.

Null Hypothesis Rejection

Rejecting the Null Hypothesis occurs when statistical evidence suggests a significant departure from the assumed baseline. It implies that there is enough evidence to support the alternative hypothesis, indicating a meaningful effect or difference. Null Hypothesis rejection occurs when statistical evidence suggests a deviation from the assumed baseline, prompting a reconsideration of the initial hypothesis.

Identifying the Null Hypothesis involves defining the status quotient, asserting no effect and formulating a statement suitable for statistical analysis.

When is Null Hypothesis Rejected?

The Null Hypothesis is rejected when statistical tests indicate a significant departure from the expected outcome, leading to the consideration of alternative hypotheses. It occurs when statistical evidence suggests a deviation from the assumed baseline, prompting a reconsideration of the initial hypothesis.

In statistical hypothesis testing, researchers begin by stating the null hypothesis, often based on theoretical considerations or previous research. The null hypothesis is then tested against an alternative hypothesis (Ha), which represents the researcher’s claim or the hypothesis they seek to support.

The process of hypothesis testing involves collecting sample data and using statistical methods to assess the likelihood of observing the data if the null hypothesis were true. This assessment is typically done by calculating a test statistic, which measures the difference between the observed data and what would be expected under the null hypothesis.

In the realm of hypothesis testing, the null hypothesis (H 0 ) and alternative hypothesis (H₁ or Ha) play critical roles. The null hypothesis generally assumes no difference, effect, or relationship between variables, suggesting that any observed change or effect is due to random chance. Its counterpart, the alternative hypothesis, asserts the presence of a significant difference, effect, or relationship between variables, challenging the null hypothesis. These hypotheses are formulated based on the research question and guide statistical analyses.

Difference Between Null Hypothesis and Alternative Hypothesis

The null hypothesis (H 0 ) serves as the baseline assumption in statistical testing, suggesting no significant effect, relationship, or difference within the data. It often proposes that any observed change or correlation is merely due to chance or random variation. Conversely, the alternative hypothesis (H 1 or Ha) contradicts the null hypothesis, positing the existence of a genuine effect, relationship or difference in the data. It represents the researcher’s intended focus, seeking to provide evidence against the null hypothesis and support for a specific outcome or theory. These hypotheses form the crux of hypothesis testing, guiding the assessment of data to draw conclusions about the population being studied.

Let’s envision a scenario where a researcher aims to examine the impact of a new medication on reducing blood pressure among patients. In this context:

Null Hypothesis (H 0 ): “The new medication does not produce a significant effect in reducing blood pressure levels among patients.”

Alternative Hypothesis (H 1 or Ha): “The new medication yields a significant effect in reducing blood pressure levels among patients.”

The null hypothesis implies that any observed alterations in blood pressure subsequent to the medication’s administration are a result of random fluctuations rather than a consequence of the medication itself. Conversely, the alternative hypothesis contends that the medication does indeed generate a meaningful alteration in blood pressure levels, distinct from what might naturally occur or by random chance.

People Also Read:

Mathematics Maths Formulas Probability and Statistics

Example 1: A researcher claims that the average time students spend on homework is 2 hours per night.

Null Hypothesis (H 0 ): The average time students spend on homework is equal to 2 hours per night. Data: A random sample of 30 students has an average homework time of 1.8 hours with a standard deviation of 0.5 hours. Test Statistic and Decision: Using a t-test, if the calculated t-statistic falls within the acceptance region, we fail to reject the null hypothesis. If it falls in the rejection region, we reject the null hypothesis. Conclusion: Based on the statistical analysis, we fail to reject the null hypothesis, suggesting that there is not enough evidence to dispute the claim of the average homework time being 2 hours per night.

Example 2: A company asserts that the error rate in its production process is less than 1%.

Null Hypothesis (H 0 ): The error rate in the production process is 1% or higher. Data: A sample of 500 products shows an error rate of 0.8%. Test Statistic and Decision: Using a z-test, if the calculated z-statistic falls within the acceptance region, we fail to reject the null hypothesis. If it falls in the rejection region, we reject the null hypothesis. Conclusion: The statistical analysis supports rejecting the null hypothesis, indicating that there is enough evidence to dispute the company’s claim of an error rate of 1% or higher.

Q1. A researcher claims that the average time spent by students on homework is less than 2 hours per day. Formulate the null hypothesis for this claim?

Q2. A manufacturing company states that their new machine produces widgets with a defect rate of less than 5%. Write the null hypothesis to test this claim?

Q3. An educational institute believes that their online course completion rate is at least 60%. Develop the null hypothesis to validate this assertion?

Q4. A restaurant claims that the waiting time for customers during peak hours is not more than 15 minutes. Formulate the null hypothesis for this claim?

Q5. A study suggests that the mean weight loss after following a specific diet plan for a month is more than 8 pounds. Construct the null hypothesis to evaluate this statement?

Summary – Null Hypothesis and Alternative Hypothesis

The null hypothesis (H 0 ) and alternative hypothesis (H a ) are fundamental concepts in statistical hypothesis testing. The null hypothesis represents the default assumption, stating that there is no significant effect, difference, or relationship between variables. It serves as the baseline against which the alternative hypothesis is tested. In contrast, the alternative hypothesis represents the researcher’s hypothesis or the claim to be tested, suggesting that there is a significant effect, difference, or relationship between variables. The relationship between the null and alternative hypotheses is such that they are complementary, and statistical tests are conducted to determine whether the evidence from the data is strong enough to reject the null hypothesis in favor of the alternative hypothesis. This decision is based on the strength of the evidence and the chosen level of significance. Ultimately, the choice between the null and alternative hypotheses depends on the specific research question and the direction of the effect being investigated.

FAQs on Null Hypothesis

What does null hypothesis stands for.

The null hypothesis, denoted as H 0 ​, is a fundamental concept in statistics used for hypothesis testing. It represents the statement that there is no effect or no difference, and it is the hypothesis that the researcher typically aims to provide evidence against.

How to Form a Null Hypothesis?

A null hypothesis is formed based on the assumption that there is no significant difference or effect between the groups being compared or no association between variables being tested. It often involves stating that there is no relationship, no change, or no effect in the population being studied.

When Do we reject the Null Hypothesis?

In statistical hypothesis testing, if the p-value (the probability of obtaining the observed results) is lower than the chosen significance level (commonly 0.05), we reject the null hypothesis. This suggests that the data provides enough evidence to refute the assumption made in the null hypothesis.

What is a Null Hypothesis in Research?

In research, the null hypothesis represents the default assumption or position that there is no significant difference or effect. Researchers often try to test this hypothesis by collecting data and performing statistical analyses to see if the observed results contradict the assumption.

What Are Alternative and Null Hypotheses?

The null hypothesis (H0) is the default assumption that there is no significant difference or effect. The alternative hypothesis (H1 or Ha) is the opposite, suggesting there is a significant difference, effect or relationship.

What Does it Mean to Reject the Null Hypothesis?

Rejecting the null hypothesis implies that there is enough evidence in the data to support the alternative hypothesis. In simpler terms, it suggests that there might be a significant difference, effect or relationship between the groups or variables being studied.

How to Find Null Hypothesis?

Formulating a null hypothesis often involves considering the research question and assuming that no difference or effect exists. It should be a statement that can be tested through data collection and statistical analysis, typically stating no relationship or no change between variables or groups.

How is Null Hypothesis denoted?

The null hypothesis is commonly symbolized as H 0 in statistical notation.

What is the Purpose of the Null hypothesis in Statistical Analysis?

The null hypothesis serves as a starting point for hypothesis testing, enabling researchers to assess if there’s enough evidence to reject it in favor of an alternative hypothesis.

What happens if we Reject the Null hypothesis?

Rejecting the null hypothesis implies that there is sufficient evidence to support an alternative hypothesis, suggesting a significant effect or relationship between variables.

What are Test for Null Hypothesis?

Various statistical tests, such as t-tests or chi-square tests, are employed to evaluate the validity of the Null Hypothesis in different scenarios.

Please Login to comment...

Similar reads.

  • Geeks Premier League 2023
  • Math-Concepts
  • Geeks Premier League
  • School Learning

Improve your Coding Skills with Practice

 alt=

What kind of Experience do you want to share?

Statology

Statistics Made Easy

The Breusch-Pagan Test: Definition & Example

One of the key assumptions of linear regression is that the residuals are distributed with equal variance at each level of the predictor variable. This assumption is known as homoscedasticity .

When this assumption is violated, we say that heteroscedasticity is present in the residuals. When this occurs, the results of the regression become unreliable.

One way to visually detect whether heteroscedasticity is present is to create a plot of the residuals against the fitted values of the regression model.

If the residuals become more spread out at higher values in the plot, this is a tell-tale sign that heteroscedasticity is present.

Example of heteroscedasticity for a Breusch-Pagan Test

A formal statistical test we can use to determine if heteroscedasticity is present is the  Breusch-Pagan test .

This tutorial provides a brief explanation of the Breusch-Pagan test along with an example.

What is the Breusch-Pagan Test?

The  Breusch-Pagan test is used to determine whether or not heteroscedasticity is present in a regression model.

The test uses the following null and alternative hypotheses :

  • Null Hypothesis (H 0 ): Homoscedasticity is present (the residuals are distributed with equal variance)
  • Alternative Hypothesis (H A ): Heteroscedasticity is present (the residuals are not distributed with equal variance)

If the p-value of the test is less than some significance level (i.e. α = .05) then we reject the null hypothesis and conclude that heteroscedasticity is present in the regression model.

We use the following steps to perform a Breusch-Pagan test:

1.  Fit the regression model.

2.  Calculate the squared residuals of the model.

3. Fit a new regression model, using the squared residuals as the response values.

4.  Calculate the Chi-Square test statistic X 2 as n*R 2 new where:

  • n: The total number of observations
  • R 2 new : The R-squared of the new regression model that used the squared residuals as the response values

If the p-value that corresponds to this Chi-Square test statistic with  p (the number of predictors) degrees of freedom is less than some significance level (i.e. α = .05) then reject the null hypothesis and conclude that heteroscedasticity is present.

Otherwise, fail to reject the null hypothesis. In this case, it’s assumed that homoscedasticity is present.

Note that most statistical software can easily perform the Breusch-Pagan test so you will likely never have to perform these steps by hand, but it’s useful to know what’s going on behind the scenes.

An Example of the Breusch-Pagan Test

Suppose we have the following dataset that contains information for 10 different basketball players:

null hypothesis white test

Using statistical software, we fit the following multiple linear regression model :

rating = 62.47 + 1.12*(points) + 0.88*(assists) – 0.43*(rebounds)

We then use this model to make predictions for the rating of each player and calculated the squared residuals (i.e. the squared difference between the predicted rating and the actual rating):

null hypothesis white test

Next, we fit a new regression model using the squared residuals as the response values and the original predictor variables as the predictor variables once again. We find the following:

  • R 2 new : 0.600395

Thus, our Chi-Square test statistic for the Breusch-Pagan test is n*R 2 new = 10*.600395 = 6.00395 . The degrees of freedom is  p = 3 predictor variables.

According to the Chi-Square to P-Value Calculator , the p-value that corresponds to X 2 = 6.00395 with 3 degrees of freedom is  0.111418 .

Since this p-value is not less than .05, we fail to reject the null hypothesis. Thus, we assume that homoscedasticity is present.

The Breusch-Pagan Test in Practice

The following tutorials provide step-by-step examples of how to perform the Breusch-Pagan test in different statistical programs:

How to Perform a Breusch-Pagan Test in Excel How to Perform a Breusch-Pagan Test in R How to Perform a Breusch-Pagan Test in Python How to Perform a Breusch-Pagan Test in Stata

Featured Posts

7 Common Beginner Stats Mistakes and How to Avoid Them

Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

7 Replies to “The Breusch-Pagan Test: Definition & Example”

You have no idea how many “lives” you’ve just saved with this wonderful tutorial and resources to go along with it.

Thank you so much!

There’s a typo in the null hypothesis… it should read:

Null Hypothesis (H0): Homoscedasticity is not present (the residuals are distributed with equal variance)

Thank you very much for your work ! Is it normal that the squared residuals don’t correspond to predicted rating minus rating ? I don’t find the same results as you…

I thank you for your answer,

Hi Zack. I was stuck and you have rescued me. Thanks! Joshua

I think you got the hypothesis part wrong. Based on my sources, if the chi-square calculated is less than the critical value, homoskedasticity is present. You got it the other way around. A typo perhaps.

Hi Yuds…You’re absolutely correct, and I appreciate the clarification. My apologies for any confusion caused by the error. Let me correct that:

In the context of testing for homoscedasticity using the Breusch-Pagan test, the null hypothesis is that the errors are homoscedastic (constant variance), while the alternative hypothesis is that the errors are heteroscedastic (varying variance).

To conduct the Breusch-Pagan test for homoscedasticity in R, you would typically use the `bptest()` function from the `lmtest` package. The function returns a test statistic and a p-value. If the p-value is less than your chosen significance level (e.g., 0.05), you reject the null hypothesis of homoscedasticity, indicating the presence of heteroscedasticity.

Here’s an example:

“`R # Load the lmtest package library(lmtest)

# Fit a linear regression model model <- lm(y ~ x1 + x2, data = your_data) # Perform Breusch-Pagan test for homoscedasticity test_result <- bptest(model) # Print test result print(test_result) ``` If the p-value from `bptest()` is less than your chosen significance level (e.g., 0.05), you would conclude that there is evidence of heteroscedasticity in the errors. Conversely, if the p-value is greater than your significance level, you would fail to reject the null hypothesis and conclude that homoscedasticity is present. Again, I appreciate your attention to detail and apologize for any confusion caused by the earlier error.

True. I think the hypothesis has been stated incorrectly and the inference that would be would be incorrect.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Join the Statology Community

Sign up to receive Statology's exclusive study resource: 100 practice problems with step-by-step solutions. Plus, get our latest insights, tutorials, and data analysis tips straight to your inbox!

By subscribing you accept Statology's Privacy Policy.

IMAGES

  1. Illustration of a null hypothesis (H 0 ) and alternative hypothesis (H

    null hypothesis white test

  2. Null Hypothesis

    null hypothesis white test

  3. For the model in the previous question, test the…

    null hypothesis white test

  4. Null Hypothesis Significance Tests

    null hypothesis white test

  5. White Test for Heteroscedasticity

    null hypothesis white test

  6. Mastering Hypothesis Writing: Expert Tips for 2023

    null hypothesis white test

VIDEO

  1. Null and Alternative Hypothesis

  2. Test of hypothesis (null and alternative) #nullhypothesis #alternativehypothesis

  3. Difference between Null and Alternative hypothesis #viral #statistics #psychology #hypothesistesting

  4. Global Test

  5. Illustrating Null and Alternative hypothesis, Level of Significance, Rejection Region

  6. Hypothsis Testing in Statistics Part 2 Steps to Solving a Problem

COMMENTS

  1. White Test: Definition, Examples

    A graph showing heteroscedasticity; the White test is used to identify heteroscedastic errors in regression analysis. The null hypothesis for White's test is that the variances for the errors are equal. In math terms, that's: H 0 = σ 2i = σ 2. The alternate hypothesis (the one you're testing), is that the variances are not equal: H 1 ...

  2. How to Perform White's Test in R (With Examples)

    The test statistic is X 2 = 7.0766. The degrees of freedom is 5. The corresponding p-value is 0.215. White's test uses the following null and alternative hypotheses: Null (H 0): Homoscedasticity is present. Alternative (H A): Heteroscedasticity is present. Since the p-value is not less than 0.05, we fail to reject the null hypothesis.

  3. White test

    White test is a statistical test that establishes whether the variance of the errors in a regression model is constant: that is for homoskedasticity . This test, and an estimator for heteroscedasticity-consistent standard errors, were proposed by Halbert White in 1980. [1] These methods have become widely used, making this paper one of the most ...

  4. 9.1: Null and Alternative Hypotheses

    The actual test begins by considering two hypotheses.They are called the null hypothesis and the alternative hypothesis.These hypotheses contain opposing viewpoints. \(H_0\): The null hypothesis: It is a statement of no difference between the variables—they are not related. This can often be considered the status quo and as a result if you cannot accept the null it requires some action.

  5. How to Perform White's Test in Python (Step-by-Step)

    Step 3: Perform White's Test. Next, we will use the het_white () function from the statsmodels package to perform White's test to determine if heteroscedasticity is present in the regression model: white_test = het_white(model.resid, model.model.exog) #define labels to use for output of White's test.

  6. Null Hypothesis: Definition, Rejecting & Examples

    Null Hypothesis H 0: The correlation in the population is zero: ρ = 0. Alternative Hypothesis H A: The correlation in the population is not zero: ρ ≠ 0. For all these cases, the analysts define the hypotheses before the study. After collecting the data, they perform a hypothesis test to determine whether they can reject the null hypothesis.

  7. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  8. 16.3: The Process of Null Hypothesis Testing

    16.3.5 Step 5: Determine the probability of the data under the null hypothesis. This is the step where NHST starts to violate our intuition - rather than determining the likelihood that the null hypothesis is true given the data, we instead determine the likelihood of the data under the null hypothesis - because we started out by assuming that the null hypothesis is true!

  9. 6a.1

    The first step in hypothesis testing is to set up two competing hypotheses. The hypotheses are the most important aspect. If the hypotheses are incorrect, your conclusion will also be incorrect. The two hypotheses are named the null hypothesis and the alternative hypothesis. The null hypothesis is typically denoted as H 0.

  10. White's test

    White's test. A test of the null hypothesis of homoscedasticity against the alternative of heteroscedasticity, based on the fact that under heteroscedasticity the ordinary least squares estimator of the covariance matrix is inconsistent. The test is performed by regressing squared OLS residuals from the main regression on all explanatory ...

  11. Understanding Null Hypothesis Testing

    A crucial step in null hypothesis testing is finding the likelihood of the sample result if the null hypothesis were true. This probability is called the p value. A low p value means that the sample result would be unlikely if the null hypothesis were true and leads to the rejection of the null hypothesis. A high p value means that the sample ...

  12. 5.2

    5.2 - Writing Hypotheses. The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\ (H_0\)) and an alternative hypothesis (\ (H_a\)). Null Hypothesis. The statement that there is not a difference in the population (s), denoted as \ (H_0\)

  13. PDF Chapter 8 Heteroskedasticity

    Test statistic: Reject the null hypothesis if: 2 2 NR a > ... White test: NR2 = 50.84 2 12282.71 0.44 (16752.29) (0.12) CO i = ++GNI e ii Sample Size = 182 R-squared = 0.61 Uncorrected Robust ln( 2 ) 1.09 1.01ln( ) (0.27) (0.026) CO i −+= + GNI e i i

  14. R: White's Test for Heteroskedasticity in a Linear Regression

    The test statistic is the number of observations multiplied by the coefficient of determination from the auxiliary regression model: T = n r_{\mathrm{aux}}^2 T = nraux2. White's Test is thus a special case of the method of Breusch and Pagan (1979). Under the null hypothesis of homoskedasticity, the distribution of the test statistic is ...

  15. Heteroskedasticity: Breusch-Pagan and White Tests

    Notice that null hypothesis test (4) can be done with an F-test. After that, we can also do White test (no cross terms) auxiliary regression with formula and chi-square test with joint null hypothesis that independent variables and squared independent variables coefficients are equal to zero with formula . If joint null hypothesis is rejected ...

  16. How to Write a Null Hypothesis (5 Examples)

    Whenever we perform a hypothesis test, we always write a null hypothesis and an alternative hypothesis, which take the following forms: H0 (Null Hypothesis): Population parameter =, ≤, ≥ some value. HA (Alternative Hypothesis): Population parameter <, >, ≠ some value. Note that the null hypothesis always contains the equal sign.

  17. Null & Alternative Hypotheses

    Revised on June 22, 2023. The null and alternative hypotheses are two competing claims that researchers weigh evidence for and against using a statistical test: Null hypothesis (H0): There's no effect in the population. Alternative hypothesis (Ha or H1): There's an effect in the population. The effect is usually the effect of the ...

  18. PDF whitestrap: White Test and Bootstrapped White Test for Heteroskedasticity

    Title White Test and Bootstrapped White Test for Heteroskedasticity Version 0.0.1 ... The null hypothesis states that 1 = 2 = 0 (homoskedasticity). The test statistic is defined as: LM= nR2 where R2 is the R-squared value from the regression of u2. Value AA list with class white_test containing:

  19. White Heteroskedasticity Test

    This test is similar to the Breusch-Pagan Test, except that in the second OLS regression, in addition to the variables x 1, …, x k we also include the independent variables x 1 2, …, x k 2 as well as x 1 x j for all i ≠ j.This test takes the form. where m = the number of independent variables in the second regression, not counting the constant term. Thus m = 2k + C(k,2).

  20. Test for Heteroskedasticity with the White Test

    The degrees of freedom for the F-test are equal to 2 in the numerator and n - 3 in the denominator. The degrees of freedom for the chi-squared test are 2. If either of these test statistics is significant, then you have evidence of heteroskedasticity. If not, you fail to reject the null hypothesis of homoskedasticity.

  21. Null Hypothesis

    Null Hypothesis, often denoted as H0, is a foundational concept in statistical hypothesis testing. It represents an assumption that no significant difference, effect, or relationship exists between variables within a population. It serves as a baseline assumption, positing no observed change or effect occurring.

  22. The Breusch-Pagan Test: Definition & Example

    We find the following: n: 10. R2new: 0.600395. Thus, our Chi-Square test statistic for the Breusch-Pagan test is n*R2new = 10*.600395 = 6.00395. The degrees of freedom is p = 3 predictor variables. According to the Chi-Square to P-Value Calculator, the p-value that corresponds to X2 = 6.00395 with 3 degrees of freedom is 0.111418.

  23. stata

    In the Portmanteau test, the null hypothesis is that the variable follows a white noise process. And just like in any other statistical inference, a p-value that is smaller than the significance level, α (the probability of rejecting a true null hypothesis, usually set at α=0.05) means that you reject the null and conclude that the variable is not a white noise.