Hypothesis Testing: Multiple Means

10.3 - Multiple Comparisons

If the null hypothesis is rejected, we conclude that not all the means are equal: that is, at least one mean is different from the other means. The ANOVA test itself provides statistical evidence that a difference exists, but no evidence as to which mean or means differ.

For instance, using the previous example for tar content: if the ANOVA test results in a significant difference in average tar content between the cigarette brands, a follow-up analysis would be needed to determine which brand mean or means differ in tar content. We would also want to know whether one brand or multiple brands were better or worse than another brand in average tar content. To complete this analysis, we use a method called multiple comparisons.

Multiple comparisons analyzes all possible pairwise differences of means. For example, with three brands of cigarettes, A, B, and C, if the ANOVA test was significant, then multiple comparison methods would make the three possible pairwise comparisons:

  • Brand A to Brand B
  • Brand A to Brand C
  • Brand B to Brand C

These are essentially tests of two means, similar to what we learned previously in our lesson for comparing two means. However, the methods here use an adjustment to account for the number of comparisons taking place. Minitab provides three adjustment choices. We will use the Tukey adjustment, which adjusts the t-multiplier based on the number of comparisons.

Note! We do not go into the theory behind the Tukey method. Just note that we only use a multiple comparison technique in ANOVA when we have a significant result.

In the next section, we present an example to walk through the ANOVA results.
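For readers working outside Minitab, the same two-step workflow can be sketched in Python with scipy (version 1.8 or later for tukey_hsd). The three brand samples below are hypothetical numbers made up for illustration, not the course data.

```python
# Sketch: one-way ANOVA followed by Tukey's HSD (requires scipy >= 1.8).
# The three samples are hypothetical tar measurements for brands A, B, C.
from scipy import stats

brand_a = [10.1, 9.8, 10.3, 9.9, 10.0, 9.9]
brand_b = [11.2, 10.9, 11.1, 10.8, 11.0, 11.0]
brand_c = [12.1, 11.8, 12.2, 11.9, 12.0, 12.0]

# Step 1: the overall ANOVA tests H0: all brand means are equal.
f_stat, p_value = stats.f_oneway(brand_a, brand_b, brand_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")

# Step 2: only when the ANOVA is significant do we follow up with
# Tukey-adjusted pairwise comparisons to see which means differ.
if p_value < 0.05:
    print(stats.tukey_hsd(brand_a, brand_b, brand_c))
```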


Using Minitab to Perform One-Way ANOVA

Once the data are entered in Minitab, we use:

  • Stat > ANOVA > One-Way
  • If the responses are in one column and the factor levels are in another column, select the drop-down option 'Response data are in one column for all factor levels.'
  • If the responses are in their own column for each factor level, then select 'Response data are in a separate column for each factor level.'
  • In case we have a significant ANOVA result and want to conduct a multiple comparison analysis, click 'Comparisons', check the box for Tukey, and verify that the boxes for 'Interval plot for differences of means' and 'Grouping information' are also checked.
  • Click OK and OK again.

Example: Tar Content (ANOVA)

Test the hypothesis that the means are the same vs. at least one is different for both labs. Compare the two labs and comment.

Lab Precise

We are testing the following hypotheses:

\(H_0\colon \mu_1=\mu_2=\mu_3\) vs \(H_a\colon\text{ at least one mean is different}\)

The assumptions were discussed in the previous example.

The following is the output for one-way ANOVA for Lab Precise:

One-way ANOVA: Precise A, Precise B, Precise C

Null hypothesis: All means are equal
Alternative hypothesis: Not all means are equal
Significance level: \(\alpha = 0.05\)

Equal variances were assumed for the analysis.

Factor Information

Factor  Levels  Values
Factor  3       Precise A, Precise B, Precise C

Analysis of Variance

Source  DF  Adj SS  Adj MS   F-Value  P-Value
Factor   2  12.000  6.00000    65.46    0.000
Error   15   1.375  0.09165
Total   17  13.375

Model Summary

S R-sq R-sq(adj) R-sq(pred)
0.302743 89.72% 88.35% 85.20%

The p-value for this test is less than 0.0001. At any reasonable significance level, we would reject the null hypothesis and conclude there is enough evidence in the data to suggest at least one mean tar content is different.

But which ones are different? The next step is to examine the multiple comparisons. Minitab provides the following output:

Factor N Mean StDev 95% CI
Precise A 6 10.000 0.257 (9.737, 10.263)
Precise B 6 11.000 0.365 (10.737, 11.263)
Precise C 6 12.000 0.276 (11.737, 12.263)

Pooled StDev = 0.302743

Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence

Factor N Mean Grouping
Precise C 6 12.000 A
Precise B 6 11.000 B
Precise A 6 10.000 C

Means that do not share a letter are significantly different.

The Tukey pairwise comparisons suggest that all the means are different. Therefore, Brand C has the highest tar content and Brand A has the lowest.

We are testing the same hypotheses for Lab Sloppy as Lab Precise, and the assumptions were checked. The ANOVA output for Lab Sloppy is:

One-way ANOVA: Sloppy A, Sloppy B, Sloppy C

Factor Information

Factor  Levels  Values
Factor  3       Sloppy A, Sloppy B, Sloppy C

Analysis of Variance

Source  DF  Adj SS  Adj MS  F-Value  P-Value
Factor   2   12.00   6.000     1.96    0.176
Error   15   45.98   3.065
Total   17   57.98

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
1.75073  20.70%  10.12%     0.00%

The one-way ANOVA showed statistically significant results for Lab Precise but not for Lab Sloppy. Recall that ANOVA compares the within variation and the between variation. For Lab Precise, the within variation was small compared to the between variation. This resulted in a large F-statistic (65.46) and thus a small p-value. For Lab Sloppy, this ratio was small (1.96), resulting in a large p-value.
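The role of the within-group variation can be illustrated with a short simulation: the two synthetic data sets below share the same group means (10, 11, and 12, mirroring the tar example), but differ in within-group spread. The specific numbers and the seed are arbitrary choices for illustration, not the labs' actual data.

```python
# Same between-group differences, different within-group spread:
# only the low-noise "lab" yields a large F-statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
means = [10, 11, 12]

precise = [rng.normal(m, 0.3, size=6) for m in means]  # small within-group SD
sloppy = [rng.normal(m, 1.8, size=6) for m in means]   # large within-group SD

f_p, p_p = stats.f_oneway(*precise)
f_s, p_s = stats.f_oneway(*sloppy)
print(f"precise-like lab: F = {f_p:.1f}, p = {p_p:.4g}")  # large F, small p
print(f"sloppy-like lab:  F = {f_s:.1f}, p = {p_s:.4g}")  # small F, large p
```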

Try it!

Twenty young pigs are assigned at random among 4 experimental groups. Each group is fed a different diet. (This design is a completely randomized design.) The data are each pig's weight, in kilograms, after being raised on these diets for 10 months (pig_weights.txt). We wish to determine whether the mean pig weights are the same for all 4 diets.

First, we set up our hypothesis test:

\(H_0\colon \mu_1=\mu_2=\mu_3=\mu_4\)

\(H_a\colon \text { at least one mean weight is different}\)

Here are the data that were obtained from the four experimental groups, as well as their summary statistics:

Feed 1 Feed 2 Feed 3 Feed 4
60.8 68.3 102.6 87.9
57.1 67.7 102.2 84.7
65.0 74.0 100.5 83.2
58.7 66.3 97.5 85.8
61.8 69.9 98.9 90.3

Output from Minitab:

Descriptive Statistics: Feed 1, Feed 2, Feed 3, Feed 4

Variable  N  N*  Mean    StDev  Minimum  Maximum
Feed 1    5  0    60.68  3.03    57.10    65.00
Feed 2    5  0    69.24  2.96    66.30    74.00
Feed 3    5  0   100.34  2.16    97.50   102.60
Feed 4    5  0    86.38  2.78    83.20    90.30

The smallest standard deviation is 2.16, and the largest is 3.03. Since the rule of thumb is satisfied here (the largest standard deviation is less than twice the smallest), we can say the equal variance assumption is not violated. The description suggests that the samples are independent. There is nothing in the description to suggest the weights come from a normal distribution. The normal probability plots are:

[Normal probability plots for Feed 1–4, each showing the fitted line and 95% confidence interval lines.]

There are no obvious violations from the normal assumption, but we should proceed with caution as the sample sizes are very small.

The ANOVA output is:

One-way ANOVA: Feed 1, Feed 2, Feed 3, Feed 4

Factor Information

Factor  Levels  Values
Factor  4       Feed 1, Feed 2, Feed 3, Feed 4

Analysis of Variance

Source  DF  Adj SS  Adj MS   F-Value  P-Value
Factor   3  4703.2  1567.73   206.72    0.000
Error   16   121.3     7.58
Total   19  4824.5

Model Summary

S        R-sq    R-sq(adj)  R-sq(pred)
2.75386  97.48%  97.01%     96.07%

The p-value for the test is less than 0.001. With a significance level of 5%, we reject the null hypothesis. The data provide sufficient evidence to conclude that the mean weights of pigs from the four feeds are not all the same.
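As a cross-check on the Minitab output, the F-statistic can be reproduced from the data table above. Here is a minimal sketch in Python (one tool among many; the course itself uses Minitab):

```python
# Reproduce the one-way ANOVA for the pig-weight data; F should be about 206.7.
from scipy import stats

feed1 = [60.8, 57.1, 65.0, 58.7, 61.8]
feed2 = [68.3, 67.7, 74.0, 66.3, 69.9]
feed3 = [102.6, 102.2, 100.5, 97.5, 98.9]
feed4 = [87.9, 84.7, 83.2, 85.8, 90.3]

f_stat, p_value = stats.f_oneway(feed1, feed2, feed3, feed4)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")  # p is far below 0.001
```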

With a rejection of the null hypothesis leading us to conclude that not all the means are equal (i.e., at least the mean pig weight for one diet differs from the mean pig weights for the other diets), some follow-up questions are:

  • "Which diet type results in different average pig weights?", and
  • "Is there one particular diet type that produces the largest/smallest mean weight?"

To answer these questions we analyze the multiple comparison output (the grouping information) and the interval graph.

Factor N Mean StDev 95% CI
Feed 1 5 60.68 3.03 (58.07, 63.29)
Feed 2 5 69.24 2.96 (66.63, 71.85)
Feed 3 5 100.340 2.164 (97.729, 102.951)
Feed 4 5 86.38 2.78 (83.77, 88.99)

Pooled StDev = 2.75386

Factor N Mean Grouping
Feed 3 5 100.340 A      
Feed 4 5 86.38   B    
Feed 2 5 69.24     C  
Feed 1 5 60.68       D

Each of these factor levels is associated with a grouping letter. If any factor levels share a letter, then the multiple comparison method did not find a significant difference between their mean responses. For any factor levels that do not share a letter, a significant mean difference was identified. From the lettering we see that each diet type has a different letter, i.e., no two groups share a letter. Therefore, we can conclude that all four diets resulted in statistically significantly different mean pig weights. Furthermore, with the means also listed from highest to lowest, we can say that Feed 3 resulted in the highest mean weight, followed by Feed 4, then Feed 2, then Feed 1. This grouping result is supported by the graph of the intervals.

[Confidence interval plot of the comparisons between the feeds. None of the intervals covers zero, indicating the corresponding means are significantly different.]

In analyzing the intervals, we reflect back on our lesson in comparing two means: if an interval contained zero, we could not conclude a difference between the two means; if the interval did not contain zero, then a difference between the two means was supported. With four factor levels, there are six possible pairwise comparisons. (Remember the combinations formula for counting the number of possible pairs? In this case \({4\choose 2} = 6\).) In inspecting each of these six intervals, we find that all six do NOT include zero. Therefore, there is a statistical difference between all four group means; the four types of diet resulted in significantly different mean pig weights.
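The grouping table and the interval plot can likewise be reproduced outside Minitab. A sketch with scipy's tukey_hsd (scipy 1.8 or later), where all six intervals should exclude zero:

```python
# Tukey pairwise comparisons for the pig-weight data.
from scipy import stats

feed1 = [60.8, 57.1, 65.0, 58.7, 61.8]
feed2 = [68.3, 67.7, 74.0, 66.3, 69.9]
feed3 = [102.6, 102.2, 100.5, 97.5, 98.9]
feed4 = [87.9, 84.7, 83.2, 85.8, 90.3]

result = stats.tukey_hsd(feed1, feed2, feed3, feed4)
print(result)  # all six adjusted p-values are tiny

ci = result.confidence_interval(confidence_level=0.95)
print(ci.low)   # for each pair, the interval (low, high) excludes zero,
print(ci.high)  # matching the grouping table: every feed differs.
```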

Comparing More Than Two Means: One-Way ANOVA

Copyright © 2009–2024 by Stan Brown, BrownMath.com

When you have several means to compare, it’s not valid just to compare all possible pairs with t tests. Instead, you follow a two-stage process:

  • Are all the means equal? A computation called ANOVA (analysis of variance) answers this question.
  • If ANOVA shows that the means aren’t all equal, then which means are unequal, and by how much? There are many ways to answer this question (and they give different answers), but we’ll use a process called Tukey’s HSD (Honestly Significant Difference).

Terminology


The factor that varies between samples is called the factor . (Every once in a while things are easy.) The r different values or levels of the factor are called the treatments . Here the factor is the choice of fat and the treatments are the four fats, so r  = 4.

The computations to test the means for equality are called a 1-way ANOVA or 1-factor ANOVA .

Example 1: Fat for Frying Donuts

g Fat Absorbed in Batch          x̅     s
Fat 1   64  72  68  77  56  95   72   13.34
Fat 2   78  91  97  82  85  77   85    7.77
Fat 3   75  93  78  71  63  76   76    9.88
Fat 4   55  66  49  64  70  68   62    8.22

source: [full citation in “References”, below] pp 217–218

Hoping to produce a donut that could be marketed to health-conscious consumers, a company tried four different fats to see which one was least absorbed by the donuts during the deep frying process. Each fat was used for six batches of two dozen donuts each, and the table shows the grams of fat absorbed by each batch of donuts.

It looks like donuts absorb the most of Fat 2 and the least of Fat 4, with intermediate amounts of Fat 1 and Fat 3. But there’s a lot of overlap, too: for instance, even though the mean for Fat 2 is much higher than for Fat 1, one sample of Fat 1, 95 g, is higher than five of the six samples of Fat 2.

Nevertheless, the sample means do look different. But what about the population means? In other words, would the four fats be absorbed in different amounts if you made a whole lot of batches of donuts — do statistics justify choosing one fat over another? This is the basic question of a hypothesis test or significance test: is the difference great enough that you can rule out chance?

If Fats 2 and 4 were the only ones you had data for, you’d do a good old 2-sample t test. So why can’t you do that anyway? Because that would greatly increase your chances of a Type I error. The reasons are given in the Appendix.

By the way, though usually you are interested in the differences between population means with various treatments, you can also estimate the individual means. If you’re interested, see Estimating Individual Treatment Means in the Appendix.

Step 1: ANOVA Test for Equality of All Means

The ANOVA procedure tests these hypotheses:

H 0 : μ 1 = μ 2 = … = μ r , all the means are the same

H 1 : two or more means are different from the others

Let’s test these hypotheses at the α = 0.05 significance level.

You might wonder why you do analysis of variance to test means , but this actually makes sense. The question, remember, is whether the observed difference in means is too large to be the result of random selection. How do you decide whether the difference is too large? You look at the absolute difference of means between treatments (samples), but you also consider the variability within each treatment. Intuitively, if the difference between treatments is a lot bigger than the difference within treatments, you conclude that it’s not due to random chance and there is a real effect.

And this is just how ANOVA works: comparing the variation between groups to the variation within groups. Hence, analysis of variance .

Requirements for ANOVA

  • You need r simple random samples for the r treatments, and they need to be independent samples. The sample sizes need not be the same, though it’s best if they’re not very different.
  • The underlying populations should be normally distributed . However, the ANOVA test is robust and moderate departures from normality aren’t a problem, especially if sample sizes are large and equal or nearly equal ( Kuzma & Bohnenblust 2005 [full citation at https://BrownMath.com/swt/sources.htm#so_Kuzma2005] page 180).

Miller 1986 [full citation in “References”, below] (pages 90–91) is more cautious. When sample sizes are equal but standard deviations are not, the actual p-value will be slightly larger than what you find in the tables. But when sample sizes are unequal, and the smaller samples have the larger standard deviations, the actual p-value “can increase dramatically above” what the tables say, even “without too much disparity” in the standard deviations. “Falsely reporting significant results when the small samples have the larger variances is a serious worry. The lesson to be learned is to balance the experiment [equal sample sizes] if at all possible. ”

A 1-way ANOVA tests whether the means of all groups are equal for different levels of one factor, using some fairly lengthy calculations. You could do all the computations by hand as shown in the Appendix, but no one ever does. Here are some alternatives:

  • Excel’s Anova: Single Factor command is in the Tools  » Data Analysis menu in Excel 2003 and below, or the Data  » Analysis  » Data Analysis menu in Excel 2007. If you don’t see it there, follow instructions in Excel help to load the Analysis Toolpak.
  • On a TI-83 or TI-84, enter each sample in a statistics list, then press [ STAT ] [ ◄ ] [ ▲ ] to select ANOVA , and enter the list names separated by commas.
  • There are even Web-based ANOVA calculators, such as Lowry 2001b [full citation in “References”, below] .
  • There are many software packages for mathematics and statistics that include ANOVA calculations. One of them, R , is highly regarded and is open source.

When you use a calculator or computer program to do ANOVA, you get an ANOVA table that looks something like this:

Source                         SS      df  MS     F     p
Between groups (or “Factor”)   1636.5   3  545.5  5.41  0.0069
Within groups (or “Error”)     2018.0  20  100.9
Total                          3654.5  23

Note that the mean square between treatments, 545.5, is much larger than the mean square within treatments, 100.9. That ratio, between-groups mean square over within-groups mean square, is called an F statistic (F = MS B /MS W = 5.41 in this example). It tells you how much more variability there is between treatment groups than within treatment groups. The larger that ratio, the more confident you feel in rejecting the null hypothesis , which was that all means are equal and there is no treatment effect.

But what you care about is the p-value of 0.0069, obtained from the F distribution. The p-value has the usual interpretation: the probability of the between-treatments MS being ≥5.41 times the within-treatments MS, if the null hypothesis is true, is p = 0.0069.

The p-value is below your significance level of 0.05: it would be quite unlikely to have MS B /MS W this large if there were no real difference among the means. Therefore you reject H 0 and accept H 1 , concluding that the mean absorption of all the fats is not the same .

An interesting extra parameter can be derived from the ANOVA table; see η²: Strength of Association in the Appendix below.

Now that you know that it does make a difference which fat is used, you naturally want to know which fats are significantly different. This is post-hoc analysis . There are several different post-hoc analyses, and no one is superior on all points, but the most common choice is the Tukey HSD.

Step 2: Tukey HSD for Post-Hoc Analysis

If your ANOVA test shows that the means aren’t all equal, your next step is to determine which means are different, to your level of significance. You can’t just perform a series of t tests , because that would greatly increase your likelihood of a Type I error. So what do you do?

John Tukey gave one answer to this question, the HSD (Honestly Significant Difference) test. You compute something analogous to a t score for each pair of means, but you don’t compare it to the Student’s t distribution. Instead, you use a new distribution called the studentized range or q distribution .

Caution: Perform post-hoc analysis only if the ANOVA test shows a p-value less than your α. If p>α, you don’t know whether the means are all the same or not, and you can’t go fishing for unequal means.

You generally want to know not just which means differ, but by how much they differ (the effect size ). The easiest thing is to compute the confidence interval first, and then interpret it for a significant difference in means (or no significant difference). You’ve already seen this relationship between a test of significance at the α level and a 1−α confidence interval:

  • If the endpoints of the CI have the same sign (both positive or both negative), then 0 is not in the interval and you can conclude that the means are different.
  • If the endpoints of the CI have opposite signs, then 0 is in the interval and you can’t determine whether the means are equal or different .

You compute that confidence interval similarly to the confidence interval for the difference of two means, but using the q distribution, which avoids the problem of inflating α:

(x̅ i − x̅ j ) ± q(α, r, df W ) · √( (MS W /2)·(1/n i + 1/n j ) )

where x̅ i and x̅ j are the two sample means, n i and n j are the two sample sizes, MS W is the within-groups mean square from the ANOVA table , and q is the critical value of the studentized range for α, the number of treatments or samples r , and the within-groups degrees of freedom df W . The square-root term is called the standardized error (as opposed to standard error).

Using the studentized range, developed by Tukey, overcomes the problem of inflated significance level that I talked about earlier. If sample sizes are equal, the risk of a Type I error is exactly α, and if sample sizes are unequal it’s less than α: the procedure is conservative . In terms of confidence intervals, if the sample sizes are equal then the confidence level is the stated 1−α, but if the sample sizes are unequal then the actual confidence level is greater than 1−α ( NIST 2012 [full citation in “References”, below] section 7.4.7.1).

Usually the comparisons are presented in a table, like this one for the example with frying donuts :

                x̅ i − x̅ j   Critical q      Standardized   95% Conf Interval   Signif
                             q(α, r, df W )  error          for μ i − μ j       at 0.05?
Fat 1 − Fat 2   −13          3.9597          4.1008         (−29.2, 3.2)        no
Fat 1 − Fat 3    −4          3.9597          4.1008         (−20.2, 12.2)       no
Fat 1 − Fat 4    10          3.9597          4.1008         (−6.2, 26.2)        no
Fat 2 − Fat 3     9          3.9597          4.1008         (−7.2, 25.2)        no
Fat 2 − Fat 4    23          3.9597          4.1008         (6.8, 39.2)         YES
Fat 3 − Fat 4    14          3.9597          4.1008         (−2.2, 30.2)        no

How do you read the table, and how was it constructed? Look first at the rows. Each row compares one pair of treatments.

If you have r treatments, there will be r ( r −1)/2 pairs of means. The “/2” part comes because there’s no need to compare Fat 1 to Fat 2 and then Fat 2 to Fat 1. If Fat 1 is absorbed less than Fat 2, then Fat 2 is absorbed more than Fat 1 and by the same amount.

Now look at the columns. I’ll work through all the columns of the first row with you, and you can interpret the others in the same way.

  • The row heading tells you which treatments are being compared in this row , and the direction of comparison.
  • The next column gives the point estimate of difference , which is nothing more than the difference of the two sample means. The sample means of Fat 1 and Fat 2 were 72 and 85, so the difference is −13: the sample average of Fat 1 was 13 g less fat absorbed than the sample average of Fat 2.

For this experiment, we had four treatments and df W from the ANOVA table was 20, so we need q(0.05, 4, 20). Your textbook may have a table of critical values for the studentized range, or you can look up q in an online table such as the one at the end of Abdi and Williams 2010 [full citation in “References”, below] , or find it with an online calculator like Lowry 2001a [full citation in “References”, below] . (Most textbooks don’t have a table of q, and the TI calculators can’t compute it.)

Different sources give slightly different critical values of q, I suspect because q is extremely difficult to compute. One value I found was q(0.05,4,20) = 3.9597.

In an experiment with unequal sample sizes, the standardized error would vary for comparing different pairs of treatments. But in this experiment, every treatment has six data points, and so the standardized error is the same for every pair of means:

√( (MS W /2)·(1/6+1/6) ) = √( (100.9/2)·(2/6) ) = 4.1008

The confidence interval for the difference between Fat 1 and Fat 2 goes from a negative to a positive, so it does include zero. That means the two fats might have the same or different absorption, so you can’t say whether there’s a difference.

Caution : It’s generally best not to say that there is no significant difference. Even though that’s literally true, it’s easily misinterpreted to mean that the absorption of the two fats is the same, and you don’t know that. It might be, and it might not be. Stick to neutral language .

On the other hand, when the endpoints of the confidence interval are both positive or both negative, then 0 is not in the interval and we reject the null hypothesis of equality. In this table, only Fats 2 and 4 have a significant difference.

Interpretation : Fats 2 and 4 are not equally absorbed in frying donuts, and we’re 95% confident that a batch of 24 donuts absorbs 6.8 g to 30.2 g more of Fat 2 than Fat 4.
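If you want to verify these intervals yourself, the critical q is available in scipy as the studentized_range distribution. Here is a sketch for the Fat 2 − Fat 4 comparison, with MS W, df W, and the sample means taken from the tables above:

```python
# Recompute the Tukey interval for Fat 2 - Fat 4 from first principles.
from math import sqrt
from scipy.stats import studentized_range

r, df_w = 4, 20              # number of treatments, within-groups df
ms_w, n = 100.9, 6           # within-groups mean square, per-group sample size

q = studentized_range.ppf(0.95, r, df_w)   # critical q(0.05, 4, 20), ~3.96
se = sqrt((ms_w / 2) * (1 / n + 1 / n))    # standardized error, ~4.10

diff = 85 - 62                              # sample means of Fat 2 and Fat 4
low, high = diff - q * se, diff + q * se
print(f"CI: ({low:.1f}, {high:.1f})")       # about (6.8, 39.2): excludes zero
```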

It’s possible to make more complicated comparisons. For instance, with a control group and two treatments you might compare the mean of the control group to the average of the means of the two treatments. Any kind of linear comparison can be done using a procedure developed by Henry Scheffé. A good brief explanation of Scheffé’s method is at NIST 2012 [full citation in “References”, below] section 7.4.7.2.

Tukey’s method is best when you are simultaneously comparing all pairs of means. If you have pre-selected a subset of means to compare, the Bonferroni method ( NIST 2012 [full citation in “References”, below] section 7.4.7.3) may be better.

Example 2: Stock Market

5-year Rates of Return

     Financial  Energy  Utilities
     10.76      12.72   11.88
     15.05      13.91    5.86
     17.01       6.43   13.46
      5.07      11.19    9.90
     19.50      18.79    3.95
      8.16      20.73    3.44
     10.38       9.60    7.11
      6.75      17.40   15.70
x̅:  11.585     13.846    8.913
s:    5.124      4.867    4.530

source: morningstar.com via [full citation at https://BrownMath.com/swt/sources.htm#so_Sullivan2011] page C–30 (on CD)

A stock analyst randomly selected eight stocks in each of three industries and compiled the five-year rate of return for each stock. The analyst would like to know whether any of the industries have a different rate of return from the others, at the 0.05 significance level.

Solution : The hypotheses are

H 0 : μ F = μ E = μ U , all three industries have the same average rate of return

H 1 : the industries don’t all have the same average rate of return

You can use a normal probability plot to assess normality for each sample; see MATH200A Program part 4 . The standard deviations of the three samples are fairly close together, so the requirements are met.

Here is the ANOVA table:

Source                         SS        df  MS       F     p
Between groups (or “Factor”)    97.5931   2  48.7965  2.08  0.1502
Within groups (or “Error”)     493.2577  21  23.4885
Total                          590.8508  23

The F statistic is only 2.08, so the variation between groups is only about double the variation within groups. The high p-value makes you fail to reject H 0 and you cannot reach a conclusion about differences between average rates of returns for the three industries.

Since you failed to reject H 0 in the initial ANOVA test, you can’t do any sort of post-hoc analysis and look for differences between any particular pairs of means. (Well, you can , but you know in advance that all of the intervals will include zero, meaning that you don’t know whether any particular sector has a different return from any other sector or not.)

Example 3: CRT Lifetimes

        Lifetime, hr               x̅    s
Type A  407  411  409              409  2.0
Type B  404  406  408  405  402    405  2.2
Type C  410  408  406  408         408  1.6

source: [full citation in “References”, below], pp 378–379

A company makes three types of high-performance CRTs. A random sample finds lifetimes shown in the table at right. At the 0.05 level, is there a difference in the average lifetimes of the three types?

Solution : Your hypotheses are

H 0 : μ A = μ B = μ C , the three types have equal mean lifetime

H 1 : the three types don’t all have the same mean lifetime

Excel or the TI-83/84 gives you this ANOVA table:

Source                         SS  df  MS  F     p
Between groups (or “Factor”)   36   2  18  4.50  0.0442
Within groups (or “Error”)     36   9   4
Total                          72  11

p<α, so you reject H 0 and accept H 1 , concluding that the three types don’t all have the same mean lifetime.

Since you were able to reject the null hypothesis, you can proceed with post-hoc analysis to determine which means are different and the size of the difference. Here is the table:

                  x̅ i − x̅ j   Critical q      Standardized   95% Conf Interval   Signif
                               q(α, r, df W )  error          for μ i − μ j       at 0.05?
Type A − Type B    4           3.9508          1.0328         (−0.1, 8.1)         no
Type A − Type C    1           3.9508          1.0801         (−3.3, 5.3)         no
Type B − Type C   −3           3.9508          0.9487         (−6.7, 0.7)         no

This result might surprise you: although the three means aren’t all equal, you can’t say that any two of the means are unequal. But when you look more closely at the numbers, this doesn’t seem quite so unreasonable.

First, look at the p-value in the ANOVA table: 0.0442 is below 0.05, yes, but it’s not very far below. There’s almost a 4½% chance that we’re committing a Type I error in rejecting H 0 . Next, look at the confidence interval μ A −μ B . While the interval does include 0, it’s extremely lopsided and almost doesn’t include 0.

Though we’re used to thinking of significance as “either it is or it isn’t”, there are cases where the decision is a close one, and this is one of those cases. And the confidence intervals are computed by a different method than the significance test, using a different distribution. Here again, the decision is a close one. So what we have is two close decisions, based on different computations, one falling slightly on one side of the line and the other falling slightly on the other side of the line. It’s a good reminder that in statistics we’re dealing with probabilities, not certainties.

Appendix (The Hard Stuff)

The following sections are for students who want to know more than just the bare bones of how to do a 1-way ANOVA test.

Why Not Just Pick Two Means and Do a t Test?

Remember that you have to set up hypotheses before you know the data. Before you’ve actually fried the donuts, you have no reason to expect any particular outcome. Specifically, until you have the data you have no reason to think Fats 2 and 4 are any more different than Fats 1 and 4, or any other pair.

Why can’t you collect the data and then select your hypotheses? Because that can put significance on a chance event. For example, a golfer hits a ball and it lands on a particular tuft of grass. The probability of landing on that particular tuft is extremely small, so there’s something different about that particular tuft, right? Obviously not! It’s a logical fallacy to decide what to test after you already have the data.

So if you want to use 2-sample t tests to find differences among four fats, you would have to test every pair of fats: 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4. That’s six hypotheses in all.

Well, why not do a 0.05 significance test on each pair of means? Remember what a 0.05 significance level means: you’re willing to accept a 5% chance of a Type I error, rejecting H 0 when it’s actually true. But if you test six 0.05 hypotheses on the same set of data, you’re much more likely to commit a Type I error. How much more likely? Well, for each hypothesis there’s a 95% chance of escaping a Type I error, but the probability of escaping a Type I error six times in a row is 0.95⁶ = 0.7351. 1−0.7351 = 0.2649, so if you test all six pairs at the 0.05 level, you have more than a one-in-four chance of getting a false positive , finding a difference between two fats when there’s actually no difference.

Prob. of Type I Error

groups  pairs  α = 0.05  α = 0.01
3        3     0.1426    0.0297
4        6     0.2649    0.0585
5       10     0.4013    0.0956
6       15     0.5367    0.1399

In general, if you have r treatments, there are r(r−1)/2 pairs of means to compare. If you test each pair at significance level α, the overall probability of a Type I error is 1 − (1−α)^(r(r−1)/2). The table above shows the effective α for various numbers of treatments when the nominal α is 0.05 or 0.01. You can see that testing multiple hypotheses increases your α dramatically. Even with just three treatments, the effective α is almost three times the nominal α. This is clearly unacceptable.
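The table's entries come straight from that formula; a few lines of Python reproduce them:

```python
# Effective probability of at least one Type I error across all pairwise tests.
for alpha in (0.05, 0.01):
    for groups in (3, 4, 5, 6):
        pairs = groups * (groups - 1) // 2
        effective = 1 - (1 - alpha) ** pairs
        print(f"alpha = {alpha}: {groups} groups, {pairs} pairs -> {effective:.4f}")
```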

Why not just lower your alpha? Because as you lower your α you increase your β, the chance of a Type II error. β represents the probability of a false negative , failing to find a difference in fats when there actually is a difference. This, too, is unacceptable.

So you have to find a way to test all the pairs of means at the same time, in one test. The solution is an extension of the t test to multiple samples, and it’s called ANOVA. ( If you have only two treatments, ANOVA computes the same p-value as a two-sample t test, but at the cost of extra effort.)

How ANOVA Works

How does the ANOVA procedure compute a p-value? This section shows you the formulas and carries through the computations for the example with fat for frying donuts .

Remember, long ago in a galaxy called Descriptive Statistics, how the variance was defined: find the mean, then for each data point take the square of its difference from the mean. Add up all those squares, and you have SS( x ), the sum of squared deviations in x . The variance was SS( x ) divided by the degrees of freedom n −1, so it was a kind of average or mean squared deviation. You probably learned the shortcut computational formulas:

SS( x ) = ∑ x ² − (∑ x )²/ n or SS( x ) = ∑ x ² − n x̅ ²

s ² = MS( x ) = SS( x )/ df where df  =  n −1

In 1-way ANOVA, we extend those concepts a bit. First you partition SS( x ) into between-treatments and within-treatments parts, SS B and SS W . Then you compute the mean square deviations:

  • MS B is called the between-treatments mean square, between-groups variance, or factor MS . It measures the variability associated with the different treatment levels or different values of the factor.
  • MS W is called the within-treatments mean square, within-group variance, pooled variance, or error MS . It measures the variability that is not associated with the different treatments.

Finally you divide the two to obtain your test statistic, F = MS B /MS W , and you look up the p-value in a table of the F distribution.

(The F distribution is named after “the celebrated R.A. Fisher” ( Kuzma & Bohnenblust 2005 [full citation at https://BrownMath.com/swt/sources.htm#so_Kuzma2005] , 176). You may have already seen the F distribution in computing a different ratio of variances, as part of testing the variances of two populations for equality.)

There are several ways to compute the variability, but they all come up with the same answers and this method in Spiegel and Stephens 1999 [full citation in “References”, below] pages 367–368 is as easy as any:

Between groups (or “Factor”):  SS B = ∑ n j x̅ j ² − N x̅ ²,  df B = r − 1,  MS B = SS B / df B ,  F = MS B / MS W

Within groups (or “Error”)*:  SS W = SS tot − SS B ,  df W = N − r,  MS W = SS W / df W

Total*:  SS tot = ∑ x ² − N x̅ ²,  df tot = N − 1

* Or, if you know the standard deviations of the samples,
SS W = ∑ ( n j − 1) s j ²  and  SS tot = SS B + SS W

  • r is the number of treatments.
  • n j , x̅ j , s j for each treatment are the sample size, sample mean, and sample standard deviation.
  • N = ∑ n j is the total number of data points, and x̅ = ∑ n j x̅ j / N is the grand mean.

You begin with the treatment means x̅ j ={72, 85, 76, 62} and the overall mean x̅ =73.75, then compute

SS B = (6×72²+6×85²+6×76²+6×62²) − 24×73.75² = 1636.5

MS B = 1636.5 / 3 = 545.5

The next step depends on whether you know the standard deviations s j of the samples. If you don’t, then you jump to the third row of the table to compute the overall sum of squares:

∑ x ² = 64² + 72² + 68² + … + 70² + 68² = 134192

SS tot = ∑ x ² − N x̅ ² = 134192 − 24×73.75² = 3654.5

Then you find SS W by subtracting the “between” sum of squares SS B from the overall sum of squares SS tot :

SS W = SS tot −SS B = 3654.5−1636.5 = 2018.0

MS W = 2018.0 / 20 = 100.9

Now you’re almost there. You want to know whether the variability between treatments, MS B , is greater than the variability within treatments, MS W . If it’s enough greater, then you conclude that there is a real difference between at least some of the treatment means and therefore that the factor has a real effect. To determine this, divide

F = MS B /MS W  = 5.41

This is the F statistic. The F distribution is a one-tailed distribution that depends on both degrees of freedom, df B and df W .

At long last, you look up F=5.41 with 3 and 20 degrees of freedom, and you find a p-value of 0.0069. The interpretation is the usual one: there’s only a 0.0069 chance of getting an F statistic greater than 5.41 (or higher variability between treatments relative to the variability within treatments) if there is actually no difference between treatments. Since the p-value is less than α, you conclude that there is a difference.
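The same hand computation transcribes almost line for line into Python (data from the donut table; scipy is used only for the final p-value lookup):

```python
# Hand computation of the donut ANOVA, following the layout above.
from scipy.stats import f as f_dist

fats = [
    [64, 72, 68, 77, 56, 95],  # Fat 1
    [78, 91, 97, 82, 85, 77],  # Fat 2
    [75, 93, 78, 71, 63, 76],  # Fat 3
    [55, 66, 49, 64, 70, 68],  # Fat 4
]
r = len(fats)                               # 4 treatments
N = sum(len(g) for g in fats)               # 24 data points
grand_mean = sum(sum(g) for g in fats) / N  # 73.75

ss_b = sum(len(g) * (sum(g) / len(g)) ** 2 for g in fats) - N * grand_mean**2
ss_tot = sum(x * x for g in fats for x in g) - N * grand_mean**2
ss_w = ss_tot - ss_b                        # 3654.5 - 1636.5 = 2018.0

ms_b = ss_b / (r - 1)                       # 545.5
ms_w = ss_w / (N - r)                       # 100.9
f_stat = ms_b / ms_w                        # about 5.41

p_value = f_dist.sf(f_stat, r - 1, N - r)   # upper-tail area, about 0.0069
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```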

Estimating Individual Treatment Means

Usually you’re interested in the contrast between two treatments, but you can also estimate the population mean for an individual treatment. You do use a t interval, as you would when you have only one sample, but the standard error and degrees of freedom are different ( NIST 2012 [full citation in “References”, below] section 7.4.3.6).

To compute a confidence interval on an individual mean for the j th treatment, use

standard error = √( MS W / n j )

Therefore the margin of error, which is the half-width of the confidence interval, is

E = t(α/2, df W ) · √( MS W / n j )

Example : Refer back to the fats for frying donuts . Estimate the population mean for Fat 2 with 95% confidence. In other words, if you fried a great many batches of donuts in Fat 2, how much fat per batch would be absorbed, on average?

Solution : First, marshal your data:

sample mean for Fat 2: x̅ 2 = 85

sample size: n 2 = 6

degrees of freedom: df W = 20 (from the ANOVA table )

MS W = 100.9 (also from the table)

TI-83 or TI-84 users , please see an easy procedure below .

Computation by Hand

Begin by finding the critical t. Since 1−α = 0.95, α/2 = 0.025. You therefore need t(0.025,20). You can find this from a table:

t(0.025,20) = 2.0860

Next, find the standard error. This is

standard error = √( MS W / n j ) = √(100.9/6) = 4.1008

Now you’re ready to finish the confidence interval. The margin of error is

E = t(α/2, df W ) · √( MS W / n j ) = 2.0860 × 4.1008 = 8.5541

Therefore the confidence interval is

μ 2 = 85 ± 8.6 g (95% confidence)

76.4 g ≤ μ 2 ≤ 93.6 g (95% confidence)

Conclusion : You’re 95% confident that the true mean amount of fat absorbed by a batch of donuts fried in Fat 2 is between 76.4 g and 93.6 g.
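The hand computation above is easy to reproduce in Python (scipy supplies the critical t; all the other values come from the ANOVA table):

```python
# 95% confidence interval for the Fat 2 population mean.
from math import sqrt
from scipy.stats import t

xbar, n_j = 85, 6         # Fat 2 sample mean and sample size
ms_w, df_w = 100.9, 20    # within-groups MS and df from the ANOVA table

t_crit = t.ppf(1 - 0.05 / 2, df_w)   # t(0.025, 20) = 2.0860
se = sqrt(ms_w / n_j)                # standard error, about 4.1008
margin = t_crit * se                 # about 8.55

print(f"{xbar - margin:.1f} g to {xbar + margin:.1f} g")  # 76.4 g to 93.6 g
```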

TI-83/84 Procedure

Your TI calculator is set up to do the necessary calculations, but there’s one glitch because the degrees of freedom is not based on the size of the individual sample, as it is in a regular t interval. So you have to “spoof” the calculator as follows.

Press [ STAT ] [ ◄ ] [ 8 ] to bring up the TInterval screen. First I’ll tell you what to enter; then I’ll explain why.

  • x̅ : mean of the treatment sample, here 85
  • Sx: √( MS W ·( df W +1)/ n j ), here √(100.9×21/6)
  • n : df W +1, here 21
  • C-Level: as specified in the problem, here .95

Now, what’s up with n and Sx? Well, the calculator uses n to compute degrees of freedom for critical t as n −1. You want degrees of freedom to be df W , so you lie to the calculator and enter the value of n as df W +1 (20+1 = 21).

But that creates a new problem. The calculator also divides s by √ n to come up with the standard error. But you want it to use n j (6) and not your fake n (21). So you have to multiply MS W by df W +1 and divide by n j to trick the calculator into using the value you actually want.

By the way, why is MS W inside the square root sign? Because the calculator wants a standard deviation, but MS W is a variance. As you know, standard deviation is the square root of variance.

η²: Strength of Association

Lowry 1988 [full citation in “References”, below] chapter 14 part 2 mentions a measure that is usually neglected in ANOVA: η². (η is the Greek letter eta, which rhymes with beta.)

η² = SS B /SS tot , the ratio of sum of squares between groups to total sum of squares. For the donut-frying example ,

η² = SS B /SS tot = 1636.5 / 3654.5 = 0.45

What does this tell you? η² measures how much of the total variability in the dependent variable is associated with the variation in treatments. For the donut example, η² = 0.45 tells you that 45% of the variability in fat absorption among the batches is associated with the choice of fat.

What’s New

  • Updated links to references here and here .
  • 20 Oct 2020 : Improved rendering of square roots of formulas. Italicized variable names. Converted page from HTML 4.01 to HTML5.
  • (intervening changes suppressed)
  • 31 Jan 2009 : First publication.

Updates and new info: https://BrownMath.com/stat/


Module 10: Inference for Means

Hypothesis Test for a Difference in Two Population Means (1 of 2)

Learning Outcomes

  • Under appropriate conditions, conduct a hypothesis test about a difference between two population means. State a conclusion in context.

Using the Hypothesis Test for a Difference in Two Population Means

The general steps of this hypothesis test are the same as always. As expected, the details of the conditions for use of the test and the test statistic are unique to this test (but similar in many ways to what we have seen before.)

Step 1: Determine the hypotheses.

The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H 0 , is again a statement of “no effect” or “no difference.”

  • H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2

The alternative hypothesis, H a , can be any one of the following.

  • H a : μ 1 – μ 2 < 0, which is the same as H a : μ 1 < μ 2
  • H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2
  • H a : μ 1 – μ 2 ≠ 0, which is the same as H a : μ 1 ≠ μ 2

Step 2: Collect the data.

As usual, how we collect the data determines whether we can use it in the inference procedure. We have our usual two requirements for data collection.

  • Samples must be random to remove or minimize bias.
  • Samples must be representative of the populations in question.

We use this hypothesis test when the data meets the following conditions.

  • The two random samples are independent .
  • The variable is normally distributed in both populations . If this is not known, samples of more than 30 will have a difference in sample means that can be modeled adequately by the t-distribution. As we discussed in “Hypothesis Test for a Population Mean,” t-procedures are robust even when the variable is not normally distributed in the population. If checking normality in the populations is impossible, then we look at the distribution in the samples. If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure. (Note: This is the same condition we used for the one-sample t-test in “Hypothesis Test for a Population Mean.”)

Step 3: Assess the evidence.

If the conditions are met, then we calculate the t-test statistic. The t-test statistic has a familiar form.

[latex]T = \frac{(\text{Observed difference in sample means}) - (\text{Hypothesized difference in population means})}{\text{Standard error}}[/latex]

[latex]T = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}[/latex]

Since the null hypothesis assumes there is no difference in the population means, the expression (μ 1 – μ 2 ) is always zero.

As we learned in “Estimating a Population Mean,” the t-distribution depends on the degrees of freedom (df) . In the one-sample and matched-pair cases df = n – 1. For the two-sample t-test, determining the correct df is based on a complicated formula that we do not cover in this course. We will either give the df or use technology to find the df . With the t-test statistic and the degrees of freedom, we can use the appropriate t-model to find the P-value, just as we did in “Hypothesis Test for a Population Mean.” We can even use the same simulation.

Step 4: State a conclusion.

To state a conclusion, we follow what we have done with other hypothesis tests. We compare our P-value to a stated level of significance.

  • If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
  • If the P-value > α, we fail to reject the null hypothesis. We do not have enough evidence to support the alternative hypothesis.

As always, we state our conclusion in context, usually by referring to the alternative hypothesis.

“Context and Calories”

Does the company you keep impact what you eat? This example comes from an article titled “Impact of Group Settings and Gender on Meals Purchased by College Students” (Allen-O’Donnell, M., T. C. Nowak, K. A. Snyder, and M. D. Cottingham, Journal of Applied Social Psychology 49(9), 2011, onlinelibrary.wiley.com/doi/10.1111/j.1559-1816.2011.00804.x/full) . In this study, researchers examined this issue in the context of gender-related theories in their field. For our purposes, we look at this research more narrowly.

Step 1: Stating the hypotheses.

In the article, the authors make the following hypothesis. “The attempt to appear feminine will be empirically demonstrated by the purchase of fewer calories by women in mixed-gender groups than by women in same-gender groups.” We translate this into a simpler and narrower research question: Do women purchase fewer calories when they eat with men compared to when they eat with women?

Here the two populations are “women eating with women” (population 1) and “women eating with men” (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance.

The null hypothesis is always H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2 .

The alternative hypothesis H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2 .

Here μ 1 represents the mean number of calories ordered by women when they were eating with other women, and μ 2 represents the mean number of calories ordered by women when they were eating with men.

Note: It does not matter which population we label as 1 or 2, but once we decide, we have to stay consistent throughout the hypothesis test. Since we expect the number of calories to be greater for the women eating with other women, the difference is positive if “women eating with women” is population 1. If you prefer to work with positive numbers, choose the group with the larger expected mean as population 1. This is a good general tip.

Step 2: Collect Data.

As usual, there are two major things to keep in mind when considering the collection of data.

  • Samples need to be representative of the population in question.
  • Samples need to be random in order to remove or minimize bias.

Representative Samples?

The researchers state their hypothesis in terms of “women.” We did the same. But the researchers gathered data by watching people eat at the HUB Rock Café II on the campus of Indiana University of Pennsylvania during the Spring semester of 2006. Almost all of the women in the data set were white undergraduates between the ages of 18 and 24, so there are some definite limitations on the scope of this study. These limitations will affect our conclusion (and the specific definition of the population means in our hypotheses.)

Random Samples?

The observations were collected on February 13, 2006, through February 22, 2006, between 11 a.m. and 7 p.m. We can see that the researchers included both lunch and dinner. They also made observations on all days of the week to ensure that weekly customer patterns did not confound their findings. The authors state that “since the time period for observations and the place where [they] observed students were limited, the sample was a convenience sample.” Despite these limitations, the researchers conducted inference procedures with the data, and the results were published in a reputable journal. We will also conduct inference with this data, but we also include a discussion of the limitations of the study with our conclusion. The authors did this, also.

Do the data meet the conditions for use of a t-test?

The researchers reported the following sample statistics.

  • In a sample of 45 women dining with other women, the average number of calories ordered was 850, and the standard deviation was 252.
  • In a sample of 27 women dining with men, the average number of calories ordered was 719, and the standard deviation was 322.

One of the samples has fewer than 30 women. We need to make sure the distribution of calories in this sample is not heavily skewed and has no outliers, but we do not have access to a spreadsheet of the actual data. Since the researchers conducted a t-test with this data, we will assume that the conditions are met. This includes the assumption that the samples are independent.

To compute the t-test statistic, make sure sample 1 corresponds to population 1. Here our population 1 is “women eating with other women,” so x̅ 1 = 850, s 1 = 252, n 1 = 45, and so on.

[latex]T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{850 - 719}{\sqrt{\frac{252^2}{45} + \frac{322^2}{27}}} \approx \frac{131}{72.47} \approx 1.81[/latex]

Using technology, we determined that the degrees of freedom are about 45 for this data. To find the P-value, we use our familiar simulation of the t-distribution. Since the alternative hypothesis is a “greater than” statement, we look for the area to the right of T = 1.81. The P-value is 0.0385.
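As a check of this step, the statistic and P-value follow directly from the reported summary statistics. A sketch in Python, using df = 45 as given by technology:

```python
# T statistic and one-tailed P-value from the reported summary statistics.
from math import sqrt
from scipy.stats import t

x1, s1, n1 = 850, 252, 45   # women eating with other women
x2, s2, n2 = 719, 322, 27   # women eating with men

se = sqrt(s1**2 / n1 + s2**2 / n2)   # standard error, about 72.47
t_stat = (x1 - x2) / se              # about 1.81

p_value = t.sf(t_stat, 45)           # right-tail area with df = 45
print(f"T = {t_stat:.2f}, P = {p_value:.4f}")  # about 0.0385
```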

[Simulation plot: the area under the t-model to the left of T = 1.81 is 0.9615; the area to the right is 0.0385.]

Generic Conclusion

The hypotheses for this test are H 0 : μ 1 – μ 2 = 0 and H a : μ 1 – μ 2 > 0. Since the P-value is less than the significance level (0.0385 < 0.05), we reject H 0 and accept H a .

Conclusion in context

At Indiana University of Pennsylvania, the mean number of calories ordered by undergraduate women eating with other women is greater than the mean number of calories ordered by undergraduate women eating with men (P-value = 0.0385).

Comment about Conclusions

In the conclusion above, we did not generalize the findings to all women. Since the samples included only undergraduate women at one university, we included this information in our conclusion. But our conclusion is a cautious statement of the findings. The authors see the results more broadly in the context of theories in the field of social psychology. In the context of these theories, they write, “Our findings support the assertion that meal size is a tool for influencing the impressions of others. For traditional-age, predominantly White college women, diminished meal size appears to be an attempt to assert femininity in groups that include men.” This viewpoint is echoed in the following summary of the study for the general public on National Public Radio (npr.org).

  • Both men and women appear to choose larger portions when they eat with women, and both men and women choose smaller portions when they eat in the company of men, according to new research published in the Journal of Applied Social Psychology . The study, conducted among a sample of 127 college students, suggests that both men and women are influenced by unconscious scripts about how to behave in each other’s company. And these scripts change the way men and women eat when they eat together and when they eat apart.

Should we be concerned that the findings of this study are generalized in this way? Perhaps. But the authors of the article address this concern by including the following disclaimer with their findings: “While the results of our research are suggestive, they should be replicated with larger, representative samples. Studies should be done not only with primarily White, middle-class college students, but also with students who differ in terms of race/ethnicity, social class, age, sexual orientation, and so forth.” This is an example of good statistical practice. It is often very difficult to select truly random samples from the populations of interest. Researchers therefore discuss the limitations of their sampling design when they discuss their conclusions.

In the following activities, you will have the opportunity to practice parts of the hypothesis test for a difference in two population means. On the next page, the activities focus on the entire process and also incorporate technology.

Concepts in Statistics. Provided by: Open Learning Initiative. Located at: http://oli.cmu.edu. License: CC BY: Attribution.

Multiple Hypothesis Testing

Roger Higdon, Seattle Children’s Research Institute

Synonyms: Multiple comparisons; Multiple testing

The multiple hypothesis testing problem occurs when a number of individual hypothesis tests are considered simultaneously. In this case, the significance or the error rate of individual tests no longer represents the error rate of the combined set of tests. Multiple hypothesis testing methods correct error rates for this issue.

Characteristics

In conventional hypothesis testing, the level of significance or type I error rate (the probability of wrongly rejecting the null hypothesis) for a single test is less than the probability of making an error on at least one test in a multiple hypothesis testing situation. While this is typically not an issue when testing a small number of preplanned hypotheses, the likelihood of making false discoveries is greatly increased when there are large numbers of unplanned or exploratory tests conducted based on the significance level or type I error rate from a single test. Therefore, it is...

Higdon, R. (2013). Multiple Hypothesis Testing. In: Dubitzky, W., Wolkenhauer, O., Cho, K.-H., Yokota, H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9863-7_1211


Inference for Comparing 2 Population Means (HT for 2 Means, independent samples)

More of the good stuff! We will need to know how to label the null and alternative hypothesis, calculate the test statistic, and then reach our conclusion using the critical value method or the p-value method.

The Test Statistic for a Test of 2 Means from Independent Samples:

[latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}}[/latex]

What the different symbols mean:

[latex]n_1[/latex] is the sample size for the first group

[latex]n_2[/latex] is the sample size for the second group

[latex]df[/latex], the degrees of freedom, is the smaller of [latex]n_1 - 1[/latex] and [latex]n_2 - 1[/latex]

[latex]\mu_1[/latex] is the population mean from the first group

[latex]\mu_2[/latex] is the population mean from the second group

[latex]\bar{x_1}[/latex] is the sample mean for the first group

[latex]\bar{x_2}[/latex] is the sample mean for the second group

[latex]s_1[/latex] is the sample standard deviation for the first group

[latex]s_2[/latex] is the sample standard deviation for the second group

[latex]\alpha[/latex] is the significance level , usually given within the problem, or if not given, we assume it to be 5% or 0.05

Assumptions when conducting a Test for 2 Means from Independent Samples:

  • We do not know the population standard deviations, and we do not assume they are equal
  • The two samples or groups are independent
  • Both samples are simple random samples
  • Both populations are Normally distributed OR both samples are large ([latex]n_1 > 30[/latex] and [latex]n_2 > 30[/latex])

Steps to conduct the Test for 2 Means from Independent Samples:

  • Identify all the symbols listed above (all the stuff that will go into the formulas). This includes [latex]n_1[/latex] and [latex]n_2[/latex], [latex]df[/latex], [latex]\mu_1[/latex] and [latex]\mu_2[/latex], [latex]\bar{x_1}[/latex] and [latex]\bar{x_2}[/latex], [latex]s_1[/latex] and [latex]s_2[/latex], and [latex]\alpha[/latex]
  • Identify the null and alternative hypotheses
  • Calculate the test statistic, [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}}[/latex]
  • Find the critical value(s) OR the p-value OR both
  • Apply the Decision Rule
  • Write up a conclusion for the test

Example 1: Study on the effectiveness of stents for stroke patients [1]

In this study, researchers randomly assigned stroke patients to two groups: one received the current standard care (control) and the other received stent surgery in addition to the standard care (stent treatment). If the stents work, the treatment group should have a lower average disability score. Do the results give convincing statistical evidence that the stent treatment reduces the average disability from stroke?

  | Stent Treatment (Group 1) | Control (Group 2)
Mean Disability Score | 2.26 | 3.23
Standard Deviation of Disability Score | 1.78 | 1.78
Sample Size, n | 98 | 93

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the patients with stent treatment and patients receiving the standard care), so we will conduct a Test of 2 Means.

  • [latex]n_1 = 98[/latex] is the sample size for the first group
  • [latex]n_2 = 93[/latex] is the sample size for the second group
  • [latex]df[/latex], the degrees of freedom, is the smaller of [latex]98 - 1 = 97[/latex] and [latex]93 - 1 = 92[/latex], so [latex]df = 92[/latex]
  • [latex]\bar{x_1} = 2.26[/latex] is the sample mean for the first group
  • [latex]\bar{x_2} = 3.23[/latex] is the sample mean for the second group
  • [latex]s_1 = 1.78[/latex] is the sample standard deviation for the first group
  • [latex]s_2 = 1.78[/latex] is the sample standard deviation for the second group
  • [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
  • One additional assumption we extend from the null hypothesis is that [latex]\mu_1 - \mu_2 = 0[/latex]; this means that in our formula, those variables cancel out
  • [latex]H_{0}: \mu_1 = \mu_2[/latex]
  • [latex]H_{A}: \mu_1 < \mu_2[/latex]
  • [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}} = \displaystyle \frac{(2.26 - 3.23) - 0}{\sqrt{\displaystyle \frac{1.78^2}{98} + \displaystyle \frac{1.78^2}{93}}} = -3.76[/latex]
  • StatDisk: We can conduct this test using StatDisk, which will also compute the test statistic. From the main menu, click Analysis, then Hypothesis Testing, then Mean Two Independent Samples. Enter the 0.05 significance level and choose the [latex]<[/latex] option for the alternative hypothesis. Enter the sample size, mean, and standard deviation for each group, and make sure that unequal variances is selected. Click Evaluate. The test statistic is reported in the Step 3 display, along with the P-Value of 0.00011.
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is less than or equal to the alpha level, we have enough evidence for our claim; otherwise we do not. Here, [latex]p-value = 0.00011[/latex], which is definitely smaller than [latex]\alpha = 0.05[/latex], so we have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value of [latex]0.00011[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing statistical evidence that the stent treatment reduces the average disability from stroke.
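
If you would rather check these numbers in R than in StatDisk, here is a minimal sketch built from the summary statistics above (the variable names are ours). With the conservative [latex]df = 92[/latex] used in this section, pt() returns a one-sided p-value slightly larger than StatDisk's 0.00011, which is based on a different degrees-of-freedom formula; the conclusion is identical.

> # Example 1 from the summary statistics (conservative df = min(n1, n2) - 1)
> x1.bar <- 2.26; s1 <- 1.78; n1 <- 98   # stent treatment group
> x2.bar <- 3.23; s2 <- 1.78; n2 <- 93   # control group
> se <- sqrt(s1^2/n1 + s2^2/n2)          # standard error of the difference
> t.stat <- (x1.bar - x2.bar)/se         # hypothesized difference is 0; about -3.76
> pt(t.stat, df = min(n1, n2) - 1)       # one-sided p-value for the < alternative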

Example 2: Home Run Distances

In 1998, Sammy Sosa and Mark McGwire (2 players in Major League Baseball) were on pace to set a new home run record. At the end of the season McGwire ended up with 70 home runs, and Sosa ended up with 66. The home run distances were recorded and compared (sometimes a player’s home run distance is used to measure their “power”). Do the results give convincing statistical evidence that the home run distances are different from each other? Who would you say “hit the ball farther” in this comparison?

  | McGwire (Group 1) | Sosa (Group 2)
Mean Home Run Distance | 418.5 | 404.8
Standard Deviation of Home Run Distance | 45.5 | 35.7
Sample Size, n | 70 | 66

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the home run distances), so we will conduct a Test of 2 Means.

  • [latex]n_1 = 70[/latex] is the sample size for the first group
  • [latex]n_2 = 66[/latex] is the sample size for the second group
  • [latex]df[/latex], the degrees of freedom, is the smaller of [latex]70 - 1 = 69[/latex] and [latex]66 - 1 = 65[/latex], so [latex]df = 65[/latex]
  • [latex]\bar{x_1} = 418.5[/latex] is the sample mean for the first group
  • [latex]\bar{x_2} = 404.8[/latex] is the sample mean for the second group
  • [latex]s_1 = 45.5[/latex] is the sample standard deviation for the first group
  • [latex]s_2 = 35.7[/latex] is the sample standard deviation for the second group
  • [latex]H_{0}: \mu_1 = \mu_2[/latex]
  • [latex]H_{A}: \mu_1 \neq \mu_2[/latex]
  • [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}} = \displaystyle \frac{(418.5 - 404.8) - 0}{\sqrt{\displaystyle \frac{45.5^2}{70} + \displaystyle \frac{35.7^2}{66}}} = 1.96[/latex]
  • StatDisk: We can conduct this test using StatDisk, which will also compute the test statistic. From the main menu, click Analysis, then Hypothesis Testing, then Mean Two Independent Samples. Enter the 0.05 significance level and choose the [latex]\neq[/latex] option for the alternative hypothesis. Enter the sample size, mean, and standard deviation for each group, and make sure that unequal variances is selected. Click Evaluate. The test statistic is reported in the Step 3 display, along with the P-Value of 0.05221.
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is less than or equal to the alpha level, we have enough evidence for our claim; otherwise we do not. Here, [latex]p-value = 0.05221[/latex], which is larger than [latex]\alpha = 0.05[/latex], so we do not have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value of [latex]0.05221[/latex] is larger than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we fail to reject [latex]H_{0}[/latex]. We do not have convincing statistical evidence that the home run distances are different.
  • Follow-up commentary: But what does this mean? There actually was a difference, right? If we take McGwire's average and subtract Sosa's average, we get a difference of 13.7. What this result indicates is that the difference is not statistically significant; it could be due to random chance rather than something meaningful. Sample size also matters: with larger samples, the same difference might have reached statistical significance.
  • Adapted from the Skew The Script curriculum ( skewthescript.org ), licensed under CC BY-NC-Sa 4.0 ↵

Basic Statistics Copyright © by Allyn Leon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Perspect Clin Res 7(2), Apr–Jun 2016

Common pitfalls in statistical analysis: The perils of multiple testing

Priya Ranganathan

Department of Anaesthesiology, Tata Memorial Centre, Mumbai, Maharashtra, India

C. S. Pramesh

1 Department of Surgical Oncology, Division of Thoracic Surgery, Tata Memorial Centre, Mumbai, Maharashtra, India

2 International Drug Development Institute, San Francisco, California, USA

3 Department of Biostatistics, Hasselt University, Hasselt, Belgium

Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times - either at multiple time-points or through multiple subgroups or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue.

INTRODUCTION

In a previous article, we discussed the alpha error rate (or false-positive error rate), which is the probability of falsely rejecting the null hypothesis.[ 1 ] In any study, when two or more groups are compared, there is always a chance of finding a difference between them just by chance. This is known as a Type 1 error, in contrast to a Type 2 error, which consists of failing to detect a difference that truly exists. Conventionally, the alpha error is set at 5% or less, which ensures that when we do find a difference between the groups, we can be at least 95% confident that this is a true difference and not a chance finding.

The 5% limit for alpha, known as the significance level of the study, is set for a single comparison between groups. When we compare treatment groups multiple times, the probability of finding a difference just by chance increases with the number of times we perform the comparison. In many clinical trials, a number of interim analyses are planned to occur during the course of the trial, with the final analysis taking place when all patients have been accrued and followed up for a minimum period. If all these interim (and final) analyses were performed at the 5% significance level, the overall probability of a Type 1 error would exceed the prespecified limit of 5%. It can be calculated that if two groups are compared 5 times, the probability of a false-positive finding is as high as 23%; if they are compared 20 times, the probability of finding a significant difference just by chance increases to 64%.[ 2 , 3 ] Fortunately, much statistical research has been devoted to this problem, and “group sequential designs” have been proposed to control the Type 1 error rate when the data of a trial need to be analyzed multiple times.
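
A quick R check reproduces these figures under the simplifying assumption that the comparisons are independent, using the family-wise false-positive probability 1 − (1 − 0.05)^k for k tests:

> k <- c(1, 5, 20)
> 1 - (1 - 0.05)^k   # probability of at least one false positive in k tests
[1] 0.0500000 0.2262191 0.6415141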

Another, more challenging, type of multiple testing occurs when authors try to salvage a negative study. If the primary endpoint does not show statistical significance, looking at multiple other (less important) comparisons quite often produces a “positive” result, especially if there are many such comparisons. Investigators can try to analyze different endpoints, among different subsets of patients, using different statistical tests, and so on, so the opportunity for multiplicity can be substantial.[ 2 , 4 ] One case in point is subset analyses, in which the treatments are compared among subsets of patients defined using prognostic features such as their gender, age, tumor location, stage, histology, and grade. If there were only three such binary factors, 2^3 = 8 subsets could be formed. If we were to compare the treatments among these 8 subsets, we would have one chance in three (33% probability) of observing a statistically significant ( P ≤ 0.05) treatment effect in one of them even if there was no true difference between the treatments. Worse still, if there was an overall statistically significant benefit ( P ≤ 0.05) in favor of one of the treatments, we would have a nine in ten chance (90% probability) of observing a benefit in favor of the other treatment in one of the subsets!

It is to avoid these serious problems that all intended comparisons should be fully prespecified in the research protocol, with appropriate adjustments for multiple testing. However, for retrospective studies, it is difficult to ascertain with certainty whether the analyses performed were actually thought of when the research idea was conceived or whether the performed analyses were mere data dredging.

HOW ARE ADJUSTMENTS MADE FOR MULTIPLE TESTING?

Two main techniques have been described for controlling the overall alpha error:

  • The family-wise error rate: This approach attempts to control the overall false-positive rate for all comparisons. A “family” is defined as a set of tests related to the same hypothesis.[ 2 ] Approaches for correcting the alpha error include the Bonferroni, Tukey, Hochberg, and Holm step-down methods. The Bonferroni correction consists of simply dividing the overall alpha level by the number of comparisons. For example, if 20 comparisons are being made, then the alpha level for significance for each comparison would be 0.05/20 = 0.0025. While this is simple to do (and understand), it has been criticized as being far too conservative, especially when the various tests being performed are highly correlated[ 3 ]
  • The false discovery rate: This approach attempts to control the fraction of “false significant results” among the significant results only. The Benjamini and Hochberg procedure has been described for this approach.[ 5 ] (Both kinds of adjustment are illustrated in the short R sketch after this list.)
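
In R, both corrections are one call to the base function p.adjust. The p-values below are hypothetical, chosen only to illustrate the behavior of the two methods on five comparisons:

> p <- c(0.001, 0.012, 0.03, 0.04, 0.2)   # hypothetical p-values from 5 comparisons
> p.adjust(p, method = "bonferroni")      # family-wise error rate control
[1] 0.005 0.060 0.150 0.200 1.000
> p.adjust(p, method = "BH")              # Benjamini-Hochberg false discovery rate
[1] 0.005 0.030 0.050 0.050 0.200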

IS ADJUSTMENT OR COMMON SENSE NEEDED FOR MULTIPLE TESTING?

Many statisticians feel that alpha-adjustment for multiple comparisons reduces the significance value to very stringent levels and increases the chances of a Type 2 error (false negative error; falsely accepting the null hypothesis).[ 2 ] It has also been argued that an obsessive reliance on alpha-adjustment may be counterproductive.[ 6 ]

The following simple strategies have been suggested to handle multiple comparisons:[ 2 , 7 ]

  • Readers should evaluate the quality of the study and the actual effect size instead of focusing only on statistical significance
  • Results from single studies should not be used to make treatment decisions; instead, one should look for scientific plausibility and supporting data from other studies which can validate the results of the original study
  • Authors should try to limit comparisons between groups and identify a single primary endpoint; using a composite endpoint or global assessment tool is also an acceptable alternative to using multiple endpoints.


CONFLICTS OF INTEREST

There are no conflicts of interest.


One and Two Sample Tests and ANOVA


Tests for More Than Two Samples

Parametric Analysis of Variance (ANOVA)


In this section, we consider comparisons among more than two groups parametrically, using analysis of variance (ANOVA), as well as non-parametrically, using the Kruskal-Wallis test.  

To test if the means are equal for more than two groups, we perform an analysis of variance test. An ANOVA test determines whether the grouping variable explains a significant portion of the variability in the dependent variable. If so, we would expect the mean of the dependent variable to differ across the groups. The assumptions of an ANOVA test are as follows:

  • Independent observations
  • The dependent variable follows a normal distribution in each group
  • Equal variance of the dependent variable in each group

Here, we will use the Pima.tr dataset. According to the National Heart Lung and Blood Institute (NHLBI) website ( http://www.nhlbisupport.com/bmi/ ), BMI can be classified into 4 categories:

  • Underweight:  < 18.5
  • Normal weight: 18.5 ~ 24.9
  • Overweight: 25 ~ 29.9
  • Obesity: >= 30 

We will use these cut points to categorize the continuous BMI variable into four classes based on the definition shown above. Note that we have very few underweight individuals, so we collapse underweight and normal weight into "Normal/under weight."

An Aside

In this dataset the BMI is stored in numerical format, so we need to categorize BMI first, since we are interested in whether categorical BMI is associated with the plasma glucose concentration. In the Exercise, you can use an "if-else" statement to create the variable. Alternatively, we can use the cut() function, as shown below. Since we have very few individuals with BMI < 18.5, we will collapse the categories "Underweight" and "Normal weight" together.

 

> bmi.label <-  c("Underweight/Normalweight", "Overweight", "Obesity")

> summary(bmi)

> bmi.break <- c(18, 24.9, 29.9, 50)

> bmi.cat <- cut(bmi, breaks=bmi.break, labels = bmi.label)

> table(bmi.cat)

bmi.cat

Underweight/Normal weight         Overweight                   Obesity

                       25                 43                       132

> tapply(glu, bmi.cat, mean)

Normal/under weight          Overweight             Obesity

           108.4800           116.6977             129.2727  

 

Suppose we want to compare the means of plasma glucose concentration across our three BMI categories. We will conduct an analysis of variance using the bmi.cat variable as a factor.

> bmi.cat <- factor(bmi.cat)

> bmi.anova <- aov(glu ~ bmi.cat)

Before looking at the result, you may be interested in checking each category's average glucose concentration. One way to do this is with the tapply() function, as above; alternatively, we can request the group means from the fitted ANOVA object with model.tables():

> print(model.tables(bmi.anova, "means"))

Tables of means

      

 bmi.cat

   Underweight/Normal weight Overweight Obesity

                        108.5      116.7   129.3

rep                      25.0       43.0   132.0

The glucose level clearly varies across the categories. We can now request the ANOVA table for this analysis to check whether the hypothesis test matches our observation from the summary statistics.

> summary(bmi.anova)

             Df Sum Sq Mean Sq F value   Pr(>F)  

bmi.cat       2  11984    5992  6.2932 0.002242 **

Residuals   197 187575     952                     

  • H 0 : The mean glucose is equal for all levels of bmi categories.
  • H a : At least one of the bmi categories has a mean glucose that is not the same as the other bmi categories.

We see that we reject the null hypothesis that the mean glucose is equal for all levels of bmi categories (F 2,197 = 6.29, p-value = 0.002242). The plasma glucose concentration means in at least two categories are significantly different.

Performing many tests increases the probability of finding at least one of them significant just by chance; that is, our Type I error rate increases. A common adjustment method is the Bonferroni correction, which adjusts for multiple comparisons by changing the level of significance α for each test to α / (number of tests). Thus, if we were performing 10 tests and wanted to maintain an overall level of significance α of 0.05, we would adjust for multiple testing using the Bonferroni correction by using 0.05/10 = 0.005 as our new level of significance for each test.

A function called pairwise.t.test computes all possible two-group comparisons.

> pairwise.t.test(glu, bmi.cat, p.adj = "none")

        Pairwise comparisons using t tests with pooled SD

data:  glu and bmi.cat

           Underweight/Normalweight Overweight

Overweight 0.2910                    -        

Obesity    0.0023                    0.0213   

P value adjustment method: none

From this result we reject the null hypothesis that the mean glucose for those who are obese is equal to the mean glucose for those who are underweight/normal weight (p-value = 0.0023). We also reject the null hypothesis that the mean glucose for those who are obese is equal to the mean glucose for those who are overweight (p-value = 0.0213). We fail to reject the null hypothesis that the mean glucose for those who are overweight is equal to the mean glucose for those who are underweight/normal weight (p-value = 0.2910).

We can also make adjustments for multiple comparisons, like so:

> pairwise.t.test(glu, bmi.cat, p.adj = "bonferroni")

           Underweight/Normal weight Overweight

Overweight 0.8729                    -        

Obesity    0.0069                    0.0639  

P value adjustment method: bonferroni

However, the Bonferroni correction is very conservative. Here, we introduce an alternative multiple comparison approach using Tukey's procedure:

> TukeyHSD(bmi.anova)

  Tukey multiple comparisons of means

    95% family-wise confidence level

Fit: aov(formula = glu ~ bmi.cat)

$bmi.cat                                                                                

diff         lwr      upr     p adj

Overweight-Underweight/Normalweight   8.217674 -10.1099039 26.54525 0.5407576

Obesity-Underweight/Normal weight    20.792727   4.8981963 36.68726 0.0064679

Obesity-Overweight                   12.575053  -0.2203125 25.37042 0.0552495

From the pairwise comparison, what do we find regarding the plasma glucose in the different weight categories?

It is important to note that when testing the assumptions of an ANOVA, the var.test function can only be performed for two groups at a time. To look at the assumption of equal variance for more than two groups, we can use side-by-side boxplots:

> boxplot(glu~bmi.cat)

[Figure: side-by-side boxplots of glu by bmi.cat]

To determine whether or not the assumption of equal variance is met we look to see if the spread is equal for each of the groups.

We can also conduct a formal test for homogeneity of variances when we have more than two groups. This test is called Bartlett's Test , which assumes normality. The procedure is performed as follows:

> bartlett.test(glu~bmi.cat)

        Bartlett test of homogeneity of variances

data:  glu by bmi.cat

Bartlett's K-squared = 3.6105, df = 2, p-value = 0.1644

H 0 : The variability in glucose is equal for all bmi categories.

H a : The variability in glucose is not equal for all bmi categories.

We fail to reject the null hypothesis that the variability in glucose is equal for all bmi categories (Bartlett's K-squared = 3.6105, df = 2, p-value = 0.1644).


Creative Commons license Attribution Non-commercial

Module 10: Inference for Means

Hypothesis Test for a Difference in Two Population Means (1 of 2)

Learning Outcomes

  • Under appropriate conditions, conduct a hypothesis test about a difference between two population means. State a conclusion in context.

Using the Hypothesis Test for a Difference in Two Population Means

The general steps of this hypothesis test are the same as always. As expected, the details of the conditions for use of the test and the test statistic are unique to this test (but similar in many ways to what we have seen before.)

Step 1: Determine the hypotheses.

The hypotheses for a difference in two population means are similar to those for a difference in two population proportions. The null hypothesis, H 0 , is again a statement of “no effect” or “no difference.”

  • H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2

The alternative hypothesis, H a , can be any one of the following.

  • H a : μ 1 – μ 2 < 0, which is the same as H a : μ 1 < μ 2
  • H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2
  • H a : μ 1 – μ 2 ≠ 0, which is the same as H a : μ 1 ≠ μ 2

Step 2: Collect the data.

As usual, how we collect the data determines whether we can use it in the inference procedure. We have our usual two requirements for data collection.

  • Samples must be random to remove or minimize bias.
  • Samples must be representative of the populations in question.

We use this hypothesis test when the data meets the following conditions.

  • The two random samples are independent .
  • The variable is normally distributed in both populations . If this is not known, samples of more than 30 will have a difference in sample means that can be modeled adequately by the t-distribution. As we discussed in “Hypothesis Test for a Population Mean,” t-procedures are robust even when the variable is not normally distributed in the population. If checking normality in the populations is impossible, then we look at the distribution in the samples. If a histogram or dotplot of the data does not show extreme skew or outliers, we take it as a sign that the variable is not heavily skewed in the populations, and we use the inference procedure. (Note: This is the same condition we used for the one-sample t-test in “Hypothesis Test for a Population Mean.”)

Step 3: Assess the evidence.

If the conditions are met, then we calculate the t-test statistic. The t-test statistic has a familiar form.

[latex]T=\displaystyle \frac{\text{observed difference in sample means}-\text{hypothesized difference in population means}}{\text{standard error}}[/latex]

[latex]T=\displaystyle \frac{(\bar{x}_{1}-\bar{x}_{2})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}[/latex]

Since the null hypothesis assumes there is no difference in the population means, the expression (μ 1 – μ 2 ) is always zero.

As we learned in “Estimating a Population Mean,” the t-distribution depends on the degrees of freedom (df) . In the one-sample and matched-pair cases df = n – 1. For the two-sample t-test, determining the correct df is based on a complicated formula that we do not cover in this course. We will either give the df or use technology to find the df . With the t-test statistic and the degrees of freedom, we can use the appropriate t-model to find the P-value, just as we did in “Hypothesis Test for a Population Mean.” We can even use the same simulation.
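
For reference only (this course will not use it by hand), the "complicated formula" that technology typically applies is the Welch–Satterthwaite approximation:

[latex]df \approx \displaystyle \frac{\left(\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}\right)^{2}}{\frac{1}{n_{1}-1}\left(\frac{s_{1}^{2}}{n_{1}}\right)^{2}+\frac{1}{n_{2}-1}\left(\frac{s_{2}^{2}}{n_{2}}\right)^{2}}[/latex]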

Step 4: State a conclusion.

To state a conclusion, we follow what we have done with other hypothesis tests. We compare our P-value to a stated level of significance.

  • If the P-value ≤ α, we reject the null hypothesis in favor of the alternative hypothesis.
  • If the P-value > α, we fail to reject the null hypothesis. We do not have enough evidence to support the alternative hypothesis.

As always, we state our conclusion in context, usually by referring to the alternative hypothesis.

“Context and Calories”

Does the company you keep impact what you eat? This example comes from an article titled “Impact of Group Settings and Gender on Meals Purchased by College Students” (Allen-O’Donnell, M., T. C. Nowak, K. A. Snyder, and M. D. Cottingham, Journal of Applied Social Psychology 49(9), 2011, onlinelibrary.wiley.com/doi/10.1111/j.1559-1816.2011.00804.x/full) . In this study, researchers examined this issue in the context of gender-related theories in their field. For our purposes, we look at this research more narrowly.

Step 1: Stating the hypotheses.

In the article, the authors make the following hypothesis. “The attempt to appear feminine will be empirically demonstrated by the purchase of fewer calories by women in mixed-gender groups than by women in same-gender groups.” We translate this into a simpler and narrower research question: Do women purchase fewer calories when they eat with men compared to when they eat with women?

Here the two populations are “women eating with women” (population 1) and “women eating with men” (population 2). The variable is the calories in the meal. We test the following hypotheses at the 5% level of significance.

The null hypothesis is always H 0 : μ 1 – μ 2 = 0, which is the same as H 0 : μ 1 = μ 2 .

The alternative hypothesis H a : μ 1 – μ 2 > 0, which is the same as H a : μ 1 > μ 2 .

Here μ 1 represents the mean number of calories ordered by women when they were eating with other women, and μ 2 represents the mean number of calories ordered by women when they were eating with men.

Note: It does not matter which population we label as 1 or 2, but once we decide, we have to stay consistent throughout the hypothesis test. Since we expect the number of calories to be greater for the women eating with other women, the difference is positive if “women eating with women” is population 1. If you prefer to work with positive numbers, choose the group with the larger expected mean as population 1. This is a good general tip.

Step 2: Collect Data.

As usual, there are two major things to keep in mind when considering the collection of data.

  • Samples need to be representative of the population in question.
  • Samples need to be random in order to remove or minimize bias.

Representative Samples?

The researchers state their hypothesis in terms of “women.” We did the same. But the researchers gathered data by watching people eat at the HUB Rock Café II on the campus of Indiana University of Pennsylvania during the Spring semester of 2006. Almost all of the women in the data set were white undergraduates between the ages of 18 and 24, so there are some definite limitations on the scope of this study. These limitations will affect our conclusion (and the specific definition of the population means in our hypotheses.)

Random Samples?

The observations were collected on February 13, 2006, through February 22, 2006, between 11 a.m. and 7 p.m. We can see that the researchers included both lunch and dinner. They also made observations on all days of the week to ensure that weekly customer patterns did not confound their findings. The authors state that “since the time period for observations and the place where [they] observed students were limited, the sample was a convenience sample.” Despite these limitations, the researchers conducted inference procedures with the data, and the results were published in a reputable journal. We will also conduct inference with this data, but we also include a discussion of the limitations of the study with our conclusion. The authors did this, also.

Do the data meet the conditions for use of a t-test?

The researchers reported the following sample statistics.

  • In a sample of 45 women dining with other women, the average number of calories ordered was 850, and the standard deviation was 252.
  • In a sample of 27 women dining with men, the average number of calories ordered was 719, and the standard deviation was 322.

One of the samples has fewer than 30 women. We need to make sure the distribution of calories in this sample is not heavily skewed and has no outliers, but we do not have access to a spreadsheet of the actual data. Since the researchers conducted a t-test with this data, we will assume that the conditions are met. This includes the assumption that the samples are independent.

As noted previously, the researchers reported the following sample statistics.

To compute the t-test statistic, make sure sample 1 corresponds to population 1. Here our population 1 is “women eating with other women.” So x 1 = 850, s 1 = 252, n 1 =45, and so on.

[latex]T=\frac{\bar{x}_{1}-\bar{x}_{2}}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}= \frac{850-719}{\sqrt{\frac{252^{2}}{45}+\frac{322^{2}}{27}}}\approx \frac{131}{72.47}\approx 1.81[/latex]

Using technology, we determined that the degrees of freedom are about 45 for this data. To find the P-value, we use our familiar simulation of the t-distribution. Since the alternative hypothesis is a “greater than” statement, we look for the area to the right of T = 1.81. The P-value is 0.0385.
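
The tail area is easy to verify in R (a one-line sketch using the approximate degrees of freedom reported above):

> 1 - pt(1.81, df = 45)   # upper-tail area; approximately 0.0385, matching the P-value above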

[Figure: t-model with the area to the left of T = 1.81 shaded (0.9615) and the area to the right shaded (0.0385).]

Generic Conclusion

The hypotheses for this test are H 0 : μ 1 – μ 2 = 0 and H a : μ 1 – μ 2 > 0. Since the P-value is less than the significance level (0.0385 < 0.05), we reject H 0 and accept H a .

Conclusion in context

At Indiana University of Pennsylvania, the mean number of calories ordered by undergraduate women eating with other women is greater than the mean number of calories ordered by undergraduate women eating with men (P-value = 0.0385).

Comment about Conclusions

In the conclusion above, we did not generalize the findings to all women. Since the samples included only undergraduate women at one university, we included this information in our conclusion. But our conclusion is a cautious statement of the findings. The authors see the results more broadly in the context of theories in the field of social psychology. In the context of these theories, they write, “Our findings support the assertion that meal size is a tool for influencing the impressions of others. For traditional-age, predominantly White college women, diminished meal size appears to be an attempt to assert femininity in groups that include men.” This viewpoint is echoed in the following summary of the study for the general public on National Public Radio (npr.org).

  • Both men and women appear to choose larger portions when they eat with women, and both men and women choose smaller portions when they eat in the company of men, according to new research published in the Journal of Applied Social Psychology . The study, conducted among a sample of 127 college students, suggests that both men and women are influenced by unconscious scripts about how to behave in each other’s company. And these scripts change the way men and women eat when they eat together and when they eat apart.

Should we be concerned that the findings of this study are generalized in this way? Perhaps. But the authors of the article address this concern by including the following disclaimer with their findings: “While the results of our research are suggestive, they should be replicated with larger, representative samples. Studies should be done not only with primarily White, middle-class college students, but also with students who differ in terms of race/ethnicity, social class, age, sexual orientation, and so forth.” This is an example of good statistical practice. It is often very difficult to select truly random samples from the populations of interest. Researchers therefore discuss the limitations of their sampling design when they discuss their conclusions.

In the following activities, you will have the opportunity to practice parts of the hypothesis test for a difference in two population means. On the next page, the activities focus on the entire process and also incorporate technology.


  • Concepts in Statistics. Provided by : Open Learning Initiative. Located at : http://oli.cmu.edu . License : CC BY: Attribution

Concepts in Statistics Copyright © 2023 by CUNY School of Professional Studies is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License , except where otherwise noted.


Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans . Revised on June 22, 2023.

Statistical tests are used in hypothesis testing . They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart


Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p value (probability value). The p -value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the statistic calculated from the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than the one calculated from the null hypothesis, then you can infer no statistically significant relationship between the predictor and outcome variables.


You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment , or through observations made using probability sampling methods .

For a statistical test to be valid , your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance : the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  • Normality of data : the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data .

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test , which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal : represent data with an order (e.g. rankings).
  • Nominal : represent group names (e.g. brands or species names).
  • Binary : represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment , these are the independent and dependent variables ). Consult the tables below to see which test best matches your variables.

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships . They can be used to estimate the effect of one or more continuous variables on another variable.

Test | Predictor variable | Outcome variable | Research question example
Simple linear regression | Continuous | Continuous | What is the effect of income on longevity?
Multiple linear regression | Continuous (2 or more predictors) | Continuous | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | Continuous | Binary | What is the effect of drug dosage on the survival of a test subject?

Comparison tests

Comparison tests look for differences among group means . They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

Test | Predictor variable | Outcome variable | Research question example
Paired t-test | Categorical (1 predictor) | Quantitative (groups come from the same population) | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | Categorical (1 predictor) | Quantitative (groups come from different populations) | What is the difference in average exam scores for students from two different schools?
ANOVA | Categorical (1 or more predictors) | Quantitative (1 outcome) | What is the difference in average pain levels among post-surgical patients given three different painkillers?
MANOVA | Categorical (1 or more predictors) | Quantitative (2 or more outcomes) | What is the effect of flower species on petal length, petal width, and stem length?

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.

Test | Variables | Research question example
Pearson's r | 2 continuous variables | How are latitude and temperature related?

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

Test | Use in place of…
Spearman's r | Pearson's r
Sign test | One-sample t-test
Kruskal–Wallis H | ANOVA
ANOSIM | MANOVA
Wilcoxon rank-sum test | Independent t-test
Wilcoxon signed-rank test | Paired t-test


This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.

Choosing the right statistical test


Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data does not meet these assumptions, you might still be able to use a nonparametric statistical test , which has fewer requirements but also makes weaker inferences.

A test statistic is a number calculated by a  statistical test . It describes how far your observed data is from the  null hypothesis  of no relationship between  variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean , or how different a linear slope is from the slope predicted by a null hypothesis . Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test . Significance is usually denoted by a p -value , or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p -value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results .

Discrete and continuous variables are two types of quantitative variables :

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).


Bevans, R. (2023, June 22). Choosing the Right Statistical Test | Types & Examples. Scribbr. Retrieved June 24, 2024, from https://www.scribbr.com/statistics/statistical-tests/



8.4.3 Hypothesis Testing for the Mean

$\quad$ $H_0$: $\mu=\mu_0$, $\quad$ $H_1$: $\mu \neq \mu_0$.

$\quad$ $H_0$: $\mu \leq \mu_0$, $\quad$ $H_1$: $\mu > \mu_0$.

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$.

Two-sided Tests for the Mean:

Therefore, we can suggest the following test. Choose a threshold, and call it $c$. If $|W| \leq c$, accept $H_0$, and if $|W|>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have $P(|W| > c \; | \; H_0) = \alpha$.

  • As discussed above, we let \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} Note that, assuming $H_0$, $W \sim N(0,1)$. We will choose a threshold, $c$. If $|W| \leq c$, we accept $H_0$, and if $|W|>c$, accept $H_1$. To choose $c$, we let \begin{align} P(|W| > c \; | \; H_0) =\alpha. \end{align} Since the standard normal PDF is symmetric around $0$, we have \begin{align} P(|W| > c \; | \; H_0) = 2 P(W>c | \; H_0). \end{align} Thus, we conclude $P(W>c | \; H_0)=\frac{\alpha}{2}$. Therefore, \begin{align} c=z_{\frac{\alpha}{2}}. \end{align} Therefore, we accept $H_0$ if \begin{align} \left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \leq z_{\frac{\alpha}{2}}, \end{align} and reject it otherwise.
  • We have \begin{align} \beta (\mu) &=P(\textrm{type II error}) = P(\textrm{accept }H_0 \; | \; \mu) \\ &= P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right). \end{align} If $X_i \sim N(\mu,\sigma^2)$, then $\overline{X} \sim N(\mu, \frac{\sigma^2}{n})$. Thus, \begin{align} \beta (\mu)&=P\left(\left|\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \right| \lt z_{\frac{\alpha}{2}}\; | \; \mu \right)\\ &=P\left(\mu_0- z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \leq \overline{X} \leq \mu_0+ z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)\\ &=\Phi\left(z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right)-\Phi\left(-z_{\frac{\alpha}{2}}+\frac{\mu_0-\mu}{\sigma / \sqrt{n}}\right). \end{align}
  • Let $S^2$ be the sample variance for this random sample. Then, the random variable $W$ defined as \begin{equation} W(X_1,X_2, \cdots, X_n)=\frac{\overline{X}-\mu_0}{S / \sqrt{n}} \end{equation} has a $t$-distribution with $n-1$ degrees of freedom, i.e., $W \sim T(n-1)$. Thus, we can repeat the analysis of Example 8.24 here. The only difference is that we need to replace $\sigma$ by $S$ and $z_{\frac{\alpha}{2}}$ by $t_{\frac{\alpha}{2},n-1}$. Therefore, we accept $H_0$ if \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}, \end{align} and reject it otherwise. Let us look at a numerical example of this case.

$\quad$ $H_0$: $\mu=170$, $\quad$ $H_1$: $\mu \neq 170$.

  • Let's first compute the sample mean and the sample standard deviation. The sample mean is \begin{align}%\label{} \overline{X}&=\frac{X_1+X_2+X_3+X_4+X_5+X_6+X_7+X_8+X_9}{9}\\ &=165.8 \end{align} The sample variance is given by \begin{align}%\label{} {S}^2=\frac{1}{9-1} \sum_{k=1}^9 (X_k-\overline{X})^2&=68.01 \end{align} The sample standard deviation is given by \begin{align}%\label{} S&= \sqrt{S^2}=8.25 \end{align} The following MATLAB code can be used to obtain these values: x=[176.2,157.9,160.1,180.9,165.1,167.2,162.9,155.7,166.2]; m=mean(x); v=var(x); s=std(x); Now, our test statistic is \begin{align} W(X_1,X_2, \cdots, X_9)&=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}\\ &=\frac{165.8-170}{8.25 / 3}=-1.52 \end{align} Thus, $|W|=1.52$. Also, we have \begin{align} t_{\frac{\alpha}{2},n-1} = t_{0.025,8} \approx 2.31 \end{align} The above value can be obtained in MATLAB using the command $\mathtt{tinv(0.975,8)}$. Thus, we conclude \begin{align} |W| \leq t_{\frac{\alpha}{2},n-1}. \end{align} Therefore, we accept $H_0$. In other words, we do not have enough evidence to conclude that the average height in the city is different from the average height in the country.
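
For readers who prefer R to MATLAB, the built-in t.test function runs the entire test in one call on the same data vector and reproduces the statistic above:

> x <- c(176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2)
> t.test(x, mu = 170)   # two-sided one-sample t-test; t is about -1.52, so H0 is not rejected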

Let us summarize what we have obtained for the two-sided test for the mean.

Case | Test Statistic | Acceptance Region
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known | $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$ | $|W| \leq z_{\frac{\alpha}{2}}$
$n$ large, $X_i$ non-normal | $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ | $|W| \leq z_{\frac{\alpha}{2}}$
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown | $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ | $|W| \leq t_{\frac{\alpha}{2},n-1}$

One-sided Tests for the Mean:

  • As before, we define the test statistic as \begin{align}%\label{} W(X_1,X_2, \cdots,X_n)=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}. \end{align} If $H_0$ is true (i.e., $\mu \leq \mu_0$), we expect $\overline{X}$ (and thus $W$) to be relatively small, while if $H_1$ is true, we expect $\overline{X}$ (and thus $W$) to be larger. This suggests the following test: Choose a threshold, and call it $c$. If $W \leq c$, accept $H_0$, and if $W>c$, accept $H_1$. How do we choose $c$? If $\alpha$ is the required significance level, we must have \begin{align} P(\textrm{type I error}) &= P(\textrm{Reject }H_0 \; | \; H_0) \\ &= P(W > c \; | \; \mu \leq \mu_0) \leq \alpha. \end{align} Here, the probability of type I error depends on $\mu$. More specifically, for any $\mu \leq \mu_0$, we can write \begin{align} P(\textrm{type I error} \; | \; \mu) &= P(\textrm{Reject }H_0 \; | \; \mu) \\ &= P(W > c \; | \; \mu)\\ &=P \left(\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}+\frac{\mu-\mu_0}{\sigma / \sqrt{n}}> c \; | \; \mu\right)\\ &=P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c+\frac{\mu_0-\mu}{\sigma / \sqrt{n}} \; | \; \mu\right)\\ &\leq P \left(\frac{\overline{X}-\mu}{\sigma / \sqrt{n}}> c \; | \; \mu\right) \quad (\textrm{ since }\mu \leq \mu_0)\\ &=1-\Phi(c) \quad \big(\textrm{ since given }\mu, \frac{\overline{X}-\mu}{\sigma / \sqrt{n}} \sim N(0,1) \big). \end{align} Thus, we can choose $\alpha=1-\Phi(c)$, which results in \begin{align} c=z_{\alpha}. \end{align} Therefore, we accept $H_0$ if \begin{align} \frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}} \leq z_{\alpha}, \end{align} and reject it otherwise.
Case | Test Statistic | Acceptance Region
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known | $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$ | $W \leq z_{\alpha}$
$n$ large, $X_i$ non-normal | $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ | $W \leq z_{\alpha}$
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown | $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ | $W \leq t_{\alpha,n-1}$

$\quad$ $H_0$: $\mu \geq \mu_0$, $\quad$ $H_1$: $\mu \lt \mu_0$,

Case | Test Statistic | Acceptance Region
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ known | $W=\frac{\overline{X}-\mu_0}{\sigma / \sqrt{n}}$ | $W \geq -z_{\alpha}$
$n$ large, $X_i$ non-normal | $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ | $W \geq -z_{\alpha}$
$X_i \sim N(\mu, \sigma^2)$, $\sigma$ unknown | $W=\frac{\overline{X}-\mu_0}{S / \sqrt{n}}$ | $W \geq -t_{\alpha,n-1}$




Which statistical test to use to test differences in multiple means (multiple populations)

I have 3 populations, let's call them cluster 1, cluster 2 and cluster 3. The data are continuous. I want to see if there's a difference between the means of the three clusters. I know that the t-test tests for the difference of means, but that is only for 2 samples. What test should I use for multiple samples, i.e. 3, in my case?

  • hypothesis-testing


If you want a multi-group analog of a t-test it sounds like you just want ANOVA (analysis of variance) or something similar to it. That's exactly what it's for - comparing group means.

Specifically, you seem to be asking for one-way analysis of variance .

Any decent statistics package does ANOVA.

If you don't want to assume normality (just as you would for a t-test), there are a variety of options that still allow a test of means (including permutation tests and GLMs), but if your samples are large, moderate non-normality won't impact things much.

There's also the issue of potential heteroskedasticity; in the normal case many packages offer an approximation via an adjustment to error degrees of freedom (Welch-Satterthwaite) that often performs quite well. If heteroskedasticity is related to mean, you may be better off looking at an ANOVA-like model fitted as a GLM.

However, if the clusters are generated by performing cluster analysis on the same data, the theory for t-tests, ANOVA, GLMs, permutation tests, etc. no longer holds, and none of the resulting p-values would be correct.
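
If your three clusters were defined independently of the response being tested, a minimal one-way ANOVA sketch in R looks like this (dat, y, and cluster are placeholder names for your data frame, continuous response, and grouping factor):

> fit <- aov(y ~ cluster, data = dat)    # classical one-way ANOVA
> summary(fit)                           # F-test for equality of the three means
> oneway.test(y ~ cluster, data = dat)   # Welch ANOVA; drops the equal-variance assumption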


Hypothesis Test for a Mean

This lesson explains how to conduct a hypothesis test of a mean, when the following conditions are met:

  • The sampling method is simple random sampling .
  • The sampling distribution is normal or nearly normal.

Generally, the sampling distribution will be approximately normally distributed if any of the following conditions apply.

  • The population distribution is normal.
  • The population distribution is symmetric , unimodal , without outliers , and the sample size is 15 or less.
  • The population distribution is moderately skewed , unimodal, without outliers, and the sample size is between 16 and 40.
  • The sample size is greater than 40, without outliers.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

The table below shows three sets of hypotheses. Each makes a statement about how the population mean μ is related to a specified value M. (In the table, the symbol ≠ means "not equal to".)

Set | Null hypothesis | Alternative hypothesis | Number of tails
--- | --- | --- | ---
1 | μ = M | μ ≠ M | 2
2 | μ ≥ M | μ < M | 1
3 | μ ≤ M | μ > M | 1

The first set of hypotheses (Set 1) is an example of a two-tailed test , since an extreme value on either side of the sampling distribution would cause a researcher to reject the null hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests , since an extreme value on only one side of the sampling distribution would cause a researcher to reject the null hypothesis.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample t-test to determine whether the hypothesized mean differs significantly from the observed sample mean.

Analyze Sample Data

Using sample data, conduct a one-sample t-test. This involves finding the standard error, degrees of freedom, test statistic, and the P-value associated with the test statistic.

  • Standard error. Compute the standard error (SE) of the sampling distribution.

SE = s * sqrt{ ( 1/n ) * [ ( N - n ) / ( N - 1 ) ] }

where s is the sample standard deviation, n is the sample size, and N is the population size. When the population size is much larger (at least 20 times larger) than the sample size, the standard error can be approximated by:

SE = s / sqrt( n )

  • Degrees of freedom. The degrees of freedom (DF) is equal to the sample size (n) minus one. Thus, DF = n - 1.

  • Test statistic. The test statistic is a t statistic (t) defined by the following equation:

t = ( x - μ ) / SE

where x is the sample mean, μ is the hypothesized population mean in the null hypothesis, and SE is the standard error.

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the t statistic, given the degrees of freedom computed above. (See sample problems at the end of this lesson for examples of how this is done.)
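The four quantities above are straightforward to compute programmatically; here is a minimal Python sketch, where the sample summary (n, mean, standard deviation, hypothesized mean) is made up for illustration:

```python
import math
from scipy import stats

# Hypothetical sample summary, for illustration only.
n, xbar, s, mu0 = 36, 102.5, 12.0, 100.0

se = s / math.sqrt(n)                 # standard error of the mean
df = n - 1                            # degrees of freedom
t = (xbar - mu0) / se                 # t statistic

p_value = 2 * stats.t.sf(abs(t), df)  # two-tailed P-value
print(f"SE = {se:.3f}, DF = {df}, t = {t:.3f}, P-value = {p_value:.4f}")
```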


Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

In this section, two sample problems illustrate how to conduct a hypothesis test of a mean score. The first problem involves a two-tailed test; the second problem, a one-tailed test.

Problem 1: Two-Tailed Test

An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. From his stock of 2000 engines, the inventor selects a simple random sample of 50 engines for testing. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance. (Assume that run times for the population of engines are normally distributed.)

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses. The first step is to state the null hypothesis and the alternative hypothesis.

Null hypothesis: μ = 300

Alternative hypothesis: μ ≠ 300

Note that these hypotheses constitute a two-tailed test: the null hypothesis will be rejected if the sample mean is too big or too small.

  • Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method is a one-sample t-test.

  • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the test statistic (t).

SE = s / sqrt(n) = 20 / sqrt(50) = 20/7.07 = 2.83

DF = n - 1 = 50 - 1 = 49

t = ( x - μ) / SE = (295 - 300)/2.83 = -1.77

where s is the standard deviation of the sample, x is the sample mean, μ is the hypothesized population mean, and n is the sample size.

Since we have a two-tailed test, the P-value is the probability that a t statistic having 49 degrees of freedom is less than -1.77 or greater than 1.77. We use the t Distribution Calculator to find that P(t < -1.77) is about 0.04.

  • If you enter 1.77 into the t Distribution Calculator, you will find that P(t < 1.77) is about 0.96. Therefore, P(t > 1.77) is 1 minus 0.96, or 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.
  • Interpret results. Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.
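For readers who prefer software to the t Distribution Calculator, the P-value computed above can be checked with a few lines of Python (SciPy):

```python
from scipy import stats

se = 20 / 50 ** 0.5            # SE ≈ 2.83
t = (295 - 300) / se           # t ≈ -1.77
p = 2 * stats.t.cdf(t, df=49)  # two-tailed P-value
print(f"t = {t:.2f}, P-value = {p:.3f}")  # P-value ≈ 0.083
```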

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the population was normally distributed, and the sample size was small relative to the population size (less than 5%).

Problem 2: One-Tailed Test

Bon Air Elementary School has 1000 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the population of students are normally distributed.)

Solution: Again, the solution takes four steps: state the hypotheses, formulate an analysis plan, analyze sample data, and interpret results.

  • State the hypotheses. The first step is to state the null hypothesis and the alternative hypothesis.

Null hypothesis: μ ≥ 110

Alternative hypothesis: μ < 110

Note that these hypotheses constitute a one-tailed test: the null hypothesis will be rejected only if the sample mean is too small.

  • Formulate an analysis plan. For this analysis, the significance level is 0.01. The test method is a one-sample t-test.

  • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the test statistic (t).

SE = s / sqrt(n) = 10 / sqrt(20) = 10/4.472 = 2.236

DF = n - 1 = 20 - 1 = 19

t = ( x - μ) / SE = (108 - 110)/2.236 = -0.894

Here is the logic of the analysis: Given the alternative hypothesis (μ < 110), we want to know whether the observed sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a test statistic of -0.894. We use the t Distribution Calculator to find that P(t < -0.894) is about 0.19.

  • This means we would expect to find a sample mean of 108 or smaller in 19 percent of our samples, if the true population IQ were 110. Thus the P-value in this analysis is 0.19.
  • Interpret results. Since the P-value (0.19) is greater than the significance level (0.01), we cannot reject the null hypothesis.
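As before, the one-tailed P-value can be verified in Python (SciPy):

```python
from scipy import stats

se = 10 / 20 ** 0.5        # SE ≈ 2.236
t = (108 - 110) / se       # t ≈ -0.894
p = stats.t.cdf(t, df=19)  # lower-tail P-value
print(f"t = {t:.3f}, P-value = {p:.3f}")  # P-value ≈ 0.191
```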
