t-test Calculator

Table of contents

Welcome to our t-test calculator! Here you can not only easily perform one-sample t-tests , but also two-sample t-tests , as well as paired t-tests .

Do you prefer to find the p-value from t-test, or would you rather find the t-test critical values? Well, this t-test calculator can do both! 😊

What does a t-test tell you? Take a look at the text below, where we explain what actually gets tested when various types of t-tests are performed. Also, we explain when to use t-tests (in particular, whether to use the z-test vs. t-test) and what assumptions your data should satisfy for the results of a t-test to be valid. If you've ever wanted to know how to do a t-test by hand, we provide the necessary t-test formula, as well as tell you how to determine the number of degrees of freedom in a t-test.

When to use a t-test?

A t-test is one of the most popular statistical tests for location , i.e., it deals with the population(s) mean value(s).

There are different types of t-tests that you can perform:

  • A one-sample t-test;
  • A two-sample t-test; and
  • A paired t-test.

In the next section , we explain when to use which. Remember that a t-test can only be used for one or two groups . If you need to compare three (or more) means, use the analysis of variance ( ANOVA ) method.

The t-test is a parametric test, meaning that your data has to fulfill some assumptions :

  • The data points are independent; AND
  • The data, at least approximately, follow a normal distribution .

If your sample doesn't fit these assumptions, you can resort to nonparametric alternatives. Visit our Mann–Whitney U test calculator or the Wilcoxon rank-sum test calculator to learn more. Other possibilities include the Wilcoxon signed-rank test or the sign test.

Which t-test?

Your choice of t-test depends on whether you are studying one group or two groups:

One sample t-test

Choose the one-sample t-test to check if the mean of a population is equal to some pre-set hypothesized value .

The average volume of a drink sold in 0.33 l cans — is it really equal to 330 ml?

The average weight of people from a specific city — is it different from the national average?

Two-sample t-test

Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other.

In particular, you can use this test to check whether the two groups are different from one another .

The average difference in weight gain in two groups of people: one group was on a high-carb diet and the other on a high-fat diet.

The average difference in the results of a math test from students at two different universities.

This test is sometimes referred to as an independent samples t-test , or an unpaired samples t-test .

Paired t-test

A paired t-test is used to investigate the change in the mean of a population before and after some experimental intervention , based on a paired sample, i.e., when each subject has been measured twice: before and after treatment.

In particular, you can use this test to check whether, on average, the treatment has had any effect on the population .

The change in student test performance before and after taking a course.

The change in blood pressure in patients before and after administering some drug.

How to do a t-test?

So, you've decided which t-test to perform. These next steps will tell you how to calculate the p-value from t-test or its critical values, and then which decision to make about the null hypothesis.

Decide on the alternative hypothesis :

Use a two-tailed t-test if you only care whether the population's mean (or, in the case of two populations, the difference between the populations' means) agrees or disagrees with the pre-set value.

Use a one-tailed t-test if you want to test whether this mean (or difference in means) is greater/less than the pre-set value.

Compute your T-score value :

Formulas for the test statistic in t-tests include the sample size , as well as its mean and standard deviation . The exact formula depends on the t-test type — check the sections dedicated to each particular test for more details.

Determine the degrees of freedom for the t-test:

The degrees of freedom are the number of observations in a sample that are free to vary as we estimate statistical parameters. In the simplest case, the number of degrees of freedom equals your sample size minus the number of parameters you need to estimate . Again, the exact formula depends on the t-test you want to perform — check the sections below for details.

The degrees of freedom are essential, as they determine the distribution followed by your T-score (under the null hypothesis). If there are d degrees of freedom, then the distribution of the test statistics is the t-Student distribution with d degrees of freedom . This distribution has a shape similar to N(0,1) (bell-shaped and symmetric) but has heavier tails . If the number of degrees of freedom is large (>30), which generically happens for large samples, the t-Student distribution is practically indistinguishable from N(0,1).

💡 The t-Student distribution owes its name to William Sealy Gosset, who, in 1908, published his paper on the t-test under the pseudonym "Student". Gosset worked at the famous Guinness Brewery in Dublin, Ireland, and devised the t-test as an economical way to monitor the quality of beer. Cheers! 🍺🍺🍺

p-value from t-test

Recall that the p-value is the probability (calculated under the assumption that the null hypothesis is true) that the test statistic will produce values at least as extreme as the T-score produced for your sample . As probabilities correspond to areas under the density function, p-value from t-test can be nicely illustrated with the help of the following pictures:

p-value from t-test

The following formulae say how to calculate p-value from t-test. By cdf t,d we denote the cumulative distribution function of the t-Student distribution with d degrees of freedom:

p-value from left-tailed t-test:

p-value = cdf t,d (t score )

p-value from right-tailed t-test:

p-value = 1 − cdf t,d (t score )

p-value from two-tailed t-test:

p-value = 2 × cdf t,d (−|t score |)

or, equivalently: p-value = 2 − 2 × cdf t,d (|t score |)

However, the cdf of the t-distribution is given by a somewhat complicated formula. To find the p-value by hand, you would need to resort to statistical tables, where approximate cdf values are collected, or to specialized statistical software. Fortunately, our t-test calculator determines the p-value from t-test for you in the blink of an eye!

t-test critical values

Recall, that in the critical values approach to hypothesis testing, you need to set a significance level, α, before computing the critical values , which in turn give rise to critical regions (a.k.a. rejection regions).

Formulas for critical values employ the quantile function of t-distribution, i.e., the inverse of the cdf :

Critical value for left-tailed t-test: cdf t,d -1 (α)

critical region:

(-∞, cdf t,d -1 (α)]

Critical value for right-tailed t-test: cdf t,d -1 (1-α)

[cdf t,d -1 (1-α), ∞)

Critical values for two-tailed t-test: ±cdf t,d -1 (1-α/2)

(-∞, -cdf t,d -1 (1-α/2)] ∪ [cdf t,d -1 (1-α/2), ∞)

To decide the fate of the null hypothesis, just check if your T-score lies within the critical region:

If your T-score belongs to the critical region , reject the null hypothesis and accept the alternative hypothesis.

If your T-score is outside the critical region , then you don't have enough evidence to reject the null hypothesis.

How to use our t-test calculator

Choose the type of t-test you wish to perform:

A one-sample t-test (to test the mean of a single group against a hypothesized mean);

A two-sample t-test (to compare the means for two groups); or

A paired t-test (to check how the mean from the same group changes after some intervention).

Two-tailed;

Left-tailed; or

Right-tailed.

This t-test calculator allows you to use either the p-value approach or the critical regions approach to hypothesis testing!

Enter your T-score and the number of degrees of freedom . If you don't know them, provide some data about your sample(s): sample size, mean, and standard deviation, and our t-test calculator will compute the T-score and degrees of freedom for you .

Once all the parameters are present, the p-value, or critical region, will immediately appear underneath the t-test calculator, along with an interpretation!

One-sample t-test

The null hypothesis is that the population mean is equal to some value μ 0 \mu_0 μ 0 ​ .

The alternative hypothesis is that the population mean is:

  • different from μ 0 \mu_0 μ 0 ​ ;
  • smaller than μ 0 \mu_0 μ 0 ​ ; or
  • greater than μ 0 \mu_0 μ 0 ​ .

One-sample t-test formula :

  • μ 0 \mu_0 μ 0 ​ — Mean postulated in the null hypothesis;
  • n n n — Sample size;
  • x ˉ \bar{x} x ˉ — Sample mean; and
  • s s s — Sample standard deviation.

Number of degrees of freedom in t-test (one-sample) = n − 1 n-1 n − 1 .

The null hypothesis is that the actual difference between these groups' means, μ 1 \mu_1 μ 1 ​ , and μ 2 \mu_2 μ 2 ​ , is equal to some pre-set value, Δ \Delta Δ .

The alternative hypothesis is that the difference μ 1 − μ 2 \mu_1 - \mu_2 μ 1 ​ − μ 2 ​ is:

  • Different from Δ \Delta Δ ;
  • Smaller than Δ \Delta Δ ; or
  • Greater than Δ \Delta Δ .

In particular, if this pre-determined difference is zero ( Δ = 0 \Delta = 0 Δ = 0 ):

The null hypothesis is that the population means are equal.

The alternate hypothesis is that the population means are:

  • μ 1 \mu_1 μ 1 ​ and μ 2 \mu_2 μ 2 ​ are different from one another;
  • μ 1 \mu_1 μ 1 ​ is smaller than μ 2 \mu_2 μ 2 ​ ; and
  • μ 1 \mu_1 μ 1 ​ is greater than μ 2 \mu_2 μ 2 ​ .

Formally, to perform a t-test, we should additionally assume that the variances of the two populations are equal (this assumption is called the homogeneity of variance ).

There is a version of a t-test that can be applied without the assumption of homogeneity of variance: it is called a Welch's t-test . For your convenience, we describe both versions.

Two-sample t-test if variances are equal

Use this test if you know that the two populations' variances are the same (or very similar).

Two-sample t-test formula (with equal variances) :

where s p s_p s p ​ is the so-called pooled standard deviation , which we compute as:

  • Δ \Delta Δ — Mean difference postulated in the null hypothesis;
  • n 1 n_1 n 1 ​ — First sample size;
  • x ˉ 1 \bar{x}_1 x ˉ 1 ​ — Mean for the first sample;
  • s 1 s_1 s 1 ​ — Standard deviation in the first sample;
  • n 2 n_2 n 2 ​ — Second sample size;
  • x ˉ 2 \bar{x}_2 x ˉ 2 ​ — Mean for the second sample; and
  • s 2 s_2 s 2 ​ — Standard deviation in the second sample.

Number of degrees of freedom in t-test (two samples, equal variances) = n 1 + n 2 − 2 n_1 + n_2 - 2 n 1 ​ + n 2 ​ − 2 .

Two-sample t-test if variances are unequal (Welch's t-test)

Use this test if the variances of your populations are different.

Two-sample Welch's t-test formula if variances are unequal:

  • s 1 s_1 s 1 ​ — Standard deviation in the first sample;
  • s 2 s_2 s 2 ​ — Standard deviation in the second sample.

The number of degrees of freedom in a Welch's t-test (two-sample t-test with unequal variances) is very difficult to count. We can approximate it with the help of the following Satterthwaite formula :

Alternatively, you can take the smaller of n 1 − 1 n_1 - 1 n 1 ​ − 1 and n 2 − 1 n_2 - 1 n 2 ​ − 1 as a conservative estimate for the number of degrees of freedom.

🔎 The Satterthwaite formula for the degrees of freedom can be rewritten as a scaled weighted harmonic mean of the degrees of freedom of the respective samples: n 1 − 1 n_1 - 1 n 1 ​ − 1 and n 2 − 1 n_2 - 1 n 2 ​ − 1 , and the weights are proportional to the standard deviations of the corresponding samples.

As we commonly perform a paired t-test when we have data about the same subjects measured twice (before and after some treatment), let us adopt the convention of referring to the samples as the pre-group and post-group.

The null hypothesis is that the true difference between the means of pre- and post-populations is equal to some pre-set value, Δ \Delta Δ .

The alternative hypothesis is that the actual difference between these means is:

Typically, this pre-determined difference is zero. We can then reformulate the hypotheses as follows:

The null hypothesis is that the pre- and post-means are the same, i.e., the treatment has no impact on the population .

The alternative hypothesis:

  • The pre- and post-means are different from one another (treatment has some effect);
  • The pre-mean is smaller than the post-mean (treatment increases the result); or
  • The pre-mean is greater than the post-mean (treatment decreases the result).

Paired t-test formula

In fact, a paired t-test is technically the same as a one-sample t-test! Let us see why it is so. Let x 1 , . . . , x n x_1, ... , x_n x 1 ​ , ... , x n ​ be the pre observations and y 1 , . . . , y n y_1, ... , y_n y 1 ​ , ... , y n ​ the respective post observations. That is, x i , y i x_i, y_i x i ​ , y i ​ are the before and after measurements of the i -th subject.

For each subject, compute the difference, d i : = x i − y i d_i := x_i - y_i d i ​ := x i ​ − y i ​ . All that happens next is just a one-sample t-test performed on the sample of differences d 1 , . . . , d n d_1, ... , d_n d 1 ​ , ... , d n ​ . Take a look at the formula for the T-score :

Δ \Delta Δ — Mean difference postulated in the null hypothesis;

n n n — Size of the sample of differences, i.e., the number of pairs;

x ˉ \bar{x} x ˉ — Mean of the sample of differences; and

s s s  — Standard deviation of the sample of differences.

Number of degrees of freedom in t-test (paired): n − 1 n - 1 n − 1

t-test vs Z-test

We use a Z-test when we want to test the population mean of a normally distributed dataset, which has a known population variance . If the number of degrees of freedom is large, then the t-Student distribution is very close to N(0,1).

Hence, if there are many data points (at least 30), you may swap a t-test for a Z-test, and the results will be almost identical. However, for small samples with unknown variance, remember to use the t-test because, in such cases, the t-Student distribution differs significantly from the N(0,1)!

🙋 Have you concluded you need to perform the z-test? Head straight to our z-test calculator !

What is a t-test?

A t-test is a widely used statistical test that analyzes the means of one or two groups of data. For instance, a t-test is performed on medical data to determine whether a new drug really helps.

What are different types of t-tests?

Different types of t-tests are:

  • One-sample t-test;
  • Two-sample t-test; and
  • Paired t-test.

How to find the t value in a one sample t-test?

To find the t-value:

  • Subtract the null hypothesis mean from the sample mean value.
  • Divide the difference by the standard deviation of the sample.
  • Multiply the resultant with the square root of the sample size.

.css-slt4t3.css-slt4t3{color:#2B3148;background-color:transparent;font-family:"Roboto","Helvetica","Arial",sans-serif;font-size:20px;line-height:24px;overflow:visible;padding-top:0px;position:relative;}.css-slt4t3.css-slt4t3:after{content:'';-webkit-transform:scale(0);-moz-transform:scale(0);-ms-transform:scale(0);transform:scale(0);position:absolute;border:2px solid #EA9430;border-radius:2px;inset:-8px;z-index:1;}.css-slt4t3 .js-external-link-button.link-like,.css-slt4t3 .js-external-link-anchor{color:inherit;border-radius:1px;-webkit-text-decoration:underline;text-decoration:underline;}.css-slt4t3 .js-external-link-button.link-like:hover,.css-slt4t3 .js-external-link-anchor:hover,.css-slt4t3 .js-external-link-button.link-like:active,.css-slt4t3 .js-external-link-anchor:active{text-decoration-thickness:2px;text-shadow:1px 0 0;}.css-slt4t3 .js-external-link-button.link-like:focus-visible,.css-slt4t3 .js-external-link-anchor:focus-visible{outline:transparent 2px dotted;box-shadow:0 0 0 2px #6314E6;}.css-slt4t3 p,.css-slt4t3 div{margin:0px;display:block;}.css-slt4t3 pre{margin:0px;display:block;}.css-slt4t3 pre code{display:block;width:-webkit-fit-content;width:-moz-fit-content;width:fit-content;}.css-slt4t3 pre:not(:first-child){padding-top:8px;}.css-slt4t3 ul,.css-slt4t3 ol{display:block margin:0px;padding-left:20px;}.css-slt4t3 ul li,.css-slt4t3 ol li{padding-top:8px;}.css-slt4t3 ul ul,.css-slt4t3 ol ul,.css-slt4t3 ul ol,.css-slt4t3 ol ol{padding-top:0px;}.css-slt4t3 ul:not(:first-child),.css-slt4t3 ol:not(:first-child){padding-top:4px;} .css-4okk7a{margin:auto;background-color:white;overflow:auto;overflow-wrap:break-word;word-break:break-word;}.css-4okk7a code,.css-4okk7a kbd,.css-4okk7a pre,.css-4okk7a samp{font-family:monospace;}.css-4okk7a code{padding:2px 4px;color:#444;background:#ddd;border-radius:4px;}.css-4okk7a figcaption,.css-4okk7a caption{text-align:center;}.css-4okk7a figcaption{font-size:12px;font-style:italic;overflow:hidden;}.css-4okk7a h3{font-size:1.75rem;}.css-4okk7a h4{font-size:1.5rem;}.css-4okk7a .mathBlock{font-size:24px;-webkit-padding-start:4px;padding-inline-start:4px;}.css-4okk7a .mathBlock .katex{font-size:24px;text-align:left;}.css-4okk7a .math-inline{background-color:#f0f0f0;display:inline-block;font-size:inherit;padding:0 3px;}.css-4okk7a .videoBlock,.css-4okk7a .imageBlock{margin-bottom:16px;}.css-4okk7a .imageBlock__image-align--left,.css-4okk7a .videoBlock__video-align--left{float:left;}.css-4okk7a .imageBlock__image-align--right,.css-4okk7a .videoBlock__video-align--right{float:right;}.css-4okk7a .imageBlock__image-align--center,.css-4okk7a .videoBlock__video-align--center{display:block;margin-left:auto;margin-right:auto;clear:both;}.css-4okk7a .imageBlock__image-align--none,.css-4okk7a .videoBlock__video-align--none{clear:both;margin-left:0;margin-right:0;}.css-4okk7a .videoBlock__video--wrapper{position:relative;padding-bottom:56.25%;height:0;}.css-4okk7a .videoBlock__video--wrapper iframe{position:absolute;top:0;left:0;width:100%;height:100%;}.css-4okk7a .videoBlock__caption{text-align:left;}@font-face{font-family:'KaTeX_AMS';src:url(/katex-fonts/KaTeX_AMS-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_AMS-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_AMS-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Caligraphic';src:url(/katex-fonts/KaTeX_Caligraphic-Bold.woff2) format('woff2'),url(/katex-fonts/KaTeX_Caligraphic-Bold.woff) format('woff'),url(/katex-fonts/KaTeX_Caligraphic-Bold.ttf) format('truetype');font-weight:bold;font-style:normal;}@font-face{font-family:'KaTeX_Caligraphic';src:url(/katex-fonts/KaTeX_Caligraphic-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Caligraphic-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Caligraphic-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Fraktur';src:url(/katex-fonts/KaTeX_Fraktur-Bold.woff2) format('woff2'),url(/katex-fonts/KaTeX_Fraktur-Bold.woff) format('woff'),url(/katex-fonts/KaTeX_Fraktur-Bold.ttf) format('truetype');font-weight:bold;font-style:normal;}@font-face{font-family:'KaTeX_Fraktur';src:url(/katex-fonts/KaTeX_Fraktur-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Fraktur-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Fraktur-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Main';src:url(/katex-fonts/KaTeX_Main-Bold.woff2) format('woff2'),url(/katex-fonts/KaTeX_Main-Bold.woff) format('woff'),url(/katex-fonts/KaTeX_Main-Bold.ttf) format('truetype');font-weight:bold;font-style:normal;}@font-face{font-family:'KaTeX_Main';src:url(/katex-fonts/KaTeX_Main-BoldItalic.woff2) format('woff2'),url(/katex-fonts/KaTeX_Main-BoldItalic.woff) format('woff'),url(/katex-fonts/KaTeX_Main-BoldItalic.ttf) format('truetype');font-weight:bold;font-style:italic;}@font-face{font-family:'KaTeX_Main';src:url(/katex-fonts/KaTeX_Main-Italic.woff2) format('woff2'),url(/katex-fonts/KaTeX_Main-Italic.woff) format('woff'),url(/katex-fonts/KaTeX_Main-Italic.ttf) format('truetype');font-weight:normal;font-style:italic;}@font-face{font-family:'KaTeX_Main';src:url(/katex-fonts/KaTeX_Main-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Main-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Main-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Math';src:url(/katex-fonts/KaTeX_Math-BoldItalic.woff2) format('woff2'),url(/katex-fonts/KaTeX_Math-BoldItalic.woff) format('woff'),url(/katex-fonts/KaTeX_Math-BoldItalic.ttf) format('truetype');font-weight:bold;font-style:italic;}@font-face{font-family:'KaTeX_Math';src:url(/katex-fonts/KaTeX_Math-Italic.woff2) format('woff2'),url(/katex-fonts/KaTeX_Math-Italic.woff) format('woff'),url(/katex-fonts/KaTeX_Math-Italic.ttf) format('truetype');font-weight:normal;font-style:italic;}@font-face{font-family:'KaTeX_SansSerif';src:url(/katex-fonts/KaTeX_SansSerif-Bold.woff2) format('woff2'),url(/katex-fonts/KaTeX_SansSerif-Bold.woff) format('woff'),url(/katex-fonts/KaTeX_SansSerif-Bold.ttf) format('truetype');font-weight:bold;font-style:normal;}@font-face{font-family:'KaTeX_SansSerif';src:url(/katex-fonts/KaTeX_SansSerif-Italic.woff2) format('woff2'),url(/katex-fonts/KaTeX_SansSerif-Italic.woff) format('woff'),url(/katex-fonts/KaTeX_SansSerif-Italic.ttf) format('truetype');font-weight:normal;font-style:italic;}@font-face{font-family:'KaTeX_SansSerif';src:url(/katex-fonts/KaTeX_SansSerif-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_SansSerif-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_SansSerif-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Script';src:url(/katex-fonts/KaTeX_Script-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Script-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Script-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Size1';src:url(/katex-fonts/KaTeX_Size1-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Size1-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Size1-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Size2';src:url(/katex-fonts/KaTeX_Size2-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Size2-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Size2-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Size3';src:url(/katex-fonts/KaTeX_Size3-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Size3-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Size3-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Size4';src:url(/katex-fonts/KaTeX_Size4-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Size4-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Size4-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}@font-face{font-family:'KaTeX_Typewriter';src:url(/katex-fonts/KaTeX_Typewriter-Regular.woff2) format('woff2'),url(/katex-fonts/KaTeX_Typewriter-Regular.woff) format('woff'),url(/katex-fonts/KaTeX_Typewriter-Regular.ttf) format('truetype');font-weight:normal;font-style:normal;}.css-4okk7a .katex{font:normal 1.21em KaTeX_Main,Times New Roman,serif;line-height:1.2;text-indent:0;text-rendering:auto;}.css-4okk7a .katex *{-ms-high-contrast-adjust:none!important;border-color:currentColor;}.css-4okk7a .katex .katex-version::after{content:'0.13.13';}.css-4okk7a .katex .katex-mathml{position:absolute;clip:rect(1px, 1px, 1px, 1px);padding:0;border:0;height:1px;width:1px;overflow:hidden;}.css-4okk7a .katex .katex-html>.newline{display:block;}.css-4okk7a .katex .base{position:relative;display:inline-block;white-space:nowrap;width:-webkit-min-content;width:-moz-min-content;width:-webkit-min-content;width:-moz-min-content;width:min-content;}.css-4okk7a .katex .strut{display:inline-block;}.css-4okk7a .katex .textbf{font-weight:bold;}.css-4okk7a .katex .textit{font-style:italic;}.css-4okk7a .katex .textrm{font-family:KaTeX_Main;}.css-4okk7a .katex .textsf{font-family:KaTeX_SansSerif;}.css-4okk7a .katex .texttt{font-family:KaTeX_Typewriter;}.css-4okk7a .katex .mathnormal{font-family:KaTeX_Math;font-style:italic;}.css-4okk7a .katex .mathit{font-family:KaTeX_Main;font-style:italic;}.css-4okk7a .katex .mathrm{font-style:normal;}.css-4okk7a .katex .mathbf{font-family:KaTeX_Main;font-weight:bold;}.css-4okk7a .katex .boldsymbol{font-family:KaTeX_Math;font-weight:bold;font-style:italic;}.css-4okk7a .katex .amsrm{font-family:KaTeX_AMS;}.css-4okk7a .katex .mathbb,.css-4okk7a .katex .textbb{font-family:KaTeX_AMS;}.css-4okk7a .katex .mathcal{font-family:KaTeX_Caligraphic;}.css-4okk7a .katex .mathfrak,.css-4okk7a .katex .textfrak{font-family:KaTeX_Fraktur;}.css-4okk7a .katex .mathtt{font-family:KaTeX_Typewriter;}.css-4okk7a .katex .mathscr,.css-4okk7a .katex .textscr{font-family:KaTeX_Script;}.css-4okk7a .katex .mathsf,.css-4okk7a .katex .textsf{font-family:KaTeX_SansSerif;}.css-4okk7a .katex .mathboldsf,.css-4okk7a .katex .textboldsf{font-family:KaTeX_SansSerif;font-weight:bold;}.css-4okk7a .katex .mathitsf,.css-4okk7a .katex .textitsf{font-family:KaTeX_SansSerif;font-style:italic;}.css-4okk7a .katex .mainrm{font-family:KaTeX_Main;font-style:normal;}.css-4okk7a .katex .vlist-t{display:inline-table;table-layout:fixed;border-collapse:collapse;}.css-4okk7a .katex .vlist-r{display:table-row;}.css-4okk7a .katex .vlist{display:table-cell;vertical-align:bottom;position:relative;}.css-4okk7a .katex .vlist>span{display:block;height:0;position:relative;}.css-4okk7a .katex .vlist>span>span{display:inline-block;}.css-4okk7a .katex .vlist>span>.pstrut{overflow:hidden;width:0;}.css-4okk7a .katex .vlist-t2{margin-right:-2px;}.css-4okk7a .katex .vlist-s{display:table-cell;vertical-align:bottom;font-size:1px;width:2px;min-width:2px;}.css-4okk7a .katex .vbox{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-flex-direction:column;-ms-flex-direction:column;flex-direction:column;-webkit-align-items:baseline;-webkit-box-align:baseline;-ms-flex-align:baseline;align-items:baseline;}.css-4okk7a .katex .hbox{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;width:100%;}.css-4okk7a .katex .thinbox{display:-webkit-inline-box;display:-webkit-inline-flex;display:-ms-inline-flexbox;display:inline-flex;-webkit-flex-direction:row;-ms-flex-direction:row;flex-direction:row;width:0;max-width:0;}.css-4okk7a .katex .msupsub{text-align:left;}.css-4okk7a .katex .mfrac>span>span{text-align:center;}.css-4okk7a .katex .mfrac .frac-line{display:inline-block;width:100%;border-bottom-style:solid;}.css-4okk7a .katex .mfrac .frac-line,.css-4okk7a .katex .overline .overline-line,.css-4okk7a .katex .underline .underline-line,.css-4okk7a .katex .hline,.css-4okk7a .katex .hdashline,.css-4okk7a .katex .rule{min-height:1px;}.css-4okk7a .katex .mspace{display:inline-block;}.css-4okk7a .katex .llap,.css-4okk7a .katex .rlap,.css-4okk7a .katex .clap{width:0;position:relative;}.css-4okk7a .katex .llap>.inner,.css-4okk7a .katex .rlap>.inner,.css-4okk7a .katex .clap>.inner{position:absolute;}.css-4okk7a .katex .llap>.fix,.css-4okk7a .katex .rlap>.fix,.css-4okk7a .katex .clap>.fix{display:inline-block;}.css-4okk7a .katex .llap>.inner{right:0;}.css-4okk7a .katex .rlap>.inner,.css-4okk7a .katex .clap>.inner{left:0;}.css-4okk7a .katex .clap>.inner>span{margin-left:-50%;margin-right:50%;}.css-4okk7a .katex .rule{display:inline-block;border:solid 0;position:relative;}.css-4okk7a .katex .overline .overline-line,.css-4okk7a .katex .underline .underline-line,.css-4okk7a .katex .hline{display:inline-block;width:100%;border-bottom-style:solid;}.css-4okk7a .katex .hdashline{display:inline-block;width:100%;border-bottom-style:dashed;}.css-4okk7a .katex .sqrt>.root{margin-left:0.27777778em;margin-right:-0.55555556em;}.css-4okk7a .katex .sizing.reset-size1.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size1{font-size:1em;}.css-4okk7a .katex .sizing.reset-size1.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size2{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size1.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size3{font-size:1.4em;}.css-4okk7a .katex .sizing.reset-size1.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size4{font-size:1.6em;}.css-4okk7a .katex .sizing.reset-size1.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size5{font-size:1.8em;}.css-4okk7a .katex .sizing.reset-size1.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size6{font-size:2em;}.css-4okk7a .katex .sizing.reset-size1.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size7{font-size:2.4em;}.css-4okk7a .katex .sizing.reset-size1.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size8{font-size:2.88em;}.css-4okk7a .katex .sizing.reset-size1.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size9{font-size:3.456em;}.css-4okk7a .katex .sizing.reset-size1.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size10{font-size:4.148em;}.css-4okk7a .katex .sizing.reset-size1.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size1.size11{font-size:4.976em;}.css-4okk7a .katex .sizing.reset-size2.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size1{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size2.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size2{font-size:1em;}.css-4okk7a .katex .sizing.reset-size2.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size3{font-size:1.16666667em;}.css-4okk7a .katex .sizing.reset-size2.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size4{font-size:1.33333333em;}.css-4okk7a .katex .sizing.reset-size2.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size5{font-size:1.5em;}.css-4okk7a .katex .sizing.reset-size2.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size6{font-size:1.66666667em;}.css-4okk7a .katex .sizing.reset-size2.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size7{font-size:2em;}.css-4okk7a .katex .sizing.reset-size2.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size8{font-size:2.4em;}.css-4okk7a .katex .sizing.reset-size2.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size9{font-size:2.88em;}.css-4okk7a .katex .sizing.reset-size2.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size10{font-size:3.45666667em;}.css-4okk7a .katex .sizing.reset-size2.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size2.size11{font-size:4.14666667em;}.css-4okk7a .katex .sizing.reset-size3.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size1{font-size:0.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size2{font-size:0.85714286em;}.css-4okk7a .katex .sizing.reset-size3.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size3{font-size:1em;}.css-4okk7a .katex .sizing.reset-size3.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size4{font-size:1.14285714em;}.css-4okk7a .katex .sizing.reset-size3.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size5{font-size:1.28571429em;}.css-4okk7a .katex .sizing.reset-size3.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size6{font-size:1.42857143em;}.css-4okk7a .katex .sizing.reset-size3.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size7{font-size:1.71428571em;}.css-4okk7a .katex .sizing.reset-size3.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size8{font-size:2.05714286em;}.css-4okk7a .katex .sizing.reset-size3.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size9{font-size:2.46857143em;}.css-4okk7a .katex .sizing.reset-size3.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size10{font-size:2.96285714em;}.css-4okk7a .katex .sizing.reset-size3.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size3.size11{font-size:3.55428571em;}.css-4okk7a .katex .sizing.reset-size4.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size1{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size4.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size2{font-size:0.75em;}.css-4okk7a .katex .sizing.reset-size4.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size3{font-size:0.875em;}.css-4okk7a .katex .sizing.reset-size4.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size4{font-size:1em;}.css-4okk7a .katex .sizing.reset-size4.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size5{font-size:1.125em;}.css-4okk7a .katex .sizing.reset-size4.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size6{font-size:1.25em;}.css-4okk7a .katex .sizing.reset-size4.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size7{font-size:1.5em;}.css-4okk7a .katex .sizing.reset-size4.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size8{font-size:1.8em;}.css-4okk7a .katex .sizing.reset-size4.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size9{font-size:2.16em;}.css-4okk7a .katex .sizing.reset-size4.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size10{font-size:2.5925em;}.css-4okk7a .katex .sizing.reset-size4.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size4.size11{font-size:3.11em;}.css-4okk7a .katex .sizing.reset-size5.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size1{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size5.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size2{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size5.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size3{font-size:0.77777778em;}.css-4okk7a .katex .sizing.reset-size5.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size4{font-size:0.88888889em;}.css-4okk7a .katex .sizing.reset-size5.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size5{font-size:1em;}.css-4okk7a .katex .sizing.reset-size5.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size6{font-size:1.11111111em;}.css-4okk7a .katex .sizing.reset-size5.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size7{font-size:1.33333333em;}.css-4okk7a .katex .sizing.reset-size5.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size8{font-size:1.6em;}.css-4okk7a .katex .sizing.reset-size5.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size9{font-size:1.92em;}.css-4okk7a .katex .sizing.reset-size5.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size10{font-size:2.30444444em;}.css-4okk7a .katex .sizing.reset-size5.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size5.size11{font-size:2.76444444em;}.css-4okk7a .katex .sizing.reset-size6.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size1{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size6.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size2{font-size:0.6em;}.css-4okk7a .katex .sizing.reset-size6.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size3{font-size:0.7em;}.css-4okk7a .katex .sizing.reset-size6.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size4{font-size:0.8em;}.css-4okk7a .katex .sizing.reset-size6.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size5{font-size:0.9em;}.css-4okk7a .katex .sizing.reset-size6.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size6{font-size:1em;}.css-4okk7a .katex .sizing.reset-size6.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size7{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size6.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size8{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size6.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size9{font-size:1.728em;}.css-4okk7a .katex .sizing.reset-size6.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size10{font-size:2.074em;}.css-4okk7a .katex .sizing.reset-size6.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size6.size11{font-size:2.488em;}.css-4okk7a .katex .sizing.reset-size7.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size1{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size7.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size2{font-size:0.5em;}.css-4okk7a .katex .sizing.reset-size7.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size3{font-size:0.58333333em;}.css-4okk7a .katex .sizing.reset-size7.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size4{font-size:0.66666667em;}.css-4okk7a .katex .sizing.reset-size7.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size5{font-size:0.75em;}.css-4okk7a .katex .sizing.reset-size7.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size6{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size7.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size7{font-size:1em;}.css-4okk7a .katex .sizing.reset-size7.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size8{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size7.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size9{font-size:1.44em;}.css-4okk7a .katex .sizing.reset-size7.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size10{font-size:1.72833333em;}.css-4okk7a .katex .sizing.reset-size7.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size7.size11{font-size:2.07333333em;}.css-4okk7a .katex .sizing.reset-size8.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size1{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size8.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size2{font-size:0.41666667em;}.css-4okk7a .katex .sizing.reset-size8.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size3{font-size:0.48611111em;}.css-4okk7a .katex .sizing.reset-size8.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size4{font-size:0.55555556em;}.css-4okk7a .katex .sizing.reset-size8.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size5{font-size:0.625em;}.css-4okk7a .katex .sizing.reset-size8.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size6{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size8.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size7{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size8.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size8{font-size:1em;}.css-4okk7a .katex .sizing.reset-size8.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size9{font-size:1.2em;}.css-4okk7a .katex .sizing.reset-size8.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size10{font-size:1.44027778em;}.css-4okk7a .katex .sizing.reset-size8.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size8.size11{font-size:1.72777778em;}.css-4okk7a .katex .sizing.reset-size9.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size1{font-size:0.28935185em;}.css-4okk7a .katex .sizing.reset-size9.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size2{font-size:0.34722222em;}.css-4okk7a .katex .sizing.reset-size9.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size3{font-size:0.40509259em;}.css-4okk7a .katex .sizing.reset-size9.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size4{font-size:0.46296296em;}.css-4okk7a .katex .sizing.reset-size9.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size5{font-size:0.52083333em;}.css-4okk7a .katex .sizing.reset-size9.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size6{font-size:0.5787037em;}.css-4okk7a .katex .sizing.reset-size9.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size7{font-size:0.69444444em;}.css-4okk7a .katex .sizing.reset-size9.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size8{font-size:0.83333333em;}.css-4okk7a .katex .sizing.reset-size9.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size9{font-size:1em;}.css-4okk7a .katex .sizing.reset-size9.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size10{font-size:1.20023148em;}.css-4okk7a .katex .sizing.reset-size9.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size9.size11{font-size:1.43981481em;}.css-4okk7a .katex .sizing.reset-size10.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size1{font-size:0.24108004em;}.css-4okk7a .katex .sizing.reset-size10.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size2{font-size:0.28929605em;}.css-4okk7a .katex .sizing.reset-size10.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size3{font-size:0.33751205em;}.css-4okk7a .katex .sizing.reset-size10.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size4{font-size:0.38572806em;}.css-4okk7a .katex .sizing.reset-size10.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size5{font-size:0.43394407em;}.css-4okk7a .katex .sizing.reset-size10.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size6{font-size:0.48216008em;}.css-4okk7a .katex .sizing.reset-size10.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size7{font-size:0.57859209em;}.css-4okk7a .katex .sizing.reset-size10.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size8{font-size:0.69431051em;}.css-4okk7a .katex .sizing.reset-size10.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size9{font-size:0.83317261em;}.css-4okk7a .katex .sizing.reset-size10.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size10{font-size:1em;}.css-4okk7a .katex .sizing.reset-size10.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size10.size11{font-size:1.19961427em;}.css-4okk7a .katex .sizing.reset-size11.size1,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size1{font-size:0.20096463em;}.css-4okk7a .katex .sizing.reset-size11.size2,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size2{font-size:0.24115756em;}.css-4okk7a .katex .sizing.reset-size11.size3,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size3{font-size:0.28135048em;}.css-4okk7a .katex .sizing.reset-size11.size4,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size4{font-size:0.32154341em;}.css-4okk7a .katex .sizing.reset-size11.size5,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size5{font-size:0.36173633em;}.css-4okk7a .katex .sizing.reset-size11.size6,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size6{font-size:0.40192926em;}.css-4okk7a .katex .sizing.reset-size11.size7,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size7{font-size:0.48231511em;}.css-4okk7a .katex .sizing.reset-size11.size8,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size8{font-size:0.57877814em;}.css-4okk7a .katex .sizing.reset-size11.size9,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size9{font-size:0.69453376em;}.css-4okk7a .katex .sizing.reset-size11.size10,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size10{font-size:0.83360129em;}.css-4okk7a .katex .sizing.reset-size11.size11,.css-4okk7a .katex .fontsize-ensurer.reset-size11.size11{font-size:1em;}.css-4okk7a .katex .delimsizing.size1{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.size2{font-family:KaTeX_Size2;}.css-4okk7a .katex .delimsizing.size3{font-family:KaTeX_Size3;}.css-4okk7a .katex .delimsizing.size4{font-family:KaTeX_Size4;}.css-4okk7a .katex .delimsizing.mult .delim-size1>span{font-family:KaTeX_Size1;}.css-4okk7a .katex .delimsizing.mult .delim-size4>span{font-family:KaTeX_Size4;}.css-4okk7a .katex .nulldelimiter{display:inline-block;width:0.12em;}.css-4okk7a .katex .delimcenter{position:relative;}.css-4okk7a .katex .op-symbol{position:relative;}.css-4okk7a .katex .op-symbol.small-op{font-family:KaTeX_Size1;}.css-4okk7a .katex .op-symbol.large-op{font-family:KaTeX_Size2;}.css-4okk7a .katex .op-limits>.vlist-t{text-align:center;}.css-4okk7a .katex .accent>.vlist-t{text-align:center;}.css-4okk7a .katex .accent .accent-body{position:relative;}.css-4okk7a .katex .accent .accent-body:not(.accent-full){width:0;}.css-4okk7a .katex .overlay{display:block;}.css-4okk7a .katex .mtable .vertical-separator{display:inline-block;min-width:1px;}.css-4okk7a .katex .mtable .arraycolsep{display:inline-block;}.css-4okk7a .katex .mtable .col-align-c>.vlist-t{text-align:center;}.css-4okk7a .katex .mtable .col-align-l>.vlist-t{text-align:left;}.css-4okk7a .katex .mtable .col-align-r>.vlist-t{text-align:right;}.css-4okk7a .katex .svg-align{text-align:left;}.css-4okk7a .katex svg{display:block;position:absolute;width:100%;height:inherit;fill:currentColor;stroke:currentColor;fill-rule:nonzero;fill-opacity:1;stroke-width:1;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;}.css-4okk7a .katex svg path{stroke:none;}.css-4okk7a .katex img{border-style:none;min-width:0;min-height:0;max-width:none;max-height:none;}.css-4okk7a .katex .stretchy{width:100%;display:block;position:relative;overflow:hidden;}.css-4okk7a .katex .stretchy::before,.css-4okk7a .katex .stretchy::after{content:'';}.css-4okk7a .katex .hide-tail{width:100%;position:relative;overflow:hidden;}.css-4okk7a .katex .halfarrow-left{position:absolute;left:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .halfarrow-right{position:absolute;right:0;width:50.2%;overflow:hidden;}.css-4okk7a .katex .brace-left{position:absolute;left:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .brace-center{position:absolute;left:25%;width:50%;overflow:hidden;}.css-4okk7a .katex .brace-right{position:absolute;right:0;width:25.1%;overflow:hidden;}.css-4okk7a .katex .x-arrow-pad{padding:0 0.5em;}.css-4okk7a .katex .cd-arrow-pad{padding:0 0.55556em 0 0.27778em;}.css-4okk7a .katex .x-arrow,.css-4okk7a .katex .mover,.css-4okk7a .katex .munder{text-align:center;}.css-4okk7a .katex .boxpad{padding:0 0.3em 0 0.3em;}.css-4okk7a .katex .fbox,.css-4okk7a .katex .fcolorbox{box-sizing:border-box;border:0.04em solid;}.css-4okk7a .katex .cancel-pad{padding:0 0.2em 0 0.2em;}.css-4okk7a .katex .cancel-lap{margin-left:-0.2em;margin-right:-0.2em;}.css-4okk7a .katex .sout{border-bottom-style:solid;border-bottom-width:0.08em;}.css-4okk7a .katex .angl{box-sizing:border-box;border-top:0.049em solid;border-right:0.049em solid;margin-right:0.03889em;}.css-4okk7a .katex .anglpad{padding:0 0.03889em 0 0.03889em;}.css-4okk7a .katex .eqn-num::before{counter-increment:katexEqnNo;content:'(' counter(katexEqnNo) ')';}.css-4okk7a .katex .mml-eqn-num::before{counter-increment:mmlEqnNo;content:'(' counter(mmlEqnNo) ')';}.css-4okk7a .katex .mtr-glue{width:50%;}.css-4okk7a .katex .cd-vert-arrow{display:inline-block;position:relative;}.css-4okk7a .katex .cd-label-left{display:inline-block;position:absolute;right:calc(50% + 0.3em);text-align:left;}.css-4okk7a .katex .cd-label-right{display:inline-block;position:absolute;left:calc(50% + 0.3em);text-align:right;}.css-4okk7a .katex-display{display:block;margin:1em 0;text-align:center;}.css-4okk7a .katex-display>.katex{display:block;white-space:nowrap;}.css-4okk7a .katex-display>.katex>.katex-html{display:block;position:relative;}.css-4okk7a .katex-display>.katex>.katex-html>.tag{position:absolute;right:0;}.css-4okk7a .katex-display.leqno>.katex>.katex-html>.tag{left:0;right:auto;}.css-4okk7a .katex-display.fleqn>.katex{text-align:left;padding-left:2em;}.css-4okk7a body{counter-reset:katexEqnNo mmlEqnNo;}.css-4okk7a table{width:-webkit-max-content;width:-moz-max-content;width:max-content;}.css-4okk7a .tableBlock{max-width:100%;margin-bottom:1rem;overflow-y:scroll;}.css-4okk7a .tableBlock thead,.css-4okk7a .tableBlock thead th{border-bottom:1px solid #333!important;}.css-4okk7a .tableBlock th,.css-4okk7a .tableBlock td{padding:10px;text-align:left;}.css-4okk7a .tableBlock th{font-weight:bold!important;}.css-4okk7a .tableBlock caption{caption-side:bottom;color:#555;font-size:12px;font-style:italic;text-align:center;}.css-4okk7a .tableBlock caption>p{margin:0;}.css-4okk7a .tableBlock th>p,.css-4okk7a .tableBlock td>p{margin:0;}.css-4okk7a .tableBlock [data-background-color='aliceblue']{background-color:#f0f8ff;color:#000;}.css-4okk7a .tableBlock [data-background-color='black']{background-color:#000;color:#fff;}.css-4okk7a .tableBlock [data-background-color='chocolate']{background-color:#d2691e;color:#fff;}.css-4okk7a .tableBlock [data-background-color='cornflowerblue']{background-color:#6495ed;color:#fff;}.css-4okk7a .tableBlock [data-background-color='crimson']{background-color:#dc143c;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkblue']{background-color:#00008b;color:#fff;}.css-4okk7a .tableBlock [data-background-color='darkseagreen']{background-color:#8fbc8f;color:#000;}.css-4okk7a .tableBlock [data-background-color='deepskyblue']{background-color:#00bfff;color:#000;}.css-4okk7a .tableBlock [data-background-color='gainsboro']{background-color:#dcdcdc;color:#000;}.css-4okk7a .tableBlock [data-background-color='grey']{background-color:#808080;color:#fff;}.css-4okk7a .tableBlock [data-background-color='lemonchiffon']{background-color:#fffacd;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightpink']{background-color:#ffb6c1;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightsalmon']{background-color:#ffa07a;color:#000;}.css-4okk7a .tableBlock [data-background-color='lightskyblue']{background-color:#87cefa;color:#000;}.css-4okk7a .tableBlock [data-background-color='mediumblue']{background-color:#0000cd;color:#fff;}.css-4okk7a .tableBlock [data-background-color='omnigrey']{background-color:#f0f0f0;color:#000;}.css-4okk7a .tableBlock [data-background-color='white']{background-color:#fff;color:#000;}.css-4okk7a .tableBlock [data-text-align='center']{text-align:center;}.css-4okk7a .tableBlock [data-text-align='left']{text-align:left;}.css-4okk7a .tableBlock [data-text-align='right']{text-align:right;}.css-4okk7a .tableBlock [data-vertical-align='bottom']{vertical-align:bottom;}.css-4okk7a .tableBlock [data-vertical-align='middle']{vertical-align:middle;}.css-4okk7a .tableBlock [data-vertical-align='top']{vertical-align:top;}.css-4okk7a .tableBlock__font-size--xxsmall{font-size:10px;}.css-4okk7a .tableBlock__font-size--xsmall{font-size:12px;}.css-4okk7a .tableBlock__font-size--small{font-size:14px;}.css-4okk7a .tableBlock__font-size--large{font-size:18px;}.css-4okk7a .tableBlock__border--some tbody tr:not(:last-child){border-bottom:1px solid #e2e5e7;}.css-4okk7a .tableBlock__border--bordered td,.css-4okk7a .tableBlock__border--bordered th{border:1px solid #e2e5e7;}.css-4okk7a .tableBlock__border--borderless tbody+tbody,.css-4okk7a .tableBlock__border--borderless td,.css-4okk7a .tableBlock__border--borderless th,.css-4okk7a .tableBlock__border--borderless tr,.css-4okk7a .tableBlock__border--borderless thead,.css-4okk7a .tableBlock__border--borderless thead th{border:0!important;}.css-4okk7a .tableBlock:not(.tableBlock__table-striped) tbody tr{background-color:unset!important;}.css-4okk7a .tableBlock__table-striped tbody tr:nth-of-type(odd){background-color:#f9fafc!important;}.css-4okk7a .tableBlock__table-compactl th,.css-4okk7a .tableBlock__table-compact td{padding:3px!important;}.css-4okk7a .tableBlock__full-size{width:100%;}.css-4okk7a .textBlock{margin-bottom:16px;}.css-4okk7a .textBlock__text-formatting--finePrint{font-size:12px;}.css-4okk7a .textBlock__text-infoBox{padding:0.75rem 1.25rem;margin-bottom:1rem;border:1px solid transparent;border-radius:0.25rem;}.css-4okk7a .textBlock__text-infoBox p{margin:0;}.css-4okk7a .textBlock__text-infoBox--primary{background-color:#cce5ff;border-color:#b8daff;color:#004085;}.css-4okk7a .textBlock__text-infoBox--secondary{background-color:#e2e3e5;border-color:#d6d8db;color:#383d41;}.css-4okk7a .textBlock__text-infoBox--success{background-color:#d4edda;border-color:#c3e6cb;color:#155724;}.css-4okk7a .textBlock__text-infoBox--danger{background-color:#f8d7da;border-color:#f5c6cb;color:#721c24;}.css-4okk7a .textBlock__text-infoBox--warning{background-color:#fff3cd;border-color:#ffeeba;color:#856404;}.css-4okk7a .textBlock__text-infoBox--info{background-color:#d1ecf1;border-color:#bee5eb;color:#0c5460;}.css-4okk7a .textBlock__text-infoBox--dark{background-color:#d6d8d9;border-color:#c6c8ca;color:#1b1e21;}.css-4okk7a .text-overline{-webkit-text-decoration:overline;text-decoration:overline;}.css-4okk7a.css-4okk7a{color:#2B3148;background-color:transparent;font-family:"Roboto","Helvetica","Arial",sans-serif;font-size:20px;line-height:24px;overflow:visible;padding-top:0px;position:relative;}.css-4okk7a.css-4okk7a:after{content:'';-webkit-transform:scale(0);-moz-transform:scale(0);-ms-transform:scale(0);transform:scale(0);position:absolute;border:2px solid #EA9430;border-radius:2px;inset:-8px;z-index:1;}.css-4okk7a .js-external-link-button.link-like,.css-4okk7a .js-external-link-anchor{color:inherit;border-radius:1px;-webkit-text-decoration:underline;text-decoration:underline;}.css-4okk7a .js-external-link-button.link-like:hover,.css-4okk7a .js-external-link-anchor:hover,.css-4okk7a .js-external-link-button.link-like:active,.css-4okk7a .js-external-link-anchor:active{text-decoration-thickness:2px;text-shadow:1px 0 0;}.css-4okk7a .js-external-link-button.link-like:focus-visible,.css-4okk7a .js-external-link-anchor:focus-visible{outline:transparent 2px dotted;box-shadow:0 0 0 2px #6314E6;}.css-4okk7a p,.css-4okk7a div{margin:0px;display:block;}.css-4okk7a pre{margin:0px;display:block;}.css-4okk7a pre code{display:block;width:-webkit-fit-content;width:-moz-fit-content;width:fit-content;}.css-4okk7a pre:not(:first-child){padding-top:8px;}.css-4okk7a ul,.css-4okk7a ol{display:block margin:0px;padding-left:20px;}.css-4okk7a ul li,.css-4okk7a ol li{padding-top:8px;}.css-4okk7a ul ul,.css-4okk7a ol ul,.css-4okk7a ul ol,.css-4okk7a ol ol{padding-top:0px;}.css-4okk7a ul:not(:first-child),.css-4okk7a ol:not(:first-child){padding-top:4px;} Test setup

Choose test type

t-test for the population mean, μ, based on one independent sample . Null hypothesis H 0 : μ = μ 0  

Alternative hypothesis H 1

Test details

Significance level α

The probability that we reject a true H 0 (type I error).

Degrees of freedom

Calculated as sample size minus one.

Test results

Want to create or adapt books like this? Learn more about how Pressbooks supports open publishing practices.

Inference for Comparing 2 Population Means (HT for 2 Means, independent samples)

More of the good stuff! We will need to know how to label the null and alternative hypothesis, calculate the test statistic, and then reach our conclusion using the critical value method or the p-value method.

The Test Statistic for a Test of 2 Means from Independent Samples:

[latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}}[/latex]

What the different symbols mean:

[latex]n_1[/latex] is the sample size for the first group

[latex]n_2[/latex] is the sample size for the second group

[latex]df[/latex], the degrees of freedom, is the smaller of [latex]n_1 - 1[/latex] and [latex]n_2 - 1[/latex]

[latex]\mu_1[/latex] is the population mean from the first group

[latex]\mu_2[/latex] is the population mean from the second group

[latex]\bar{x_1}[/latex] is the sample mean for the first group

[latex]\bar{x_2}[/latex] is the sample mean for the second group

[latex]s_1[/latex] is the sample standard deviation for the first group

[latex]s_2[/latex] is the sample standard deviation for the second group

[latex]\alpha[/latex] is the significance level , usually given within the problem, or if not given, we assume it to be 5% or 0.05

Assumptions when conducting a Test for 2 Means from Independent Samples:

  • We do not know the population standard deviations, and we do not assume they are equal
  • The two samples or groups are independent
  • Both samples are simple random samples
  • Both populations are Normally distributed OR both samples are large ([latex]n_1 > 30[/latex] and [latex]n_2 > 30[/latex])

Steps to conduct the Test for 2 Means from Independent Samples:

  • Identify all the symbols listed above (all the stuff that will go into the formulas). This includes [latex]n_1[/latex] and [latex]n_2[/latex], [latex]df[/latex], [latex]\mu_1[/latex] and [latex]\mu_2[/latex], [latex]\bar{x_1}[/latex] and [latex]\bar{x_2}[/latex], [latex]s_1[/latex] and [latex]s_2[/latex], and [latex]\alpha[/latex]
  • Identify the null and alternative hypotheses
  • Calculate the test statistic, [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}}[/latex]
  • Find the critical value(s) OR the p-value OR both
  • Apply the Decision Rule
  • Write up a conclusion for the test

Example 1: Study on the effectiveness of stents for stroke patients [1]

In this study , researchers randomly assigned stroke patients to two groups: one received the current standard care (control) and the other received a stent surgery in addition to the standard care (stent treatment). If the stents work, the treatment group should have a lower average disability score . Do the results give convincing statistical evidence that the stent treatment reduces the average disability from stroke?

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the patients with stent treatment and patients receiving the standard care), so we will conduct a Test of 2 Means.

  • [latex]n_1 = 98[/latex] is the sample size for the first group
  • [latex]n_2 = 93[/latex] is the sample size for the second group
  • [latex]df[/latex], the degrees of freedom, is the smaller of [latex]98 - 1 = 97[/latex] and [latex]93 - 1 = 92[/latex], so [latex]df = 92[/latex]
  • [latex]\bar{x_1} = 2.26[/latex] is the sample mean for the first group
  • [latex]\bar{x_2} = 3.23[/latex] is the sample mean for the second group
  • [latex]s_1 = 1.78[/latex] is the sample standard deviation for the first group
  • [latex]s_2 = 1.78[/latex] is the sample standard deviation for the second group
  • [latex]\alpha = 0.05[/latex] (we were not told a specific value in the problem, so we are assuming it is 5%)
  • One additional assumption we extend from the null hypothesis is that [latex]\mu_1 - \mu_2 = 0[/latex]; this means that in our formula, those variables cancel out
  • [latex]H_{0}: \mu_1 = \mu_2[/latex]
  • [latex]H_{A}: \mu_1 < \mu_2[/latex]
  • [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}} = \displaystyle \frac{(2.26 - 3.23) - 0)}{\sqrt{\displaystyle \frac{1.78^2}{98} + \displaystyle \frac{1.78^2}{93}}} = -3.76[/latex]
  • StatDisk : We can conduct this test using StatDisk. The nice thing about StatDisk is that it will also compute the test statistic. From the main menu above we click on Analysis, Hypothesis Testing, and then Mean Two Independent Samples. From there enter the 0.05 significance, along with the specific values as outlined in the picture below in Step 2. Notice the alternative hypothesis is the [latex]<[/latex] option. Enter the sample size, mean, and standard deviation for each group, and make sure that unequal variances is selected. Now we click on Evaluate. If you check the values, the test statistic is reported in the Step 3 display, as well as the P-Value of 0.00011.
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is smaller or equal to the alpha level, we have enough evidence for our claim, otherwise we do not. Here, [latex]p-value = 0.00011[/latex], which is definitely smaller than [latex]\alpha = 0.05[/latex], so we have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value  of [latex]0.00011[/latex] is less than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we reject [latex]H_{0}[/latex]. We have convincing statistical evidence that the stent treatment reduces the average disability from stroke.

Example 2: Home Run Distances

In 1998, Sammy Sosa and Mark McGwire (2 players in Major League Baseball) were on pace to set a new home run record. At the end of the season McGwire ended up with 70 home runs, and Sosa ended up with 66. The home run distances were recorded and compared (sometimes a player’s home run distance is used to measure their “power”). Do the results give convincing statistical evidence that the home run distances are different from each other? Who would you say “hit the ball farther” in this comparison?

Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the home run distances), so we will conduct a Test of 2 Means.

  • [latex]n_1 = 70[/latex] is the sample size for the first group
  • [latex]n_2 = 66[/latex] is the sample size for the second group
  • [latex]df[/latex], the degrees of freedom, is the smaller of [latex]70 - 1 = 69[/latex] and [latex]66 - 1 = 65[/latex], so [latex]df = 65[/latex]
  • [latex]\bar{x_1} = 418.5[/latex] is the sample mean for the first group
  • [latex]\bar{x_2} = 404.8[/latex] is the sample mean for the second group
  • [latex]s_1 = 45.5[/latex] is the sample standard deviation for the first group
  • [latex]s_2 = 35.7[/latex] is the sample standard deviation for the second group
  • [latex]H_{A}: \mu_1 \neq \mu_2[/latex]
  • [latex]t = \displaystyle \frac{(\bar{x_1} - \bar{x_2}) - (\mu_1 - \mu_2)}{\sqrt{\displaystyle \frac{s_1^2}{n_1} + \displaystyle \frac{s_2^2}{n_2}}} = \displaystyle \frac{(418.5 - 404.8) - 0)}{\sqrt{\displaystyle \frac{45.5^2}{70} + \displaystyle \frac{35.7^2}{65}}} = 1.95[/latex]
  • StatDisk : We can conduct this test using StatDisk. The nice thing about StatDisk is that it will also compute the test statistic. From the main menu above we click on Analysis, Hypothesis Testing, and then Mean Two Independent Samples. From there enter the 0.05 significance, along with the specific values as outlined in the picture below in Step 2. Notice the alternative hypothesis is the [latex]\neq[/latex] option. Enter the sample size, mean, and standard deviation for each group, and make sure that unequal variances is selected. Now we click on Evaluate. If you check the values, the test statistic is reported in the Step 3 display, as well as the P-Value of 0.05221.
  • Applying the Decision Rule: We now compare this to our significance level, which is 0.05. If the p-value is smaller or equal to the alpha level, we have enough evidence for our claim, otherwise we do not. Here, [latex]p-value = 0.05221[/latex], which is larger than [latex]\alpha = 0.05[/latex], so we do not have enough evidence for the alternative hypothesis…but what does this mean?
  • Conclusion: Because our p-value  of [latex]0.05221[/latex] is larger than our [latex]\alpha[/latex] level of [latex]0.05[/latex], we fail to reject [latex]H_{0}[/latex]. We do not have convincing statistical evidence that the home run distances are different.
  • Follow-up commentary: But what does this mean? There actually was a difference, right? If we take McGwire’s average and subtract Sosa’s average we get a difference of 13.7. What this result indicates is that the difference is not statistically significant; it could be due more to random chance than something meaningful. Other factors, such as sample size, could also be a determining factor (with a larger sample size, the difference may have been more meaningful).
  • Adapted from the Skew The Script curriculum ( skewthescript.org ), licensed under CC BY-NC-Sa 4.0 ↵

Basic Statistics Copyright © by Allyn Leon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

Share This Book

If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

To log in and use all the features of Khan Academy, please enable JavaScript in your browser.

AP®︎/College Statistics

Course: ap®︎/college statistics   >   unit 11.

  • Hypotheses for a two-sample t test
  • Example of hypotheses for paired and two-sample t tests
  • Writing hypotheses to test the difference of means

Two-sample t test for difference of means

  • Test statistic in a two-sample t test
  • P-value in a two-sample t test
  • Conclusion for a two-sample t test using a P-value
  • Conclusion for a two-sample t test using a confidence interval
  • Making conclusions about the difference of means

Want to join the conversation?

  • Upvote Button navigates to signup page
  • Downvote Button navigates to signup page
  • Flag Button navigates to signup page

Video transcript

Power and Sample Size Determination

Lisa Sullivan, PhD

Professor of Biosatistics

Boston Univeristy School of Public Health

Title logo - a jumble of words related to sample size and statistical power

Introduction

A critically important aspect of any study is determining the appropriate sample size to answer the research question. This module will focus on formulas that can be used to estimate the sample size needed to produce a confidence interval estimate with a specified margin of error (precision) or to ensure that a test of hypothesis has a high probability of detecting a meaningful difference in the parameter.

Studies should be designed to include a sufficient number of participants to adequately address the research question. Studies that have either an inadequate number of participants or an excessively large number of participants are both wasteful in terms of participant and investigator time, resources to conduct the assessments, analytic efforts and so on. These situations can also be viewed as unethical as participants may have been put at risk as part of a study that was unable to answer an important question. Studies that are much larger than they need to be to answer the research questions are also wasteful.

The formulas presented here generate estimates of the necessary sample size(s) required based on statistical criteria. However, in many studies, the sample size is determined by financial or logistical constraints. For example, suppose a study is proposed to evaluate a new screening test for Down Syndrome.  Suppose that the screening test is based on analysis of a blood sample taken from women early in pregnancy. In order to evaluate the properties of the screening test (e.g., the sensitivity and specificity), each pregnant woman will be asked to provide a blood sample and in addition to undergo an amniocentesis. The amniocentesis is included as the gold standard and the plan is to compare the results of the screening test to the results of the amniocentesis. Suppose that the collection and processing of the blood sample costs $250 per participant and that the amniocentesis costs $900 per participant. These financial constraints alone might substantially limit the number of women that can be enrolled. Just as it is important to consider both statistical and clinical significance when interpreting results of a statistical analysis, it is also important to weigh both statistical and logistical issues in determining the sample size for a study.

Learning Objectives

After completing this module, the student will be able to:

  • Provide examples demonstrating how the margin of error, effect size and variability of the outcome affect sample size computations.
  • Compute the sample size required to estimate population parameters with precision.
  • Interpret statistical power in tests of hypothesis.
  • Compute the sample size required to ensure high power when hypothesis testing.

Issues in Estimating Sample Size for Confidence Intervals Estimates

The module on confidence intervals provided methods for estimating confidence intervals for various parameters (e.g., μ , p, ( μ 1 - μ 2 ),   μ d , (p 1 -p 2 )). Confidence intervals for every parameter take the following general form:

Point Estimate + Margin of Error

In the module on confidence intervals we derived the formula for the confidence interval for μ as

In practice we use the sample standard deviation to estimate the population standard deviation. Note that there is an alternative formula for estimating the mean of a continuous outcome in a single population, and it is used when the sample size is small (n<30). It involves a value from the t distribution, as opposed to one from the standard normal distribution, to reflect the desired level of confidence. When performing sample size computations, we use the large sample formula shown here. [Note: The resultant sample size might be small, and in the analysis stage, the appropriate confidence interval formula must be used.]

The point estimate for the population mean is the sample mean and the margin of error is

In planning studies, we want to determine the sample size needed to ensure that the margin of error is sufficiently small to be informative. For example, suppose we want to estimate the mean weight of female college students. We conduct a study and generate a 95% confidence interval as follows 125 + 40 pounds, or 85 to 165 pounds. The margin of error is so wide that the confidence interval is uninformative. To be informative, an investigator might want the margin of error to be no more than 5 or 10 pounds (meaning that the 95% confidence interval would have a width (lower limit to upper limit) of 10 or 20 pounds). In order to determine the sample size needed, the investigator must specify the desired margin of error . It is important to note that this is not a statistical issue, but a clinical or a practical one. For example, suppose we want to estimate the mean birth weight of infants born to mothers who smoke cigarettes during pregnancy. Birth weights in infants clearly have a much more restricted range than weights of female college students. Therefore, we would probably want to generate a confidence interval for the mean birth weight that has a margin of error not exceeding 1 or 2 pounds.

The margin of error in the one sample confidence interval for μ can be written as follows:

Our goal is to determine the sample size, n, that ensures that the margin of error, " E ," does not exceed a specified value. We can take the formula above and, with some algebra, solve for n :

First, multipy both sides of the equation by the square root of n . Then cancel out the square root of n from the numerator and denominator on the right side of the equation (since any number divided by itself is equal to 1). This leaves:

Now divide both sides by "E" and cancel out "E" from the numerator and denominator on the left side. This leaves:

Finally, square both sides of the equation to get:

This formula generates the sample size, n , required to ensure that the margin of error, E , does not exceed a specified value. To solve for n , we must input " Z ," " σ ," and " E ."  

  • Z is the value from the table of probabilities of the standard normal distribution for the desired confidence level (e.g., Z = 1.96 for 95% confidence)
  • E is the margin of error that the investigator specifies as important from a clinical or practical standpoint.
  • σ is the standard deviation of the outcome of interest.

Sometimes it is difficult to estimate σ . When we use the sample size formula above (or one of the other formulas that we will present in the sections that follow), we are planning a study to estimate the unknown mean of a particular outcome variable in a population. It is unlikely that we would know the standard deviation of that variable. In sample size computations, investigators often use a value for the standard deviation from a previous study or a study done in a different, but comparable, population. The sample size computation is not an application of statistical inference and therefore it is reasonable to use an appropriate estimate for the standard deviation. The estimate can be derived from a different study that was reported in the literature; some investigators perform a small pilot study to estimate the standard deviation. A pilot study usually involves a small number of participants (e.g., n=10) who are selected by convenience, as opposed to by random sampling. Data from the participants in the pilot study can be used to compute a sample standard deviation, which serves as a good estimate for σ in the sample size formula. Regardless of how the estimate of the variability of the outcome is derived, it should always be conservative (i.e., as large as is reasonable), so that the resultant sample size is not too small.

Sample Size for One Sample, Continuous Outcome

In studies where the plan is to estimate the mean of a continuous outcome variable in a single population, the formula for determining sample size is given below:

where Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%), σ is the standard deviation of the outcome variable and E is the desired margin of error. The formula above generates the minimum number of subjects required to ensure that the margin of error in the confidence interval for μ does not exceed E .  

An investigator wants to estimate the mean systolic blood pressure in children with congenital heart disease who are between the ages of 3 and 5. How many children should be enrolled in the study? The investigator plans on using a 95% confidence interval (so Z=1.96) and wants a margin of error of 5 units. The standard deviation of systolic blood pressure is unknown, but the investigators conduct a literature search and find that the standard deviation of systolic blood pressures in children with other cardiac defects is between 15 and 20. To estimate the sample size, we consider the larger standard deviation in order to obtain the most conservative (largest) sample size. 

In order to ensure that the 95% confidence interval estimate of the mean systolic blood pressure in children between the ages of 3 and 5 with congenital heart disease is within 5 units of the true mean, a sample of size 62 is needed. [ Note : We always round up; the sample size formulas always generate the minimum number of subjects needed to ensure the specified precision.] Had we assumed a standard deviation of 15, the sample size would have been n=35. Because the estimates of the standard deviation were derived from studies of children with other cardiac defects, it would be advisable to use the larger standard deviation and plan for a study with 62 children. Selecting the smaller sample size could potentially produce a confidence interval estimate with a larger margin of error. 

An investigator wants to estimate the mean birth weight of infants born full term (approximately 40 weeks gestation) to mothers who are 19 years of age and under. The mean birth weight of infants born full-term to mothers 20 years of age and older is 3,510 grams with a standard deviation of 385 grams. How many women 19 years of age and under must be enrolled in the study to ensure that a 95% confidence interval estimate of the mean birth weight of their infants has a margin of error not exceeding 100 grams? Try to work through the calculation before you look at the answer.

Sample Size for One Sample, Dichotomous Outcome 

In studies where the plan is to estimate the proportion of successes in a dichotomous outcome variable (yes/no) in a single population, the formula for determining sample size is:

where Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%) and E is the desired margin of error. p is the proportion of successes in the population. Here we are planning a study to generate a 95% confidence interval for the unknown population proportion, p . The equation to determine the sample size for determining p seems to require knowledge of p, but this is obviously this is a circular argument, because if we knew the proportion of successes in the population, then a study would not be necessary! What we really need is an approximate value of p or an anticipated value. The range of p is 0 to 1, and therefore the range of p(1-p) is 0 to 1. The value of p that maximizes p(1-p) is p=0.5. Consequently, if there is no information available to approximate p, then p=0.5 can be used to generate the most conservative, or largest, sample size.

Example 2:  

An investigator wants to estimate the proportion of freshmen at his University who currently smoke cigarettes (i.e., the prevalence of smoking). How many freshmen should be involved in the study to ensure that a 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion?

Because we have no information on the proportion of freshmen who smoke, we use 0.5 to estimate the sample size as follows:

In order to ensure that the 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion, a sample of size 385 is needed.

Suppose that a similar study was conducted 2 years ago and found that the prevalence of smoking was 27% among freshmen. If the investigator believes that this is a reasonable estimate of prevalence 2 years later, it can be used to plan the next study. Using this estimate of p, what sample size is needed (assuming that again a 95% confidence interval will be used and we want the same level of precision)?

An investigator wants to estimate the prevalence of breast cancer among women who are between 40 and 45 years of age living in Boston. How many women must be involved in the study to ensure that the estimate is precise? National data suggest that 1 in 235 women are diagnosed with breast cancer by age 40. This translates to a proportion of 0.0043 (0.43%) or a prevalence of 43 per 10,000 women. Suppose the investigator wants the estimate to be within 10 per 10,000 women with 95% confidence. The sample size is computed as follows:

A sample of size n=16,448 will ensure that a 95% confidence interval estimate of the prevalence of breast cancer is within 0.10 (or to within 10 women per 10,000) of its true value. This is a situation where investigators might decide that a sample of this size is not feasible. Suppose that the investigators thought a sample of size 5,000 would be reasonable from a practical point of view. How precisely can we estimate the prevalence with a sample of size n=5,000? Recall that the confidence interval formula to estimate prevalence is:

Assuming that the prevalence of breast cancer in the sample will be close to that based on national data, we would expect the margin of error to be approximately equal to the following:

Thus, with n=5,000 women, a 95% confidence interval would be expected to have a margin of error of 0.0018 (or 18 per 10,000). The investigators must decide if this would be sufficiently precise to answer the research question. Note that the above is based on the assumption that the prevalence of breast cancer in Boston is similar to that reported nationally. This may or may not be a reasonable assumption. In fact, it is the objective of the current study to estimate the prevalence in Boston. The research team, with input from clinical investigators and biostatisticians, must carefully evaluate the implications of selecting a sample of size n = 5,000, n = 16,448 or any size in between.

Sample Sizes for Two Independent Samples, Continuous Outcome

In studies where the plan is to estimate the difference in means between two independent populations, the formula for determining the sample sizes required in each comparison group is given below:

where n i is the sample size required in each group (i=1,2), Z is the value from the standard normal distribution reflecting the confidence level that will be used and E is the desired margin of error. σ again reflects the standard deviation of the outcome variable. Recall from the module on confidence intervals that, when we generated a confidence interval estimate for the difference in means, we used Sp, the pooled estimate of the common standard deviation, as a measure of variability in the outcome (based on pooling the data), where Sp is computed as follows:

If data are available on variability of the outcome in each comparison group, then Sp can be computed and used in the sample size formula. However, it is more often the case that data on the variability of the outcome are available from only one group, often the untreated (e.g., placebo control) or unexposed group. When planning a clinical trial to investigate a new drug or procedure, data are often available from other trials that involved a placebo or an active control group (i.e., a standard medication or treatment given for the condition under study). The standard deviation of the outcome variable measured in patients assigned to the placebo, control or unexposed group can be used to plan a future trial, as illustrated below.  

Note that the formula for the sample size generates sample size estimates for samples of equal size. If a study is planned where different numbers of patients will be assigned or different numbers of patients will comprise the comparison groups, then alternative formulas can be used.  

An investigator wants to plan a clinical trial to evaluate the efficacy of a new drug designed to increase HDL cholesterol (the "good" cholesterol). The plan is to enroll participants and to randomly assign them to receive either the new drug or a placebo. HDL cholesterol will be measured in each participant after 12 weeks on the assigned treatment. Based on prior experience with similar trials, the investigator expects that 10% of all participants will be lost to follow up or will drop out of the study over 12 weeks. A 95% confidence interval will be estimated to quantify the difference in mean HDL levels between patients taking the new drug as compared to placebo. The investigator would like the margin of error to be no more than 3 units. How many patients should be recruited into the study?  

The sample sizes are computed as follows:

A major issue is determining the variability in the outcome of interest (σ), here the standard deviation of HDL cholesterol. To plan this study, we can use data from the Framingham Heart Study. In participants who attended the seventh examination of the Offspring Study and were not on treatment for high cholesterol, the standard deviation of HDL cholesterol is 17.1. We will use this value and the other inputs to compute the sample sizes as follows:

Samples of size n 1 =250 and n 2 =250 will ensure that the 95% confidence interval for the difference in mean HDL levels will have a margin of error of no more than 3 units. Again, these sample sizes refer to the numbers of participants with complete data. The investigators hypothesized a 10% attrition (or drop-out) rate (in both groups). In order to ensure that the total sample size of 500 is available at 12 weeks, the investigator needs to recruit more participants to allow for attrition.  

N (number to enroll) * (% retained) = desired sample size

Therefore N (number to enroll) = desired sample size/(% retained)

N = 500/0.90 = 556

If they anticipate a 10% attrition rate, the investigators should enroll 556 participants. This will ensure N=500 with complete data at the end of the trial.

An investigator wants to compare two diet programs in children who are obese. One diet is a low fat diet, and the other is a low carbohydrate diet. The plan is to enroll children and weigh them at the start of the study. Each child will then be randomly assigned to either the low fat or the low carbohydrate diet. Each child will follow the assigned diet for 8 weeks, at which time they will again be weighed. The number of pounds lost will be computed for each child. Based on data reported from diet trials in adults, the investigator expects that 20% of all children will not complete the study. A 95% confidence interval will be estimated to quantify the difference in weight lost between the two diets and the investigator would like the margin of error to be no more than 3 pounds. How many children should be recruited into the study?  

Again the issue is determining the variability in the outcome of interest (σ), here the standard deviation in pounds lost over 8 weeks. To plan this study, investigators use data from a published study in adults. Suppose one such study compared the same diets in adults and involved 100 participants in each diet group. The study reported a standard deviation in weight lost over 8 weeks on a low fat diet of 8.4 pounds and a standard deviation in weight lost over 8 weeks on a low carbohydrate diet of 7.7 pounds. These data can be used to estimate the common standard deviation in weight lost as follows:

We now use this value and the other inputs to compute the sample sizes:

Samples of size n 1 =56 and n 2 =56 will ensure that the 95% confidence interval for the difference in weight lost between diets will have a margin of error of no more than 3 pounds. Again, these sample sizes refer to the numbers of children with complete data. The investigators anticipate a 20% attrition rate. In order to ensure that the total sample size of 112 is available at 8 weeks, the investigator needs to recruit more participants to allow for attrition.  

N = 112/0.80 = 140

Sample Size for Matched Samples, Continuous Outcome

In studies where the plan is to estimate the mean difference of a continuous outcome based on matched data, the formula for determining sample size is given below:

where Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%), E is the desired margin of error, and σ d is the standard deviation of the difference scores. It is extremely important that the standard deviation of the difference scores (e.g., the difference based on measurements over time or the difference between matched pairs) is used here to appropriately estimate the sample size.    

Sample Sizes for Two Independent Samples, Dichotomous Outcome

In studies where the plan is to estimate the difference in proportions between two independent populations (i.e., to estimate the risk difference), the formula for determining the sample sizes required in each comparison group is:

where n i is the sample size required in each group (i=1,2), Z is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., Z = 1.96 for 95%), and E is the desired margin of error. p 1 and p 2 are the proportions of successes in each comparison group. Again, here we are planning a study to generate a 95% confidence interval for the difference in unknown proportions, and the formula to estimate the sample sizes needed requires p 1 and p 2 . In order to estimate the sample size, we need approximate values of p 1 and p 2 . The values of p 1 and p 2 that maximize the sample size are p 1 =p 2 =0.5. Thus, if there is no information available to approximate p 1 and p 2 , then 0.5 can be used to generate the most conservative, or largest, sample sizes.    

Similar to the situation for two independent samples and a continuous outcome at the top of this page, it may be the case that data are available on the proportion of successes in one group, usually the untreated (e.g., placebo control) or unexposed group. If so, the known proportion can be used for both p 1 and p 2 in the formula shown above. The formula shown above generates sample size estimates for samples of equal size. If a study is planned where different numbers of patients will be assigned or different numbers of patients will comprise the comparison groups, then alternative formulas can be used. Interested readers can see Fleiss for more details. 4

An investigator wants to estimate the impact of smoking during pregnancy on premature delivery. Normal pregnancies last approximately 40 weeks and premature deliveries are those that occur before 37 weeks. The 2005 National Vital Statistics report indicates that approximately 12% of infants are born prematurely in the United States. 5 The investigator plans to collect data through medical record review and to generate a 95% confidence interval for the difference in proportions of infants born prematurely to women who smoked during pregnancy as compared to those who did not. How many women should be enrolled in the study to ensure that the 95% confidence interval for the difference in proportions has a margin of error of no more than 4%?

The sample sizes (i.e., numbers of women who smoked and did not smoke during pregnancy) can be computed using the formula shown above. National data suggest that 12% of infants are born prematurely. We will use that estimate for both groups in the sample size computation.

Samples of size n 1 =508 women who smoked during pregnancy and n 2 =508 women who did not smoke during pregnancy will ensure that the 95% confidence interval for the difference in proportions who deliver prematurely will have a margin of error of no more than 4%.

Is attrition an issue here? 

Issues in Estimating Sample Size for Hypothesis Testing

In the module on hypothesis testing for means and proportions, we introduced techniques for means, proportions, differences in means, and differences in proportions. While each test involved details that were specific to the outcome of interest (e.g., continuous or dichotomous) and to the number of comparison groups (one, two, more than two), there were common elements to each test. For example, in each test of hypothesis, there are two errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H 0 when in fact it is true.   In the first step of any test of hypothesis, we select a level of significance, α , and α = P(Type I error) = P(Reject H 0 | H 0 is true). Because we purposely select a small value for α , we control the probability of committing a Type I error. The second type of error is called a Type II error and it is defined as the probability we do not reject H 0 when it is false. The probability of a Type II error is denoted β , and β =P(Type II error) = P(Do not Reject H 0 | H 0 is false). In hypothesis testing, we usually focus on power, which is defined as the probability that we reject H 0 when it is false, i.e., power = 1- β = P(Reject H 0 | H 0 is false). Power is the probability that a test correctly rejects a false null hypothesis. A good test is one with low probability of committing a Type I error (i.e., small α ) and high power (i.e., small β, high power).  

Here we present formulas to determine the sample size required to ensure that a test has high power. The sample size computations depend on the level of significance, aα, the desired power of the test (equivalent to 1-β), the variability of the outcome, and the effect size. The effect size is the difference in the parameter of interest that represents a clinically meaningful difference. Similar to the margin of error in confidence interval applications, the effect size is determined based on clinical or practical criteria and not statistical criteria.  

The concept of statistical power can be difficult to grasp. Before presenting the formulas to determine the sample sizes required to ensure high power in a test, we will first discuss power from a conceptual point of view.  

Suppose we want to test the following hypotheses at aα=0.05:  H 0 : μ = 90 versus H 1 : μ ≠ 90. To test the hypotheses, suppose we select a sample of size n=100. For this example, assume that the standard deviation of the outcome is σ=20. We compute the sample mean and then must decide whether the sample mean provides evidence to support the alternative hypothesis or not. This is done by computing a test statistic and comparing the test statistic to an appropriate critical value. If the null hypothesis is true (μ=90), then we are likely to select a sample whose mean is close in value to 90. However, it is also possible to select a sample whose mean is much larger or much smaller than 90. Recall from the Central Limit Theorem (see page 11 in the module on Probability), that for large n (here n=100 is sufficiently large), the distribution of the sample means is approximately normal with a mean of

If the null hypothesis is true, it is possible to observe any sample mean shown in the figure below; all are possible under H 0 : μ = 90.  

Normal distribution of X when the mean of X is 90. A bell-shaped curve with a value of X-90 at the center.

Rejection Region for Test H 0 : μ = 90 versus H 1 : μ ≠ 90 at α =0.05

Standard normal distribution showing a mean of 90. The rejection areas are in the two tails at the extremes above and below the mean. If the alpha level is 0.05, then each tail accounts for an arean of 0.025.

The areas in the two tails of the curve represent the probability of a Type I Error, α= 0.05. This concept was discussed in the module on Hypothesis Testing.  

Now, suppose that the alternative hypothesis, H 1 , is true (i.e., μ ≠ 90) and that the true mean is actually 94. The figure below shows the distributions of the sample mean under the null and alternative hypotheses.The values of the sample mean are shown along the horizontal axis.  

Two overlapping normal distributions, one depicting the null hypothesis with a mean of 90 and the other showing the alternative hypothesis with a mean of 94. A more complete explanation of the figure is provided in the text below the figure.

If the true mean is 94, then the alternative hypothesis is true. In our test, we selected α = 0.05 and reject H 0 if the observed sample mean exceeds 93.92 (focusing on the upper tail of the rejection region for now). The critical value (93.92) is indicated by the vertical line. The probability of a Type II error is denoted β, and β = P(Do not Reject H 0 | H 0 is false), i.e., the probability of not rejecting the null hypothesis if the null hypothesis were true. β is shown in the figure above as the area under the rightmost curve (H 1 ) to the left of the vertical line (where we do not reject H 0 ). Power is defined as 1- β = P(Reject H 0 | H 0 is false) and is shown in the figure as the area under the rightmost curve (H 1 ) to the right of the vertical line (where we reject H 0 ).  

Note that β and power are related to α, the variability of the outcome and the effect size. From the figure above we can see what happens to β and power if we increase α. Suppose, for example, we increase α to α=0.10.The upper critical value would be 92.56 instead of 93.92. The vertical line would shift to the left, increasing α, decreasing β and increasing power. While a better test is one with higher power, it is not advisable to increase α as a means to increase power. Nonetheless, there is a direct relationship between α and power (as α increases, so does power).

β and power are also related to the variability of the outcome and to the effect size. The effect size is the difference in the parameter of interest (e.g., μ) that represents a clinically meaningful difference. The figure above graphically displays α, β, and power when the difference in the mean under the null as compared to the alternative hypothesis is 4 units (i.e., 90 versus 94). The figure below shows the same components for the situation where the mean under the alternative hypothesis is 98.

Overlapping bell-shaped distributions - one with a mean of 90 and the other with a mean of 98

Notice that there is much higher power when there is a larger difference between the mean under H 0 as compared to H 1 (i.e., 90 versus 98). A statistical test is much more likely to reject the null hypothesis in favor of the alternative if the true mean is 98 than if the true mean is 94. Notice also in this case that there is little overlap in the distributions under the null and alternative hypotheses. If a sample mean of 97 or higher is observed it is very unlikely that it came from a distribution whose mean is 90. In the previous figure for H 0 : μ = 90 and H 1 : μ = 94, if we observed a sample mean of 93, for example, it would not be as clear as to whether it came from a distribution whose mean is 90 or one whose mean is 94.

Ensuring That a Test Has High Power

In designing studies most people consider power of 80% or 90% (just as we generally use 95% as the confidence level for confidence interval estimates). The inputs for the sample size formulas include the desired power, the level of significance and the effect size. The effect size is selected to represent a clinically meaningful or practically important difference in the parameter of interest, as we will illustrate.  

The formulas we present below produce the minimum sample size to ensure that the test of hypothesis will have a specified probability of rejecting the null hypothesis when it is false (i.e., a specified power). In planning studies, investigators again must account for attrition or loss to follow-up. The formulas shown below produce the number of participants needed with complete data, and we will illustrate how attrition is addressed in planning studies.

In studies where the plan is to perform a test of hypothesis comparing the mean of a continuous outcome variable in a single population to a known mean, the hypotheses of interest are:

H 0 : μ = μ 0 and H 1 : μ ≠ μ 0 where μ 0 is the known mean (e.g., a historical control). The formula for determining sample size to ensure that the test has a specified power is given below:

where α is the selected level of significance and Z 1-α /2 is the value from the standard normal distribution holding 1- α/2 below it. For example, if α=0.05, then 1- α/2 = 0.975 and Z=1.960. 1- β is the selected power, and Z 1-β is the value from the standard normal distribution holding 1- β below it. Sample size estimates for hypothesis testing are often based on achieving 80% or 90% power. The Z 1-β values for these popular scenarios are given below:

  • For 80% power Z 0.80 = 0.84
  • For 90% power Z 0.90 =1.282

ES is the effect size , defined as follows:

where μ 0 is the mean under H 0 , μ 1 is the mean under H 1 and σ is the standard deviation of the outcome of interest. The numerator of the effect size, the absolute value of the difference in means | μ 1 - μ 0 |, represents what is considered a clinically meaningful or practically important difference in means. Similar to the issue we faced when planning studies to estimate confidence intervals, it can sometimes be difficult to estimate the standard deviation. In sample size computations, investigators often use a value for the standard deviation from a previous study or a study performed in a different but comparable population. Regardless of how the estimate of the variability of the outcome is derived, it should always be conservative (i.e., as large as is reasonable), so that the resultant sample size will not be too small.

Example 7:  

An investigator hypothesizes that in people free of diabetes, fasting blood glucose, a risk factor for coronary heart disease, is higher in those who drink at least 2 cups of coffee per day. A cross-sectional study is planned to assess the mean fasting blood glucose levels in people who drink at least two cups of coffee per day. The mean fasting blood glucose level in people free of diabetes is reported as 95.0 mg/dL with a standard deviation of 9.8 mg/dL. 7 If the mean blood glucose level in people who drink at least 2 cups of coffee per day is 100 mg/dL, this would be important clinically. How many patients should be enrolled in the study to ensure that the power of the test is 80% to detect this difference? A two sided test will be used with a 5% level of significance.  

The effect size is computed as:

The effect size represents the meaningful difference in the population mean - here 95 versus 100, or 0.51 standard deviation units different. We now substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size.

Therefore, a sample of size n=31 will ensure that a two-sided test with α =0.05 has 80% power to detect a 5 mg/dL difference in mean fasting blood glucose levels.

In the planned study, participants will be asked to fast overnight and to provide a blood sample for analysis of glucose levels. Based on prior experience, the investigators hypothesize that 10% of the participants will fail to fast or will refuse to follow the study protocol. Therefore, a total of 35 participants will be enrolled in the study to ensure that 31 are available for analysis (see below).

N (number to enroll) * (% following protocol) = desired sample size

N = 31/0.90 = 35.

Sample Size for One Sample, Dichotomous Outcome

In studies where the plan is to perform a test of hypothesis comparing the proportion of successes in a dichotomous outcome variable in a single population to a known proportion, the hypotheses of interest are:

where p 0 is the known proportion (e.g., a historical control). The formula for determining the sample size to ensure that the test has a specified power is given below:

where α is the selected level of significance and Z 1-α /2 is the value from the standard normal distribution holding 1- α/2 below it. 1- β is the selected power and   Z 1-β is the value from the standard normal distribution holding 1- β below it , and ES is the effect size, defined as follows:

where p 0 is the proportion under H 0 and p 1 is the proportion under H 1 . The numerator of the effect size, the absolute value of the difference in proportions |p 1 -p 0 |, again represents what is considered a clinically meaningful or practically important difference in proportions.  

Example 8:  

A recent report from the Framingham Heart Study indicated that 26% of people free of cardiovascular disease had elevated LDL cholesterol levels, defined as LDL > 159 mg/dL. 9 An investigator hypothesizes that a higher proportion of patients with a history of cardiovascular disease will have elevated LDL cholesterol. How many patients should be studied to ensure that the power of the test is 90% to detect a 5% difference in the proportion with elevated LDL cholesterol? A two sided test will be used with a 5% level of significance.  

We first compute the effect size: 

We now substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size.

A sample of size n=869 will ensure that a two-sided test with α =0.05 has 90% power to detect a 5% difference in the proportion of patients with a history of cardiovascular disease who have an elevated LDL cholesterol level.

A medical device manufacturer produces implantable stents. During the manufacturing process, approximately 10% of the stents are deemed to be defective. The manufacturer wants to test whether the proportion of defective stents is more than 10%. If the process produces more than 15% defective stents, then corrective action must be taken. Therefore, the manufacturer wants the test to have 90% power to detect a difference in proportions of this magnitude. How many stents must be evaluated? For you computations, use a two-sided test with a 5% level of significance. (Do the computation yourself, before looking at the answer.)

In studies where the plan is to perform a test of hypothesis comparing the means of a continuous outcome variable in two independent populations, the hypotheses of interest are:

where μ 1 and μ 2 are the means in the two comparison populations. The formula for determining the sample sizes to ensure that the test has a specified power is:

where n i is the sample size required in each group (i=1,2), α is the selected level of significance and Z 1-α /2 is the value from the standard normal distribution holding 1- α /2 below it, and 1- β is the selected power and Z 1-β is the value from the standard normal distribution holding 1- β below it. ES is the effect size, defined as:

where | μ 1 - μ 2 | is the absolute value of the difference in means between the two groups expected under the alternative hypothesis, H 1 . σ is the standard deviation of the outcome of interest. Recall from the module on Hypothesis Testing that, when we performed tests of hypothesis comparing the means of two independent groups, we used Sp, the pooled estimate of the common standard deviation, as a measure of variability in the outcome.

Sp is computed as follows:

If data are available on variability of the outcome in each comparison group, then Sp can be computed and used to generate the sample sizes. However, it is more often the case that data on the variability of the outcome are available from only one group, usually the untreated (e.g., placebo control) or unexposed group. When planning a clinical trial to investigate a new drug or procedure, data are often available from other trials that may have involved a placebo or an active control group (i.e., a standard medication or treatment given for the condition under study). The standard deviation of the outcome variable measured in patients assigned to the placebo, control or unexposed group can be used to plan a future trial, as illustrated.  

 Note also that the formula shown above generates sample size estimates for samples of equal size. If a study is planned where different numbers of patients will be assigned or different numbers of patients will comprise the comparison groups, then alternative formulas can be used (see Howell 3 for more details).

An investigator is planning a clinical trial to evaluate the efficacy of a new drug designed to reduce systolic blood pressure. The plan is to enroll participants and to randomly assign them to receive either the new drug or a placebo. Systolic blood pressures will be measured in each participant after 12 weeks on the assigned treatment. Based on prior experience with similar trials, the investigator expects that 10% of all participants will be lost to follow up or will drop out of the study. If the new drug shows a 5 unit reduction in mean systolic blood pressure, this would represent a clinically meaningful reduction. How many patients should be enrolled in the trial to ensure that the power of the test is 80% to detect this difference? A two sided test will be used with a 5% level of significance.  

In order to compute the effect size, an estimate of the variability in systolic blood pressures is needed. Analysis of data from the Framingham Heart Study showed that the standard deviation of systolic blood pressure was 19.0. This value can be used to plan the trial.  

The effect size is:

Samples of size n 1 =232 and n 2 = 232 will ensure that the test of hypothesis will have 80% power to detect a 5 unit difference in mean systolic blood pressures in patients receiving the new drug as compared to patients receiving the placebo. However, the investigators hypothesized a 10% attrition rate (in both groups), and to ensure a total sample size of 232 they need to allow for attrition.  

N = 232/0.90 = 258.

The investigator must enroll 258 participants to be randomly assigned to receive either the new drug or placebo.

An investigator is planning a study to assess the association between alcohol consumption and grade point average among college seniors. The plan is to categorize students as heavy drinkers or not using 5 or more drinks on a typical drinking day as the criterion for heavy drinking. Mean grade point averages will be compared between students classified as heavy drinkers versus not using a two independent samples test of means. The standard deviation in grade point averages is assumed to be 0.42 and a meaningful difference in grade point averages (relative to drinking status) is 0.25 units. How many college seniors should be enrolled in the study to ensure that the power of the test is 80% to detect a 0.25 unit difference in mean grade point averages? Use a two-sided test with a 5% level of significance.  

Answer  

In studies where the plan is to perform a test of hypothesis on the mean difference in a continuous outcome variable based on matched data, the hypotheses of interest are:

where μ d is the mean difference in the population. The formula for determining the sample size to ensure that the test has a specified power is given below:

where α is the selected level of significance and Z 1-α/2 is the value from the standard normal distribution holding 1- α/2 below it, 1- β is the selected power and Z 1-β is the value from the standard normal distribution holding 1- β below it and ES is the effect size, defined as follows:

where μ d is the mean difference expected under the alternative hypothesis, H 1 , and σ d is the standard deviation of the difference in the outcome (e.g., the difference based on measurements over time or the difference between matched pairs).    

   

Example 10:

An investigator wants to evaluate the efficacy of an acupuncture treatment for reducing pain in patients with chronic migraine headaches. The plan is to enroll patients who suffer from migraine headaches. Each will be asked to rate the severity of the pain they experience with their next migraine before any treatment is administered. Pain will be recorded on a scale of 1-100 with higher scores indicative of more severe pain. Each patient will then undergo the acupuncture treatment. On their next migraine (post-treatment), each patient will again be asked to rate the severity of the pain. The difference in pain will be computed for each patient. A two sided test of hypothesis will be conducted, at α =0.05, to assess whether there is a statistically significant difference in pain scores before and after treatment. How many patients should be involved in the study to ensure that the test has 80% power to detect a difference of 10 units on the pain scale? Assume that the standard deviation in the difference scores is approximately 20 units.    

First compute the effect size:

Then substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size.

A sample of size n=32 patients with migraine will ensure that a two-sided test with α =0.05 has 80% power to detect a mean difference of 10 points in pain before and after treatment, assuming that all 32 patients complete the treatment.

Sample Sizes for Two Independent Samples, Dichotomous Outcomes

In studies where the plan is to perform a test of hypothesis comparing the proportions of successes in two independent populations, the hypotheses of interest are:

H 0 : p 1 = p 2 versus H 1 : p 1 ≠ p 2

where p 1 and p 2 are the proportions in the two comparison populations. The formula for determining the sample sizes to ensure that the test has a specified power is given below:

where n i is the sample size required in each group (i=1,2), α is the selected level of significance and Z 1-α/2 is the value from the standard normal distribution holding 1- α/2 below it, and 1- β is the selected power and Z 1-β is the value from the standard normal distribution holding 1- β below it. ES is the effect size, defined as follows: 

where |p 1 - p 2 | is the absolute value of the difference in proportions between the two groups expected under the alternative hypothesis, H 1 , and p is the overall proportion, based on pooling the data from the two comparison groups (p can be computed by taking the mean of the proportions in the two comparison groups, assuming that the groups will be of approximately equal size).  

Example 11: 

An investigator hypothesizes that there is a higher incidence of flu among students who use their athletic facility regularly than their counterparts who do not. The study will be conducted in the spring. Each student will be asked if they used the athletic facility regularly over the past 6 months and whether or not they had the flu. A test of hypothesis will be conducted to compare the proportion of students who used the athletic facility regularly and got flu with the proportion of students who did not and got flu. During a typical year, approximately 35% of the students experience flu. The investigators feel that a 30% increase in flu among those who used the athletic facility regularly would be clinically meaningful. How many students should be enrolled in the study to ensure that the power of the test is 80% to detect this difference in the proportions? A two sided test will be used with a 5% level of significance.  

We first compute the effect size by substituting the proportions of students in each group who are expected to develop flu, p 1 =0.46 (i.e., 0.35*1.30=0.46) and p 2 =0.35 and the overall proportion, p=0.41 (i.e., (0.46+0.35)/2):

We now substitute the effect size and the appropriate Z values for the selected α and power to compute the sample size.  

Samples of size n 1 =324 and n 2 =324 will ensure that the test of hypothesis will have 80% power to detect a 30% difference in the proportions of students who develop flu between those who do and do not use the athletic facilities regularly.

Donor Feces? Really? Clostridium difficile (also referred to as "C. difficile" or "C. diff.") is a bacterial species that can be found in the colon of humans, although its numbers are kept in check by other normal flora in the colon. Antibiotic therapy sometimes diminishes the normal flora in the colon to the point that C. difficile flourishes and causes infection with symptoms ranging from diarrhea to life-threatening inflammation of the colon. Illness from C. difficile most commonly affects older adults in hospitals or in long term care facilities and typically occurs after use of antibiotic medications. In recent years, C. difficile infections have become more frequent, more severe and more difficult to treat. Ironically, C. difficile is first treated by discontinuing antibiotics, if they are still being prescribed. If that is unsuccessful, the infection has been treated by switching to another antibiotic. However, treatment with another antibiotic frequently does not cure the C. difficile infection. There have been sporadic reports of successful treatment by infusing feces from healthy donors into the duodenum of patients suffering from C. difficile. (Yuk!) This re-establishes the normal microbiota in the colon, and counteracts the overgrowth of C. diff. The efficacy of this approach was tested in a randomized clinical trial reported in the New England Journal of Medicine (Jan. 2013). The investigators planned to randomly assign patients with recurrent C. difficile infection to either antibiotic therapy or to duodenal infusion of donor feces. In order to estimate the sample size that would be needed, the investigators assumed that the feces infusion would be successful 90% of the time, and antibiotic therapy would be successful in 60% of cases. How many subjects will be needed in each group to ensure that the power of the study is 80% with a level of significance α = 0.05?

Determining the appropriate design of a study is more important than the statistical analysis; a poorly designed study can never be salvaged, whereas a poorly analyzed study can be re-analyzed. A critical component in study design is the determination of the appropriate sample size. The sample size must be large enough to adequately answer the research question, yet not too large so as to involve too many patients when fewer would have sufficed. The determination of the appropriate sample size involves statistical criteria as well as clinical or practical considerations. Sample size determination involves teamwork; biostatisticians must work closely with clinical investigators to determine the sample size that will address the research question of interest with adequate precision or power to produce results that are clinically meaningful.

The following table summarizes the sample size formulas for each scenario described here. The formulas are organized by the proposed analysis, a confidence interval estimate or a test of hypothesis.

  • Buschman NA, Foster G, Vickers P. Adolescent girls and their babies: achieving optimal birth weight. Gestational weight gain and pregnancy outcome in terms of gestation at delivery and infant birth weight: a comparison between adolescents under 16 and adult women. Child: Care, Health and Development. 2001; 27(2):163-171.
  • Feuer EJ, Wun LM. DEVCAN: Probability of Developing or Dying of Cancer. Version 4.0 .Bethesda, MD: National Cancer Institute, 1999.
  • Howell DC. Statistical Methods for Psychology. Boston, MA: Duxbury Press, 1982.
  • Fleiss JL. Statistical Methods for Rates and Proportions. New York, NY: John Wiley and Sons, Inc.,1981.
  • National Center for Health Statistics. Health, United States, 2005 with Chartbook on Trends in the Health of Americans. Hyattsville, MD : US Government Printing Office; 2005.  
  • Plaskon LA, Penson DF, Vaughan TL, Stanford JL. Cigarette smoking and risk of prostate cancer in middle-aged men. Cancer Epidemiology Biomarkers & Prevention. 2003; 12: 604-609.
  • Rutter MK, Meigs JB, Sullivan LM, D'Agostino RB, Wilson PW. C-reactive protein, the metabolic syndrome and prediction of cardiovascular events in the Framingham Offspring Study. Circulation. 2004;110: 380-385.
  • Ramachandran V, Sullivan LM, Wilson PW, Sempos CT, Sundstrom J, Kannel WB, Levy D, D'Agostino RB. Relative importance of borderline and elevated levels of coronary heart disease risk factors. Annals of Internal Medicine. 2005; 142: 393-402.
  • Wechsler H, Lee JE, Kuo M, Lee H. College Binge Drinking in the 1990s:A Continuing Problem Results of the Harvard School of Public Health 1999 College Health, 2000; 48: 199-210.

Answers to Selected Problems

Answer to birth weight question - page 3.

An investigator wants to estimate the mean birth weight of infants born full term (approximately 40 weeks gestation) to mothers who are 19 years of age and under. The mean birth weight of infants born full-term to mothers 20 years of age and older is 3,510 grams with a standard deviation of 385 grams. How many women 19 years of age and under must be enrolled in the study to ensure that a 95% confidence interval estimate of the mean birth weight of their infants has a margin of error not exceeding 100 grams?

In order to ensure that the 95% confidence interval estimate of the mean birthweight is within 100 grams of the true mean, a sample of size 57 is needed. In planning the study, the investigator must consider the fact that some women may deliver prematurely. If women are enrolled into the study during pregnancy, then more than 57 women will need to be enrolled so that after excluding those who deliver prematurely, 57 with outcome information will be available for analysis. For example, if 5% of the women are expected to delivery prematurely (i.e., 95% will deliver full term), then 60 women must be enrolled to ensure that 57 deliver full term. The number of women that must be enrolled, N, is computed as follows:

                                                        N (number to enroll) * (% retained) = desired sample size

                                                        N (0.95) = 57

                                                        N = 57/0.95 = 60.

 Answer Freshmen Smoking - Page 4

In order to ensure that the 95% confidence interval estimate of the proportion of freshmen who smoke is within 5% of the true proportion, a sample of size 303 is needed. Notice that this sample size is substantially smaller than the one estimated above. Having some information on the magnitude of the proportion in the population will always produce a sample size that is less than or equal to the one based on a population proportion of 0.5. However, the estimate must be realistic.

Answer to Medical Device Problem - Page 7

A medical device manufacturer produces implantable stents. During the manufacturing process, approximately 10% of the stents are deemed to be defective. The manufacturer wants to test whether the proportion of defective stents is more than 10%. If the process produces more than 15% defective stents, then corrective action must be taken. Therefore, the manufacturer wants the test to have 90% power to detect a difference in proportions of this magnitude. How many stents must be evaluated? For you computations, use a two-sided test with a 5% level of significance.

Then substitute the effect size and the appropriate z values for the selected alpha and power to comute the sample size.

A sample size of 364 stents will ensure that a two-sided test with α=0.05 has 90% power to detect a 0.05, or 5%, difference in jthe proportion of defective stents produced.

Answer to Alcohol and GPA - Page 8

An investigator is planning a study to assess the association between alcohol consumption and grade point average among college seniors. The plan is to categorize students as heavy drinkers or not using 5 or more drinks on a typical drinking day as the criterion for heavy drinking. Mean grade point averages will be compared between students classified as heavy drinkers versus not using a two independent samples test of means. The standard deviation in grade point averages is assumed to be 0.42 and a meaningful difference in grade point averages (relative to drinking status) is 0.25 units. How many college seniors should be enrolled in the study to ensure that the power of the test is 80% to detect a 0.25 unit difference in mean grade point averages? Use a two-sided test with a 5% level of significance.

First compute the effect size.

Now substitute the effect size and the appropriate z values for alpha and power to compute the sample size.

Sample sizes of n i =44 heavy drinkers and 44 who drink few fewer than five drinks per typical drinking day will ensure that the test of hypothesis has 80% power to detect a 0.25 unit difference in mean grade point averages.

Answer to Donor Feces - Page 8

We first compute the effect size by substituting the proportions of patients expected to be cured with each treatment, p 1 =0.6 and p 2 =0.9, and the overall proportion, p=0.75:

We now substitute the effect size and the appropriate Z values for the selected a and power to compute the sample size.

Samples of size n 1 =33 and n 2 =33 will ensure that the test of hypothesis will have 80% power to detect this difference in the proportions of patients who are cured of C. diff. by feces infusion versus antibiotic therapy.

In fact, the investigators enrolled 38 into each group to allow for attrition. Nevertheless, the study was stopped after an interim analysis. Of 16 patients in the infusion group, 13 (81%) had resolution of C. difficile–associated diarrhea after the first infusion. The 3 remaining patients received a second infusion with feces from a different donor, with resolution in 2 patients. Resolution of C. difficile infection occurred in only 4 of 13 patients (31%) receiving the antibiotic vancomycin.

User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

5.5 - hypothesis testing for two-sample proportions.

We are now going to develop the hypothesis test for the difference of two proportions for independent samples. The hypothesis test follows the same steps as one group.

These notes are going to go into a little bit of math and formulas to help demonstrate the logic behind hypothesis testing for two groups. If this starts to get a little confusion, just skim over it for a general understanding! Remember we can rely on the software to do the calculations for us, but it is good to have a basic understanding of the logic!

We will use the sampling distribution of \(\hat{p}_1-\hat{p}_2\) as we did for the confidence interval.

For a test for two proportions, we are interested in the difference between two groups. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be:

\(H_0\colon p_1-p_2=0\)

Another way to look at it is \(H_0\colon p_1=p_2\). This is worth stopping to think about. Remember, in hypothesis testing, we assume the null hypothesis is true. In this case, it means that \(p_1\) and \(p_2\) are equal. Under this assumption, then \(\hat{p}_1\) and \(\hat{p}_2\) are both estimating the same proportion. Think of this proportion as \(p^*\).

Therefore, the sampling distribution of both proportions, \(\hat{p}_1\) and \(\hat{p}_2\), will, under certain conditions, be approximately normal centered around \(p^*\), with standard error \(\sqrt{\dfrac{p^*(1-p^*)}{n_i}}\), for \(i=1, 2\).

We take this into account by finding an estimate for this \(p^*\) using the two-sample proportions. We can calculate an estimate of \(p^*\) using the following formula:

\(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\)

This value is the total number in the desired categories \((x_1+x_2)\) from both samples over the total number of sampling units in the combined sample \((n_1+n_2)\).

Putting everything together, if we assume \(p_1=p_2\), then the sampling distribution of \(\hat{p}_1-\hat{p}_2\) will be approximately normal with mean 0 and standard error of \(\sqrt{p^*(1-p^*)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}\), under certain conditions.

\(z^*=\dfrac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...will follow a standard normal distribution.

Finally, we can develop our hypothesis test for \(p_1-p_2\).

Hypothesis Testing for Two-Sample Proportions

Conditions :

\(n_1\hat{p}_1\), \(n_1(1-\hat{p}_1)\), \(n_2\hat{p}_2\), and \(n_2(1-\hat{p}_2)\) are all greater than five

Test Statistic:

\(z^*=\dfrac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{p}^*(1-\hat{p}^*)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\)

...where \(\hat{p}^*=\dfrac{x_1+x_2}{n_1+n_2}\).

The critical values, p-values, and decisions will all follow the same steps as those from a hypothesis test for a one-sample proportion.

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons

Margin Size

  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

10: Hypothesis Testing with Two Samples

  • Last updated
  • Save as PDF
  • Page ID 23470

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}}      % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}}      % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

You have learned to conduct hypothesis tests on single means and single proportions. You will expand upon that in this chapter. You will compare two means or two proportions to each other. The general procedure is still the same, just expanded. To compare two means or two proportions, you work with two groups. The groups are classified either as independent or matched pairs. Independent groups consist of two samples that are independent, that is, sample values selected from one population are not related in any way to sample values selected from the other population. Matched pairs consist of two samples that are dependent. The parameter tested using matched pairs is the population mean. The parameters tested using independent groups are either population means or population proportions.

  • 10.0: Prelude to Hypothesis Testing with Two Samples This chapter deals with the following hypothesis tests: Independent groups (samples are independent) Test of two population means. Test of two population proportions. Matched or paired samples (samples are dependent) Test of the two population proportions by testing one population mean of differences.
  • 10.1: Two Population Means with Unknown Standard Deviations The comparison of two population means is very common. A difference between the two samples depends on both the means and the standard deviations. Very different means can occur by chance if there is great variation among the individual samples.
  • 10.2: Two Population Means with Known Standard Deviations Even though this situation is not likely (knowing the population standard deviations is not likely), the following example illustrates hypothesis testing for independent means, known population standard deviations.
  • 10.3: Comparing Two Independent Population Proportions Comparing two proportions, like comparing two means, is common. If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the population proportions.
  • 10.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small. Two measurements (samples) are drawn from the same pair of individuals or objects. Differences are calculated from the matched or paired samples. The differences form the sample that is used for the hypothesis test. Either the matched pairs have differences that come from a population that is normal or the number of difference
  • 10.5: Hypothesis Testing for Two Means and Two Proportions (Worksheet) A statistics Worksheet: The student will select the appropriate distributions to use in each case. The student will conduct hypothesis tests and interpret the results.
  • 10.E: Hypothesis Testing with Two Samples (Exercises) These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax.

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .

IMAGES

  1. Hypothesis t-test: Two Independent Samples

    hypothesis test two sample sizes

  2. Two Sample Z Hypothesis Test

    hypothesis test two sample sizes

  3. Ch8: Hypothesis Testing (2 Samples)

    hypothesis test two sample sizes

  4. Two Sample t Test (Independent Samples)

    hypothesis test two sample sizes

  5. Two Sample T Hypothesis Tests

    hypothesis test two sample sizes

  6. Hypothesis Testing Example Two Sample t-Test

    hypothesis test two sample sizes

VIDEO

  1. Hypothesis Test

  2. Hypothesis Test

  3. Hypothesis Test with Two Means: Independent Samples

  4. Hypothesis test(Two sample propotions) using Excel || Ft.Nirmal Bajacharya

  5. Hypothesis Test Two Population Means Using Statcrunch Example 1

  6. Hypothesis Testing for Mean: p-value is more than the level of significance (Hat Size Example)

COMMENTS

  1. 10: Hypothesis Testing with Two Samples

    10.5: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small. Two measurements (samples) are drawn from the same pair of individuals or objects. Differences are calculated from the matched or paired samples.

  2. Two Sample t-test: Definition, Formula, and Example

    where x 1 and x 2 are the sample means, n 1 and n 2 are the sample sizes, and where s p is calculated as: ... 0.05, and 0.01) then you can reject the null hypothesis. Two Sample t-test: Assumptions. For the results of a two sample t-test to be valid, the following assumptions should be met:

  3. 25.3

    Let's take a look at two examples that illustrate the kind of sample size calculation we can make to ensure our hypothesis test has sufficient power. Example 25-4 Section Let \(X\) denote the crop yield of corn measured in the number of bushels per acre.

  4. t-test Calculator

    Choose the two-sample t-test to check if the difference between the means of two populations is equal to some pre-determined value when the two samples have been chosen independently of each other. ... μ 0 \mu_0 μ 0 — Mean postulated in the null hypothesis; n n n — Sample size; x ...

  5. Hypothesis Testing: 2 Means (Independent Samples)

    Since we are being asked for convincing statistical evidence, a hypothesis test should be conducted. In this case, we are dealing with averages from two samples or groups (the home run distances), so we will conduct a Test of 2 Means. n1 = 70 n 1 = 70 is the sample size for the first group. n2 = 66 n 2 = 66 is the sample size for the second group.

  6. PDF Two-Sample Problems

    population means, the size of the two samples must be taken into account. This is because the t distribution is used to make these inferences. There are two ways of computing degrees of freedom: 1) Use technology. 2) Choose the smaller of: df 1 and df 2 Subtract 1 from each sample size.

  7. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  8. Two-sample hypothesis testing

    In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently obtained from a different given population. The purpose of the test is to determine whether the difference between these two populations is statistically significant . There are a large number of statistical tests that ...

  9. Two-sample t test for difference of means

    And let's assume that we are working with a significance level of 0.05. So pause the video, and conduct the two sample T test here, to see whether there's evidence that the sizes of tomato plants differ between the fields. Alright, now let's work through this together. So like always, let's first construct our null hypothesis.

  10. Hypothesis Testing: Two Samples

    When performing a hypothesis test comparing matched or paired samples, the following points hold true: Simple random sampling is used. Sample sizes are often small. Two measurements (samples) are drawn from the same pair of individuals or objects. Differences are calculated from the matched or paired samples.

  11. Power and Sample Size Determination

    Sample size estimates for hypothesis testing are often based on achieving 80% or 90% power. ... Sample Sizes for Two Independent Samples, Continuous Outcome. In studies where the plan is to perform a test of hypothesis comparing the means of a continuous outcome variable in two independent populations, the hypotheses of interest are: ...

  12. Two Sample Z-Test: Definition, Formula, and Example

    This tutorial provides an introduction to the two sample z-test, including a definition, formula, and example. Statology. ... and 0.01) then you can reject the null hypothesis. Two Sample Z-Test: Assumptions. For the results of a two sample z-test to be valid, the following assumptions should be met: ... (sample 1 size) = 20; x 2 (sample 2 mean ...

  13. 5.5

    For a test for two proportions, we are interested in the difference between two groups. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be: H 0: p 1 − p 2 = 0. Another way to look at it is H 0: p 1 = p 2.

  14. Two Sample t-test Calculator

    A two sample t-test is used to test whether or not the means of two populations are equal. ... (sample 1 size) x 2 (sample 2 mean) s 2 (sample 2 standard deviation) n 2 (sample 2 size) t = -1.608761. ... I noticed that in using the calculators for hypothesis testing, the default level of significance is 0.05 and you cannot change it. ...

  15. Hypothesis Testing Explained (How I Wish It Was Explained to Me)

    Step 4. Computing the sample size. We want to perform a classical A/B test, i.e. splitting users into two equally sized groups such that: control group → existing landing page; treatment group → new landing page. How many users should we put into the control and treatment group to achieve the target confusion matrix we have set previously?

  16. PDF Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal

    of power and sample size estimation for the two independent-sample case with unequal variances. Overview of Power Analysis and Sample Size Estimation . A hypothesis is a claim or statement about one or more population parameters, e.g. a mean or a proportion. A hypothesis test is a statistical method of using data to quantify

  17. 10: Hypothesis Testing with Two Samples

    10.4: Matched or Paired Samples When using a hypothesis test for matched or paired samples, the following characteristics should be present: Simple random sampling is used. Sample sizes are often small. Two measurements (samples) are drawn from the same pair of individuals or objects. Differences are calculated from the matched or paired samples.

  18. How to Calculate Sample Size Needed for Power

    Statistical power and sample size analysis provides both numeric and graphical results, as shown below. The text output indicates that we need 15 samples per group (total of 30) to have a 90% chance of detecting a difference of 5 units. The dot on the Power Curve corresponds to the information in the text output.

  19. Hypothesis Test: Difference in Means

    The first step is to state the null hypothesis and an alternative hypothesis. Null hypothesis: μ 1 - μ 2 = 0. Alternative hypothesis: μ 1 - μ 2 ≠ 0. Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the difference between sample means is too big or if it is too small.

  20. Hypothesis Test: Difference in Proportions

    where p 1 is the sample proportion in sample 1, where p 2 is the sample proportion in sample 2, n 1 is the size of sample 1, and n 2 is the size of sample 2. Since we have a two-tailed test, the P-value is the probability that the z-score is less than -2.13 or greater than 2.13.

  21. Sample size determination

    Toggle Required sample sizes for hypothesis tests subsection. 3.1 Tables. 3.2 Mead's resource equation. 3.3 Cumulative distribution function. 4 Stratified sample size. 5 Qualitative research. 6 See ... The table shown on the right can be used in a two-sample t-test to estimate the sample sizes of an experimental group and a control group that ...

  22. Kolmogorov-Smirnov test

    and in general by = ⁡ (),so that the condition reads , > ⁡ +. Here, again, the larger the sample sizes, the more sensitive the minimal bound: For a given ratio of sample sizes (e.g. =), the minimal bound scales in the size of either of the samples according to its inverse square root. Note that the two-sample test checks whether the two data samples come from the same distribution.

  23. Bayesian hypothesis testing for equality of high-dimensional means

    Abstract. The classical Hotelling's T 2 test and Bayesian hypothesis tests breakdown for the problem of comparing two high-dimensional population means due to the singularity of the pooled sample covariance matrices when the model dimension p exceeds the sample size n.In this paper, we develop a simple closed-form Bayesian testing procedure based on a split-and-merge technique.

  24. Determining Sample Size for BI Hypothesis Tests

    How do you determine the sample size needed for your hypothesis test? Powered by AI and the LinkedIn community. 1. Define Goals Be the first to add your personal experience. 2. ...

  25. How to Perform a t-test with Unequal Sample Sizes

    #perform independent samples t-test t. test (program1, program2, var. equal = TRUE) Two Sample t-test data: program1 and program2 t = -0.5988, df = 518, p-value = 0.5496 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -14.52474 7.73875 sample estimates: mean of x mean of y 80.5661 83.9591 # ...

  26. One sample hypothesis test for a population mean copy

    Assumption for a z-test: for a population mean is that the sample mean is drawn from a normal distribution Testing a null hypothesis To test a null hypothesis for a population mean, we compare the sample value, with the corresponding null value E.g., the sample mean in a question was 195 but we want to see if the company sells an average of 200 ...

  27. A systematic review of null hypothesis significance testing, sample

    Following recent calls to examine the replicability of behavioral research, I examined two key determinants of replicability, namely sample sizes and statistical power, in research using the Implicit Relational Assessment Procedure (IRAP). A systematic review was used to gather all published studies employing the IRAP and extract their designs and sample sizes. The use of Null Hypothesis ...

  28. Power of a test

    Increasing sample size is often the easiest way to boost the statistical power of a test. How increased sample size translates to higher power is a measure of the efficiency of the test - for example, ... (α = 0.05, two-tail) to reject the null hypothesis of zero correlation. However, in doing this study we are probably more interested in ...