Men
Women
You can clearly see some overlap in the body fat measurements for the men and women in our sample, but also some differences. Just by looking at the data, it's hard to draw any solid conclusions about whether the underlying populations of men and women at the gym have the same mean body fat. That is the value of statistical tests – they provide a common, statistically valid way to make decisions, so that everyone makes the same decision on the same set of data values.
Let’s start by answering: Is the two-sample t-test an appropriate method to evaluate the difference in body fat between men and women?
Before jumping into analysis, we should always take a quick look at the data. The figure below shows histograms and summary statistics for the men and women.
The two histograms are on the same scale. From a quick look, we can see that there are no very unusual points, or outliers. The data look roughly bell-shaped, so our initial idea of a normal distribution seems reasonable.
Examining the summary statistics, we see that the standard deviations are similar. This supports the idea of equal variances. We can also check this using a test for variances.
Based on these observations, the two-sample t-test appears to be an appropriate method to test for a difference in means.
For each group, we need the average, standard deviation and sample size. These are shown in the table below.
Group | Sample Size (n) | Average | Standard Deviation (s) |
Women | 10 | 22.29 | 5.32 |
Men | 13 | 14.95 | 6.84 |
Without doing any testing, we can see that the averages for men and women in our samples are not the same. But how different are they? Are the averages “close enough” for us to conclude that mean body fat is the same for the larger population of men and women at the gym? Or are the averages too different for us to make this conclusion?
We'll further explain the principles underlying the two-sample t-test in the statistical details section below, but let's first proceed through the steps from beginning to end. We start by calculating our test statistic. This calculation begins with finding the difference between the two averages:
$ 22.29 - 14.95 = 7.34 $
This difference in our samples estimates the difference between the population means for the two groups.
Next, we calculate the pooled standard deviation. This builds a combined estimate of the overall standard deviation. The estimate adjusts for different group sizes. First, we calculate the pooled variance:
$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)} {n_1 + n_2 - 2} $
$ s_p^2 = \frac{((10 - 1)5.32^2) + ((13 - 1)6.84^2)}{(10 + 13 - 2)} $
$ = \frac{(9\times28.30) + (12\times46.82)}{21} $
$ = \frac{(254.7 + 561.85)}{21} $
$ =\frac{816.55}{21} = 38.88 $
Next, we take the square root of the pooled variance to get the pooled standard deviation. This is:
$ \sqrt{38.88} = 6.24 $
We now have all the pieces for our test statistic. We have the difference of the averages, the pooled standard deviation and the sample sizes. We calculate our test statistic as follows:
$ t = \frac{\text{difference of group averages}}{\text{standard error of difference}} = \frac{7.34}{(6.24\times \sqrt{(1/10 + 1/13)})} = \frac{7.34}{2.62} = 2.80 $
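The arithmetic above can be reproduced with a short script. This is a minimal sketch that works only from the rounded summary statistics in the table; the function name `pooled_t_statistic` is a hypothetical helper introduced here for illustration, not part of any library.

```python
import math

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference, pooled_sd, standard_error, t) for a pooled two-sample t-test.

    Hypothetical helper: inputs are group averages, standard deviations and sample sizes.
    """
    diff = mean1 - mean2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference between the two averages.
    std_err = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return diff, pooled_sd, std_err, diff / std_err

# Summary statistics from the table: women first, then men.
diff, pooled_sd, std_err, t_stat = pooled_t_statistic(22.29, 5.32, 10, 14.95, 6.84, 13)
print(f"difference     = {diff:.2f}")
print(f"pooled sd      = {pooled_sd:.2f}")
print(f"standard error = {std_err:.2f}")
print(f"t statistic    = {t_stat:.2f}")
```

Because the standard deviations in the table are rounded to two decimals, intermediate values from this sketch can differ from the hand calculation in the last digit (for example, the pooled standard deviation). The test statistic still rounds to 2.80.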
To evaluate the difference between the means in order to make a decision about our gym programs, we compare the test statistic to a theoretical value from the t-distribution. This activity involves four steps:
Let’s look at the body fat data and the two-sample t-test using statistical terms.
Our null hypothesis is that the underlying population means are the same. The null hypothesis is written as:
$ H_0: \mathrm{\mu_1} = \mathrm{\mu_2} $
The alternative hypothesis is that the means are not equal. This is written as:
$ H_a: \mathrm{\mu_1} \neq \mathrm{\mu_2} $
We calculate the average for each group, and then calculate the difference between the two averages. This is written as:
$\overline{x_1} - \overline{x_2} $
We calculate the pooled standard deviation. This assumes that the underlying population variances are equal. The pooled variance formula is written as:
$ s_p^2 = \frac{((n_1 - 1)s_1^2) + ((n_2 - 1)s_2^2)} {n_1 + n_2 - 2} $
The formula shows the sample size for the first group as $ n_1 $ and the second group as $ n_2 $. The standard deviations for the two groups are $ s_1 $ and $ s_2 $. This estimate allows the two groups to have different numbers of observations. The pooled standard deviation is the square root of the pooled variance and is written as $ s_p $.
What if your sample sizes for the two groups are the same? In this situation, the pooled estimate of variance is simply the average of the variances for the two groups:
$ s_p^2 = \frac{(s_1^2 + s_2^2)}{2} $
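To see why, here is a short derivation, writing the common group size as $ n $ (a symbol introduced here for illustration) and substituting $ n_1 = n_2 = n $ into the general pooled variance formula:
$ s_p^2 = \frac{((n - 1)s_1^2) + ((n - 1)s_2^2)}{n + n - 2} = \frac{(n - 1)(s_1^2 + s_2^2)}{2(n - 1)} = \frac{(s_1^2 + s_2^2)}{2} $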
The test statistic is calculated as:
$ t = \frac{(\overline{x_1} -\overline{x_2})}{s_p\sqrt{1/n_1 + 1/n_2}} $
The numerator of the test statistic is the difference between the two group averages. It estimates the difference between the two unknown population means. The denominator is an estimate of the standard error of the difference between the two unknown population means.
Technical Detail: For a single mean, the standard error is $ s/\sqrt{n} $ . The formula above extends this idea to two groups that use a pooled estimate for s (standard deviation), and that can have different group sizes.
We then compare the test statistic to a t value with our chosen alpha value and the degrees of freedom for our data. Using the body fat data as an example, we set α = 0.05. The degrees of freedom (df) are based on the group sizes and are calculated as:
$ df = n_1 + n_2 - 2 = 10 + 13 - 2 = 21 $
The formula shows the sample size for the first group as $ n_1 $ and the second group as $ n_2 $. Statisticians write the t value with α = 0.05 and 21 degrees of freedom as:
$ t_{0.05,21} $
The t value with α = 0.05 and 21 degrees of freedom is 2.080. There are two possible results from our comparison:
When the variances for the two groups are not equal, we cannot use the pooled estimate of standard deviation. Instead, we take the standard error for each group separately. The test statistic is:
$ t = \frac{ (\overline{x_1} - \overline{x_2})}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $
The numerator of the test statistic is the same. It is the difference between the averages of the two groups. The denominator is an estimate of the overall standard error of the difference between means. It is based on the separate standard error for each group.
The degrees of freedom calculation for the t value is more complex with unequal variances than equal variances and is usually left up to statistical software packages. The key point to remember is that if you cannot use the pooled estimate of standard deviation, then you cannot use the simple formula for the degrees of freedom.
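As an illustration of what software typically does here, the sketch below computes the unequal-variance test statistic together with the Welch-Satterthwaite approximation for the degrees of freedom. That approximation is the commonly used formula for this case, but it is an assumption that it matches what any particular package reports. The function name `welch_t_test_summary` is hypothetical, and the inputs are the rounded summary statistics for the body fat data.

```python
import math

def welch_t_test_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for the unequal-variance t-test from summary statistics.

    Hypothetical helper; df uses the Welch-Satterthwaite approximation.
    """
    # Squared standard error of each group's average.
    v1 = sd1**2 / n1
    v2 = sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom; generally not a whole number.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics for the body fat data: women first, then men.
welch_t, welch_df = welch_t_test_summary(22.29, 5.32, 10, 14.95, 6.84, 13)
print(f"t  = {welch_t:.2f}")
print(f"df = {welch_df:.2f}")
```

With these inputs the degrees of freedom come out just under 21, which is why the value is fractional rather than the simple $ n_1 + n_2 - 2 $.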
The normality assumption is more important when the two groups have small sample sizes than for larger sample sizes.
Normal distributions are symmetric, which means they are “even” on both sides of the center. Normal distributions do not have extreme values, or outliers. You can check these two features of a normal distribution with graphs. Earlier, we decided that the body fat data was “close enough” to normal to go ahead with the assumption of normality. The figure below shows a normal quantile plot for men and women, and supports our decision.
You can also perform a formal test for normality using software. The figure above shows results of testing for normality with JMP software. We test each group separately. Both the test for men and the test for women show that we cannot reject the hypothesis of a normal distribution. We can go ahead with the assumption that the body fat data for men and for women are normally distributed.
Testing for unequal variances is complex. We won’t show the calculations in detail, but will show the results from JMP software. The figure below shows results of a test for unequal variances for the body fat data.
Without diving into details of the different types of tests for unequal variances, we will use the F test. Before testing, we decide to accept a 10% risk of concluding the variances are unequal when they are in fact equal. This means we have set α = 0.10.
Like most statistical software, JMP shows the p-value for a test. This is the probability, assuming the null hypothesis is true, of finding a test statistic at least as extreme as the one observed. It’s difficult to calculate by hand. For the figure above, with the F test statistic of 1.654, the p-value is 0.4561. This is larger than our α value: 0.4561 > 0.10. We fail to reject the hypothesis of equal variances. In practical terms, we can go ahead with the two-sample t-test with the assumption of equal variances for the two groups.
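The F test statistic itself is simple to reproduce: it is the ratio of the two sample variances. The sketch below assumes the larger variance goes in the numerator, which is consistent with the reported value of about 1.65. The helper name `variance_ratio` is hypothetical, and the p-value is taken from the software output quoted above rather than computed.

```python
def variance_ratio(sd_a, sd_b):
    """Return the F ratio with the larger sample variance in the numerator.

    Hypothetical helper working from the two sample standard deviations.
    """
    var_a, var_b = sd_a**2, sd_b**2
    return max(var_a, var_b) / min(var_a, var_b)

# Rounded standard deviations for women and men from the summary table.
f_ratio = variance_ratio(5.32, 6.84)

alpha = 0.10       # risk level chosen before the test
p_value = 0.4561   # reported by the software, not computed here
reject_equal_variances = p_value < alpha

print(f"F ratio = {f_ratio:.2f}")
print(f"reject equal variances: {reject_equal_variances}")
```

The ratio from the rounded standard deviations differs from the software's 1.654 only in the third decimal place.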
Using a visual, you can check to see if your test statistic is a more extreme value in the distribution. The figure below shows a t-distribution with 21 degrees of freedom.
Since our test is two-sided and we have set α = 0.05, the figure shows that the value of 2.080 “cuts off” 2.5% of the distribution in each of the two tails. Only 5% of the distribution overall is further out in the tails than 2.080. Because our test statistic of 2.80 is beyond the cut-off point, we reject the null hypothesis of equal means.
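The tail areas described above can be checked numerically without statistical software. The sketch below integrates the t-distribution density with Simpson's rule using only the Python standard library. The function names `t_pdf` and `upper_tail` are hypothetical helpers, and numerical integration is a stand-in used here for illustration, not the method a statistics package would necessarily use.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even; the distribution is symmetric, so P(T > 0) = 0.5.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

deg_freedom = 21
critical_value = 2.080  # t value for alpha = 0.05, two-sided, from the text
t_stat = 2.80           # test statistic from the body fat data

tail_at_critical = upper_tail(critical_value, deg_freedom)
p_two_sided = 2 * upper_tail(abs(t_stat), deg_freedom)
reject_null = abs(t_stat) > critical_value

print(f"area beyond 2.080 in one tail: {tail_at_critical:.4f}")
print(f"two-sided p-value for 2.80:    {p_two_sided:.4f}")
print(f"reject null hypothesis:        {reject_null}")
```

Comparing the test statistic with the cut-off and comparing the two-sided p-value with α are equivalent ways of reaching the same decision.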
The figure below shows results for the two-sample t-test for the body fat data from JMP software.
The results for the two-sample t-test that assumes equal variances are the same as our calculations earlier. The test statistic is 2.79996. The software shows results for a two-sided test and for one-sided tests. The two-sided test is what we want (Prob > |t|). Our null hypothesis is that the mean body fat for men and women is equal. Our alternative hypothesis is that the mean body fat is not equal. The one-sided tests are for one-sided alternative hypotheses, for example, an alternative hypothesis that mean body fat for men is less than that for women.
The software shows a p-value of 0.0107. Before doing the test, we decided to accept a 5% risk of concluding that the mean body fat for men and women is different when it is not. It is important to make this decision before doing the statistical test. Because 0.0107 < 0.05, we can reject the hypothesis of equal mean body fat for the two groups and conclude that we have evidence that body fat differs in the population between men and women.
The figure also shows the results for the t-test that does not assume equal variances. This test does not use the pooled estimate of the standard deviation. As was mentioned above, this test also has a complex formula for degrees of freedom. You can see that the degrees of freedom are 20.9888. The software shows a p-value of 0.0086. Again, because 0.0086 < 0.05, we can reject the null hypothesis of equal mean body fat for men and women.
If you have more than two independent groups, you cannot use the two-sample t-test. You should use a multiple comparison method. ANOVA, or analysis of variance, is one such method. Other multiple comparison methods include the Tukey-Kramer test of all pairwise differences, analysis of means (ANOM) to compare group means to the overall mean or Dunnett’s test to compare each group mean to a control mean.
If your sample size is very small, it might be hard to test for normality. In this situation, you might need to use your understanding of the measurements. For example, for the body fat data, the trainer knows that the underlying distribution of body fat is normally distributed. Even for a very small sample, the trainer would likely go ahead with the t-test and assume normality.
What if you know the underlying measurements are not normally distributed? Or what if your sample size is large and the test for normality is rejected? In this situation, you can use nonparametric analyses. These types of analyses do not depend on an assumption that the data values are from a specific distribution. For the two-sample t-test, the Wilcoxon rank sum test is a nonparametric test that could be used.