H 0 always has a symbol with an equal in it. H a never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the null hypothesis, even with > or < as the symbol in the alternative hypothesis. This practice is acceptable because we only make the decision to reject or not reject the null hypothesis.
H 0 : No more than 30% of the registered voters in Santa Clara County voted in the primary election. p ≤ 0.30
H a : More than 30% of the registered voters in Santa Clara County voted in the primary election. p > 0.30
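As a quick sketch of how a one-sided test on a proportion like this could be evaluated, the snippet below computes a z statistic and right-tail p-value for H0: p ≤ 0.30 versus Ha: p > 0.30. The sample counts are hypothetical (invented for illustration, not from the text), and the normal CDF is built from the standard library's error function rather than any particular statistics package:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical sample (not from the text): 330 of 1,000 sampled
# registered voters say they voted in the primary.
n, x = 1000, 330
p0 = 0.30                       # null value from H0: p <= 0.30
p_hat = x / n                   # sample proportion, 0.33

se = sqrt(p0 * (1.0 - p0) / n)  # standard error assuming H0 is true
z = (p_hat - p0) / se
p_value = 1.0 - phi(z)          # right tail, matching Ha: p > 0.30
```

A small p-value here would count as evidence against H0 in favor of the claim that more than 30% voted.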
A medical trial is conducted to test whether or not a new medicine reduces cholesterol by 25%. State the null and alternative hypotheses.
H 0 : The drug reduces cholesterol by 25%. p = 0.25
H a : The drug does not reduce cholesterol by 25%. p ≠ 0.25
We want to test whether the mean GPA of students in American colleges is different from 2.0 (out of 4.0). The null and alternative hypotheses are:
H 0 : μ = 2.0
H a : μ ≠ 2.0
We want to test whether the mean height of eighth graders is 66 inches. State the null and alternative hypotheses. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses. H 0 : μ __ 66 H a : μ __ 66
We want to test if college students take less than five years to graduate from college, on the average. The null and alternative hypotheses are:
H 0 : μ ≥ 5
H a : μ < 5
We want to test if it takes fewer than 45 minutes to teach a lesson plan. State the null and alternative hypotheses. Fill in the correct symbol ( =, ≠, ≥, <, ≤, >) for the null and alternative hypotheses. H 0 : μ __ 45 H a : μ __ 45
In an issue of U.S. News and World Report , an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U.S. students take advanced placement exams and 4.4% pass. Test if the percentage of U.S. students who take advanced placement exams is more than 6.6%. State the null and alternative hypotheses.
H 0 : p ≤ 0.066
H a : p > 0.066
On a state driver’s test, about 40% pass the test on the first try. We want to test if more than 40% pass on the first try. Fill in the correct symbol (=, ≠, ≥, <, ≤, >) for the null and alternative hypotheses. H 0 : p __ 0.40 H a : p __ 0.40
In a hypothesis test, sample data are evaluated in order to arrive at a decision about some type of claim. If certain conditions about the sample are satisfied, then the claim can be evaluated for a population. In a hypothesis test, we:

- Evaluate the null hypothesis, typically denoted H 0 . The null is not rejected unless the hypothesis test shows otherwise. The null statement must always contain some form of equality (=, ≤, or ≥).
- Always write the alternative hypothesis, typically denoted H a or H 1 , using a less-than, greater-than, or not-equal symbol (≠, >, or <).
- If we reject the null hypothesis, then we can assume there is enough evidence to support the alternative hypothesis.
- Never state that a claim is proven true or false. Keep in mind that hypothesis testing is based on probability laws, so we can speak only in terms of non-absolute certainties.
H 0 and H a are contradictory.
The actual test begins by considering two hypotheses. They are called the null hypothesis and the alternative hypothesis. These hypotheses contain opposing viewpoints.
Since the null and alternative hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data. After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are "reject \(H_{0}\)" if the sample information favors the alternative hypothesis or "do not reject \(H_{0}\)" or "decline to reject \(H_{0}\)" if the sample information is insufficient to reject the null hypothesis.
| \(H_{0}\) | \(H_{a}\) |
|---|---|
| equal (=) | not equal \((\neq)\), greater than (>), or less than (<) |
| greater than or equal to \((\geq)\) | less than (<) |
| less than or equal to \((\leq)\) | more than (>) |
COLLABORATIVE EXERCISE
Bring to class a newspaper, some news magazines, and some Internet articles . In groups, find articles from which your group can write null and alternative hypotheses. Discuss your hypotheses with the rest of the class.
| \(H_{0}\) has: | equal \((=)\) | greater than or equal to \((\geq)\) | less than or equal to \((\leq)\) |
|---|---|---|---|
| \(H_{a}\) has: | not equal \((\neq)\), greater than \((>)\), or less than \((<)\) | less than \((<)\) | greater than \((>)\) |
\(\alpha\) is preconceived. Its value is set before the hypothesis test starts. The \(p\)-value is calculated from the data.

References
Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/[email protected] .
The alternative hypothesis states that there is a statistically significant relationship between two variables, whereas the null hypothesis states that there is no statistical relationship between them. In statistics, we come across various kinds of hypotheses. A statistical hypothesis is a working statement assumed to be consistent with the given data; the hypothesis itself is neither considered true nor false at the outset.
The alternative hypothesis is a statement used in a statistical inference experiment. It contradicts the null hypothesis and is denoted H a or H 1 ; we can also say that it is simply an alternative to the null. In hypothesis testing, the alternative hypothesis is the statement the researcher is actually testing: the researcher believes it to be true and seeks evidence strong enough to reject the null in its favor. Under the alternative hypothesis, the researcher predicts a difference between two or more variables, such that the pattern of data observed in the test is not due to chance.
For example, suppose researchers observe the water quality of a river over one year. The null hypothesis states that there is no change in water quality between the first half of the year and the second half; the alternative hypothesis states that the water quality is worse in the second half.
| Null hypothesis | Alternative hypothesis |
|---|---|
| Denotes that there is no relationship between two measured phenomena. | Holds that some non-random cause influences the observed data or sample. |
| Represented by H 0 . | Represented by H a or H 1 . |
| Example: Rohan will win at least Rs. 100000 in the lucky draw. | Example: Rohan will win less than Rs. 100000 in the lucky draw. |
There are basically three types of alternative hypothesis:
Left-Tailed : Here, it is expected that the sample proportion (π) is less than a specified value which is denoted by π 0 , such that;
H 1 : π < π 0
Right-Tailed: It represents that the sample proportion (π) is greater than some value, denoted by π 0 .
H 1 : π > π 0
Two-Tailed: According to this hypothesis, the sample proportion (denoted by π) is not equal to a specific value which is represented by π 0 .
H 1 : π ≠ π 0
Note: The null hypothesis corresponding to all three alternative hypotheses would be H 0 : π = π 0 .
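The three alternatives above differ only in which tail (or tails) of the sampling distribution counts as evidence against H0. As an illustrative sketch (the function and its `alternative` labels are our own naming, not from the text), a p-value for a z statistic can be computed for each case with only the standard library:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_value(z, alternative):
    """p-value for a z statistic under the three alternatives:
    'left'  -> H1: pi < pi0   (left tail)
    'right' -> H1: pi > pi0   (right tail)
    'two'   -> H1: pi != pi0  (both tails)"""
    if alternative == "left":
        return phi(z)
    if alternative == "right":
        return 1.0 - phi(z)
    return 2.0 * (1.0 - phi(abs(z)))  # two-tailed
```

Note that for the same z, the two-tailed p-value is twice the corresponding one-tailed area.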
Hypothesis testing involves the careful construction of two statements: the null hypothesis and the alternative hypothesis. These hypotheses can look very similar but are actually different.
How do we know which hypothesis is the null and which one is the alternative? We will see that there are a few ways to tell the difference.
The null hypothesis reflects that there will be no observed effect in our experiment. In a mathematical formulation of the null hypothesis, there will typically be an equal sign. This hypothesis is denoted by H 0 .
The null hypothesis is what we attempt to find evidence against in our hypothesis test. We hope to obtain a small enough p-value that it is lower than our level of significance alpha and we are justified in rejecting the null hypothesis. If our p-value is greater than alpha, then we fail to reject the null hypothesis.
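The comparison of the p-value with the significance level described above amounts to a one-line decision rule. A minimal sketch (the function name is ours; some texts use ≤ rather than < at the boundary):

```python
def decide(p_value, alpha=0.05):
    """Decision rule from the paragraph above: reject H0 only when the
    p-value falls below the significance level alpha. Note the
    asymmetric language: we 'fail to reject' H0, we never 'accept' it."""
    return "reject H0" if p_value < alpha else "fail to reject H0"
```

For example, `decide(0.007)` rejects the null, while `decide(0.21)` does not.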
If the null hypothesis is not rejected, then we must be careful to say what this means. The thinking on this is similar to a legal verdict. Just because a person has been declared "not guilty", it does not mean that he is innocent. In the same way, just because we failed to reject a null hypothesis it does not mean that the statement is true.
For example, we may want to investigate the claim that despite what convention has told us, the mean adult body temperature is not the accepted value of 98.6 degrees Fahrenheit . The null hypothesis for an experiment to investigate this is “The mean adult body temperature for healthy individuals is 98.6 degrees Fahrenheit.” If we fail to reject the null hypothesis, then our working hypothesis remains that the average adult who is healthy has a temperature of 98.6 degrees. We do not prove that this is true.
If we are studying a new treatment, the null hypothesis is that our treatment will not change our subjects in any meaningful way. In other words, the treatment will not produce any effect in our subjects.
The alternative or experimental hypothesis reflects that there will be an observed effect for our experiment. In a mathematical formulation of the alternative hypothesis, there will typically be an inequality, or not equal to symbol. This hypothesis is denoted by either H a or by H 1 .
The alternative hypothesis is what we are attempting to demonstrate in an indirect way by the use of our hypothesis test. If the null hypothesis is rejected, then we accept the alternative hypothesis. If the null hypothesis is not rejected, then we do not accept the alternative hypothesis. Going back to the above example of mean human body temperature, the alternative hypothesis is “The average adult human body temperature is not 98.6 degrees Fahrenheit.”
If we are studying a new treatment, then the alternative hypothesis is that our treatment does, in fact, change our subjects in a meaningful and measurable way.
The following set of negations may help when you are forming your null and alternative hypotheses. Most technical papers rely on just the first formulation, even though you may see some of the others in a statistics textbook.
Is the typical US runner getting faster or slower over time? We consider this question in the context of the Cherry Blossom Run, comparing runners in 2006 and 2012. Technological advances in shoes, training, and diet might suggest runners would be faster in 2012. An opposing viewpoint might say that with the average body mass index on the rise, people tend to run slower. In fact, all of these components might be influencing run time.
In addition to considering run times in this section, we consider a topic near and dear to most students: sleep. A recent study found that college students average about 7 hours of sleep per night.15 However, researchers at a rural college are interested in showing that their students sleep longer than seven hours on average. We investigate this topic in Section 4.3.4.
The average time for all runners who finished the Cherry Blossom Run in 2006 was 93.29 minutes (93 minutes and about 17 seconds). We want to determine if the run10Samp data set provides strong evidence that the participants in 2012 were faster or slower than those runners in 2006, versus the other possibility that there has been no change. 16 We simplify these three options into two competing hypotheses:

H 0 : The average 10-mile run time was the same for 2006 and 2012.
H A : The average 10-mile run time for 2012 was different from that of 2006.
We call H 0 the null hypothesis and H A the alternative hypothesis.
Null and alternative hypotheses
15 theloquitur.com/?p=1161
16 While we could answer this question by examining the entire population data (run10), we only consider the sample data (run10Samp), which is more realistic since we rarely have access to population data.
The null hypothesis often represents a skeptical position or a perspective of no difference. The alternative hypothesis often represents a new perspective, such as the possibility that there has been a change.
Hypothesis testing framework
The skeptic will not reject the null hypothesis (H 0 ), unless the evidence in favor of the alternative hypothesis (H A ) is so strong that she rejects H 0 in favor of H A .
The hypothesis testing framework is a very general tool, and we often use it without a second thought. If a person makes a somewhat unbelievable claim, we are initially skeptical. However, if there is sufficient evidence that supports the claim, we set aside our skepticism and reject the null hypothesis in favor of the alternative. The hallmarks of hypothesis testing are also found in the US court system.
Exercise \(\PageIndex{1}\)
A US court considers two possible claims about a defendant: she is either innocent or guilty. If we set these claims up in a hypothesis framework, which would be the null hypothesis and which the alternative? 17
Jurors examine the evidence to see whether it convincingly shows a defendant is guilty. Even if the jurors leave unconvinced of guilt beyond a reasonable doubt, this does not mean they believe the defendant is innocent. This is also the case with hypothesis testing: even if we fail to reject the null hypothesis, we typically do not accept the null hypothesis as true. Failing to find strong evidence for the alternative hypothesis is not equivalent to accepting the null hypothesis.
18 H 0 : The average cost is $650 per month, \(\mu\) = $650.
In the example with the Cherry Blossom Run, the null hypothesis represents no difference in the average time from 2006 to 2012. The alternative hypothesis represents something new or more interesting: there was a difference, either an increase or a decrease. These hypotheses can be described in mathematical notation using \(\mu_{12}\) as the average run time for 2012:

\[H_{0}: \mu_{12} = 93.29 \qquad H_{A}: \mu_{12} \neq 93.29\]
where 93.29 minutes (93 minutes and about 17 seconds) is the average 10 mile time for all runners in the 2006 Cherry Blossom Run. Using this mathematical notation, the hypotheses can now be evaluated using statistical tools. We call 93.29 the null value since it represents the value of the parameter if the null hypothesis is true. We will use the run10Samp data set to evaluate the hypothesis test.
We can start the evaluation of the hypothesis setup by comparing 2006 and 2012 run times using a point estimate from the 2012 sample: \(\bar {x}_{12} = 95.61\) minutes. This estimate suggests the average time is actually longer than the 2006 time, 93.29 minutes. However, to evaluate whether this provides strong evidence that there has been a change, we must consider the uncertainty associated with \(\bar {x}_{12}\).
17 The jury considers whether the evidence is so convincing (strong) that there is no reasonable doubt regarding the person's guilt; in such a case, the jury rejects innocence (the null hypothesis) and concludes the defendant is guilty (alternative hypothesis).
We learned in Section 4.1 that there is fluctuation from one sample to another, and it is very unlikely that the sample mean will be exactly equal to our parameter; we should not expect \(\bar {x}_{12}\) to exactly equal \(\mu_{12}\). Given that \(\bar {x}_{12} = 95.61\), it might still be possible that the population average in 2012 has remained unchanged from 2006. The difference between \(\bar {x}_{12}\) and 93.29 could be due to sampling variation, i.e. the variability associated with the point estimate when we take a random sample.
In Section 4.2, confidence intervals were introduced as a way to find a range of plausible values for the population mean. Based on run10Samp, a 95% confidence interval for the 2012 population mean, \(\mu_{12}\), was calculated as
\[(92.45, 98.77)\]
Because the 2006 mean, 93.29, falls in the range of plausible values, we cannot say the null hypothesis is implausible. That is, we failed to reject the null hypothesis, H 0 .
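The decision rule used here, failing to reject H0 whenever the null value falls inside the confidence interval, can be sketched as a tiny helper (the function name is ours, for illustration):

```python
def ci_test(null_value, ci_lower, ci_upper):
    """Evaluate H0 by checking whether the null value is among the
    plausible values, i.e. falls inside the confidence interval."""
    if ci_lower <= null_value <= ci_upper:
        return "fail to reject H0"
    return "reject H0"

# The 2006 mean, 93.29, versus the 95% CI (92.45, 98.77) for 2012:
result = ci_test(93.29, 92.45, 98.77)
```

Here `result` is "fail to reject H0", matching the conclusion above.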
Double negatives can sometimes be used in statistics
In many statistical explanations, we use double negatives. For instance, we might say that the null hypothesis is not implausible or we failed to reject the null hypothesis. Double negatives are used to communicate that while we are not rejecting a position, we are also not saying it is correct.
Example \(\PageIndex{1}\)
Next consider whether there is strong evidence that the average age of runners has changed from 2006 to 2012 in the Cherry Blossom Run. In 2006, the average age was 36.13 years, and in the 2012 run10Samp data set, the average was 35.05 years with a standard deviation of 8.97 years for 100 runners.
First, set up the hypotheses:

\[H_{0}: \mu_{age} = 36.13 \qquad H_{A}: \mu_{age} \neq 36.13\]
We have previously verified conditions for this data set. The normal model may be applied to \(\bar {y}\) and the estimate of SE should be very accurate. Using the sample mean and standard error, we can construct a 95% confidence interval for \(\mu _{age}\) to determine if there is sufficient evidence to reject H 0 :
\[\bar{y} \pm 1.96 \times \dfrac {s}{\sqrt {100}} \rightarrow 35.05 \pm 1.96 \times 0.90 \rightarrow (33.29, 36.81)\]
This confidence interval contains the null value, 36.13. Because 36.13 is not implausible, we cannot reject the null hypothesis. We have not found strong evidence that the average age is different from 36.13 years.
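The interval above can be reproduced in a few lines, using the numbers from the example and 1.96 as the usual 95% normal critical value:

```python
from math import sqrt

# From the example: n = 100 runners, mean age 35.05, sd 8.97
n, ybar, s = 100, 35.05, 8.97
se = s / sqrt(n)            # 0.897, about 0.90
lower = ybar - 1.96 * se
upper = ybar + 1.96 * se    # interval roughly (33.29, 36.81)

# The null value 36.13 lies inside the interval: fail to reject H0
contains_null = lower <= 36.13 <= upper
```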
Exercise \(\PageIndex{2}\)
Colleges frequently provide estimates of student expenses such as housing. A consultant hired by a community college claimed that the average student housing expense was $650 per month. What are the null and alternative hypotheses to test whether this claim is accurate? 18
H A : The average cost is different than $650 per month, \(\mu \ne\) $650.
19 Applying the normal model requires that certain conditions are met. Because the data are a simple random sample and the sample (presumably) represents no more than 10% of all students at the college, the observations are independent. The sample size is also sufficiently large (n = 75) and the data exhibit only moderate skew. Thus, the normal model may be applied to the sample mean.
Exercise \(\PageIndex{3}\)
The community college decides to collect data to evaluate the $650 per month claim. They take a random sample of 75 students at their school and obtain the data represented in Figure 4.11. Can we apply the normal model to the sample mean?
If the court makes a Type 1 Error, this means the defendant is innocent (H 0 true) but wrongly convicted. A Type 2 Error means the court failed to reject H 0 (i.e. failed to convict the person) when she was in fact guilty (H A true).
Example \(\PageIndex{2}\)
The sample mean for student housing is $611.63 and the sample standard deviation is $132.85. Construct a 95% confidence interval for the population mean and evaluate the hypotheses of Exercise 4.22.
The standard error associated with the mean may be estimated using the sample standard deviation divided by the square root of the sample size. Recall that n = 75 students were sampled.
\[ SE = \dfrac {s}{\sqrt {n}} = \dfrac {132.85}{\sqrt {75}} = 15.34\]
You showed in Exercise 4.23 that the normal model may be applied to the sample mean. This ensures a 95% confidence interval may be accurately constructed:
\[\bar {x} \pm z^{\star} \times SE \rightarrow 611.63 \pm 1.96 \times 15.34 \rightarrow (581.56, 641.70)\]
Because the null value $650 is not in the confidence interval, a true mean of $650 is implausible and we reject the null hypothesis. The data provide statistically significant evidence that the actual average housing expense is less than $650 per month.
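The whole calculation in this example, from standard error to decision, fits in a few lines (numbers taken from the example; 1.96 is the 95% normal critical value):

```python
from math import sqrt

# From the example: n = 75 students, mean $611.63, sd $132.85
n, xbar, s = 75, 611.63, 132.85
se = s / sqrt(n)                         # about 15.34
lower = xbar - 1.96 * se                 # about 581.56
upper = xbar + 1.96 * se                 # about 641.70

# The null value $650 falls outside the interval, so we reject H0
rejected = not (lower <= 650 <= upper)
```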
Hypothesis tests are not flawless. Just think of the court system: innocent people are sometimes wrongly convicted and the guilty sometimes walk free. Similarly, we can make a wrong decision in statistical hypothesis tests. However, the difference is that we have the tools necessary to quantify how often we make such errors.
There are two competing hypotheses: the null and the alternative. In a hypothesis test, we make a statement about which one might be true, but we might choose incorrectly. There are four possible scenarios in a hypothesis test, which are summarized in Table 4.12.
| | Test conclusion: do not reject \(H_0\) | Test conclusion: reject \(H_0\) in favor of \(H_A\) |
|---|---|---|
| \(H_0\) true | okay | Type 1 Error |
| \(H_A\) true | Type 2 Error | okay |
A Type 1 Error is rejecting the null hypothesis when H0 is actually true. A Type 2 Error is failing to reject the null hypothesis when the alternative is actually true.
Exercise 4.25
In a US court, the defendant is either innocent (H 0 ) or guilty (H A ). What does a Type 1 Error represent in this context? What does a Type 2 Error represent? Table 4.12 may be useful.
To lower the Type 1 Error rate, we might raise our standard for conviction from "beyond a reasonable doubt" to "beyond a conceivable doubt" so fewer people would be wrongly convicted. However, this would also make it more difficult to convict the people who are actually guilty, so we would make more Type 2 Errors.
Exercise 4.26
How could we reduce the Type 1 Error rate in US courts? What influence would this have on the Type 2 Error rate?
To lower the Type 2 Error rate, we want to convict more guilty people. We could lower the standards for conviction from "beyond a reasonable doubt" to "beyond a little doubt". Lowering the bar for guilt will also result in more wrongful convictions, raising the Type 1 Error rate.
Exercise 4.27
How could we reduce the Type 2 Error rate in US courts? What influence would this have on the Type 1 Error rate?
A skeptic would have no reason to believe that sleep patterns at this school are different than the sleep patterns at another school.
Exercises 4.25-4.27 provide an important lesson:
If we reduce how often we make one type of error, we generally make more of the other type.
Hypothesis testing is built around rejecting or failing to reject the null hypothesis. That is, we do not reject H 0 unless we have strong evidence. But what precisely does strong evidence mean? As a general rule of thumb, for those cases where the null hypothesis is actually true, we do not want to incorrectly reject H 0 more than 5% of the time. This corresponds to a significance level of 0.05. We often write the significance level using \(\alpha\) (the Greek letter alpha): \(\alpha = 0.05.\) We discuss the appropriateness of different significance levels in Section 4.3.6.
If we use a 95% confidence interval to test a hypothesis where the null hypothesis is true, we will make an error whenever the point estimate is at least 1.96 standard errors away from the population parameter. This happens about 5% of the time (2.5% in each tail). Similarly, using a 99% confidence interval to evaluate a hypothesis is equivalent to a significance level of \(\alpha = 0.01\).
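The correspondence between confidence level and significance level runs through the normal critical value \(z^{\star}\). As a sketch using only the standard library (the function name is ours):

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* such that the middle `confidence` fraction of
    the standard normal distribution lies within +/- z*. The leftover
    alpha = 1 - confidence is split between the two tails."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)
```

A 95% interval corresponds to \(\alpha = 0.05\) and \(z^{\star} \approx 1.96\); a 99% interval corresponds to \(\alpha = 0.01\) and \(z^{\star} \approx 2.58\).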
A confidence interval is, in one sense, simplistic in the world of hypothesis tests. Consider the following two scenarios:
In Section 4.3.4, we introduce a tool called the p-value that will be helpful in these cases. The p-value method also extends to hypothesis tests where confidence intervals cannot be easily constructed or applied.
The p-value is a way of quantifying the strength of the evidence against the null hypothesis and in favor of the alternative. Formally the p-value is a conditional probability.
definition: p-value
The p-value is the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis is true. We typically use a summary statistic of the data, in this chapter the sample mean, to help compute the p-value and evaluate the hypotheses.
A poll by the National Sleep Foundation found that college students average about 7 hours of sleep per night. Researchers at a rural school are interested in showing that students at their school sleep longer than seven hours on average, and they would like to demonstrate this using a sample of students. What would be an appropriate skeptical position for this research?
This is entirely based on the interests of the researchers. Had they been only interested in the opposite case - showing that their students were actually averaging fewer than seven hours of sleep but not interested in showing more than 7 hours - then our setup would have set the alternative as \(\mu < 7\).
We can set up the null hypothesis for this test as a skeptical perspective: the students at this school average 7 hours of sleep per night. The alternative hypothesis takes a new form reflecting the interests of the research: the students average more than 7 hours of sleep. We can write these hypotheses as

\[H_{0}: \mu = 7 \qquad H_{A}: \mu > 7\]
Using \(\mu\) > 7 as the alternative is an example of a one-sided hypothesis test. In this investigation, there is no apparent interest in learning whether the mean is less than 7 hours. (The standard error can be estimated from the sample standard deviation and the sample size: \(SE_{\bar {x}} = \dfrac {s_x}{\sqrt {n}} = \dfrac {1.75}{\sqrt {110}} = 0.17\)). Earlier we encountered a two-sided hypothesis where we looked for any clear difference, greater than or less than the null value.
Always use a two-sided test unless it was made clear prior to data collection that the test should be one-sided. Switching a two-sided test to a one-sided test after observing the data is dangerous because it can inflate the Type 1 Error rate.
TIP: One-sided and two-sided tests
If the researchers are only interested in showing an increase or a decrease, but not both, use a one-sided test. If the researchers would be interested in any difference from the null value - an increase or decrease - then the test should be two-sided.
TIP: Always write the null hypothesis as an equality
We will find it most useful if we always list the null hypothesis as an equality (e.g. \(\mu = 7\)) while the alternative always uses an inequality (e.g. \(\mu \ne 7\), \(\mu > 7\), or \(\mu < 7\)).
The researchers at the rural school conducted a simple random sample of n = 110 students on campus. They found that these students averaged 7.42 hours of sleep and the standard deviation of the amount of sleep for the students was 1.75 hours. A histogram of the sample is shown in Figure 4.14.
Before we can use a normal model for the sample mean or compute the standard error of the sample mean, we must verify conditions. (1) Because this is a simple random sample from less than 10% of the student body, the observations are independent. (2) The sample size in the sleep study is sufficiently large since it is greater than 30. (3) The data show moderate skew in Figure 4.14 and the presence of a couple of outliers. This skew and the outliers (which are not too extreme) are acceptable for a sample size of n = 110. With these conditions verified, the normal model can be safely applied to \(\bar {x}\) and the estimated standard error will be very accurate.
What is the standard deviation associated with \(\bar {x}\)? That is, estimate the standard error of \(\bar {x}\). 25
25 The standard error can be estimated from the sample standard deviation and the sample size: \(SE_{\bar {x}} = \dfrac {s_x}{\sqrt {n}} = \dfrac {1.75}{\sqrt {110}} = 0.17\).
The hypothesis test will be evaluated using a significance level of \(\alpha = 0.05\). We want to consider the data under the scenario that the null hypothesis is true. In this case, the sample mean is from a distribution that is nearly normal and has mean 7 and standard deviation of about 0.17. Such a distribution is shown in Figure 4.15.
The shaded tail in Figure 4.15 represents the chance of observing such a large mean, conditional on the null hypothesis being true. That is, the shaded tail represents the p-value. We shade all means larger than our sample mean, \(\bar {x} = 7.42\), because they are more favorable to the alternative hypothesis than the observed mean.
We compute the p-value by finding the tail area of this normal distribution, which we learned to do in Section 3.1. First compute the Z score of the sample mean, \(\bar {x} = 7.42\):
\[Z = \dfrac {\bar {x} - \text {null value}}{SE_{\bar {x}}} = \dfrac {7.42 - 7}{0.17} = 2.47\]
Using the normal probability table, the lower unshaded area is found to be 0.993. Thus the shaded area is 1 - 0.993 = 0.007. If the null hypothesis is true, the probability of observing such a large sample mean for a sample of 110 students is only 0.007. That is, if the null hypothesis is true, we would not often see such a large mean.
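The calculation above can be reproduced in a few lines of Python. This is just a sketch using the standard library; it assumes the rounded standard error of 0.17 from the text.

```python
from statistics import NormalDist

# Sleep study values from the text: x-bar = 7.42, null value 7, SE = 0.17 (rounded)
x_bar, mu_0, se = 7.42, 7.0, 0.17

z = (x_bar - mu_0) / se            # Z score, about 2.47
p_value = 1 - NormalDist().cdf(z)  # upper-tail area, about 0.007
print(round(z, 2), round(p_value, 3))
```

Using the unrounded standard error gives a slightly different Z score, which is why hand calculations and software output sometimes disagree in the last digit.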
We evaluate the hypotheses by comparing the p-value to the significance level. Because the p-value is less than the significance level \((p-value = 0.007 < 0.05 = \alpha)\), we reject the null hypothesis. What we observed is so unusual with respect to the null hypothesis that it casts serious doubt on H 0 and provides strong evidence favoring H A .
p-value as a tool in hypothesis testing
The p-value quantifies how strongly the data favor H A over H 0 . A small p-value (usually < 0.05) corresponds to sufficient evidence to reject H 0 in favor of H A .
TIP: First draw a picture to find the p-value
It is useful to draw a picture of the distribution of \(\bar {x}\) as though H 0 was true (i.e. \(\mu\) equals the null value), and shade the region (or regions) of sample means that are at least as favorable to the alternative hypothesis. These shaded regions represent the p-value.
The ideas below review the process of evaluating hypothesis tests with p-values:
The p-value is constructed in such a way that we can directly compare it to the significance level ( \(\alpha\)) to determine whether or not to reject H 0 . This method ensures that the Type 1 Error rate does not exceed the significance level standard.
If the null hypothesis is true, how often should the p-value be less than 0.05?
About 5% of the time. If the null hypothesis is true, then the data only has a 5% chance of being in the 5% of data most favorable to H A .
Exercise 4.31
Suppose we had used a significance level of 0.01 in the sleep study. Would the evidence have been strong enough to reject the null hypothesis? (The p-value was 0.007.) What if the significance level was \(\alpha = 0.001\)? 27
27 We reject the null hypothesis whenever p-value < \(\alpha\). Thus, we would still reject the null hypothesis if \(\alpha = 0.01\) but not if the significance level had been \(\alpha = 0.001\).
Exercise 4.32
Ebay might be interested in showing that buyers on its site tend to pay less than they would for the corresponding new item on Amazon. We'll research this topic for one particular product: a video game called Mario Kart for the Nintendo Wii. During early October 2009, Amazon sold this game for $46.99. Set up an appropriate (one-sided!) hypothesis test to check the claim that Ebay buyers pay less during auctions at this same time. 28
28 The skeptic would say the average is the same on Ebay, and we are interested in showing the average price is lower.
Exercise 4.33
During early October, 2009, 52 Ebay auctions were recorded for Mario Kart.29 The total prices for the auctions are presented using a histogram in Figure 4.17, and we may like to apply the normal model to the sample mean. Check the three conditions required for applying the normal model: (1) independence, (2) at least 30 observations, and (3) the data are not strongly skewed. 30
30 (1) The independence condition is unclear. We will make the assumption that the observations are independent, which we should report with any final results. (2) The sample size is sufficiently large: \(n = 52 \ge 30\). (3) The data distribution is not strongly skewed; it is approximately symmetric.
H 0 : The average auction price on Ebay is equal to (or more than) the price on Amazon. We write only the equality in the statistical notation: \(\mu_{ebay} = 46.99\).
H A : The average price on Ebay is less than the price on Amazon, \(\mu _{ebay} < 46.99\).
29 These data were collected by OpenIntro staff.
Example 4.34
The average sale price of the 52 Ebay auctions for Wii Mario Kart was $44.17 with a standard deviation of $4.15. Does this provide sufficient evidence to reject the null hypothesis in Exercise 4.32? Use a significance level of \(\alpha = 0.01\).
The hypotheses were set up and the conditions were checked in Exercises 4.32 and 4.33. The next step is to find the standard error of the sample mean and produce a sketch to help find the p-value.
The standard error of the sample mean is \(SE_{\bar {x}} = \dfrac {s}{\sqrt {n}} = \dfrac {4.15}{\sqrt {52}} = 0.5755\). Because the alternative hypothesis says we are looking for a smaller mean, we shade the lower tail. We find this shaded area by using the Z score and normal probability table: \(Z = \dfrac {44.17 - 46.99}{0.5755} = -4.90\), which has area less than 0.0002. The area is so small we cannot really see it on the picture. This lower tail area corresponds to the p-value.
Because the p-value is so small - specifically, smaller than \(\alpha = 0.01\) - this provides sufficiently strong evidence to reject the null hypothesis in favor of the alternative. The data provide statistically significant evidence that the average price on Ebay is lower than Amazon's asking price.
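As a quick check, the standard error, Z score, and tail area for this example can be computed directly. This is a sketch using the sample values quoted in the text:

```python
from math import sqrt
from statistics import NormalDist

# Mario Kart auction data from the text: n = 52, x-bar = 44.17, s = 4.15
se = 4.15 / sqrt(52)           # standard error, about 0.5755
z = (44.17 - 46.99) / se       # Z score, about -4.90
p_value = NormalDist().cdf(z)  # lower-tail area, far below 0.0002
```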
We now consider how to compute a p-value for a two-sided test. In one-sided tests, we shade the single tail in the direction of the alternative hypothesis. For example, when the alternative had the form \(\mu\) > 7, then the p-value was represented by the upper tail (Figure 4.16). When the alternative was \(\mu\) < 46.99, the p-value was the lower tail (Exercise 4.32). In a two-sided test, we shade two tails since evidence in either direction is favorable to H A .
Exercise 4.35 Earlier we talked about a research group investigating whether the students at their school slept longer than 7 hours each night. Let's consider a second group of researchers who want to evaluate whether the students at their college differ from the norm of 7 hours. Write the null and alternative hypotheses for this investigation. 31
Example 4.36 The second college randomly samples 72 students and finds a mean of \(\bar {x} = 6.83\) hours and a standard deviation of s = 1.8 hours. Does this provide strong evidence against H 0 in Exercise 4.35? Use a significance level of \(\alpha = 0.05\).
First, we must verify assumptions. (1) A simple random sample of less than 10% of the student body means the observations are independent. (2) The sample size is 72, which is greater than 30. (3) Based on the earlier distribution and what we already know about college student sleep habits, the distribution is probably not strongly skewed.
Next we can compute the standard error \((SE_{\bar {x}} = \dfrac {s}{\sqrt {n}} = 0.21)\) of the estimate and create a picture to represent the p-value, shown in Figure 4.18. Both tails are shaded.
31 Because the researchers are interested in any difference, they should use a two-sided setup: H 0 : \(\mu\) = 7, H A : \(\mu \ne 7.\)
An estimate of 7.17 or more provides at least as strong evidence against the null hypothesis and in favor of the alternative as the observed estimate, \(\bar {x} = 6.83\).
We can calculate the tail areas by first finding the lower tail corresponding to \(\bar {x}\):
\[Z = \dfrac {6.83 - 7.00}{0.21} = -0.81 \xrightarrow {table} \text {left tail} = 0.2090\]
Because the normal model is symmetric, the right tail will have the same area as the left tail. The p-value is found as the sum of the two shaded tails:
\[ \text {p-value} = \text {left tail} + \text {right tail} = 2 \times \text {(left tail)} = 0.4180\]
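This two-sided calculation can be sketched in Python, again assuming the rounded standard error of 0.21 from the text:

```python
from statistics import NormalDist

# Second sleep study: x-bar = 6.83, null value 7, SE = 0.21 (rounded)
z = (6.83 - 7.00) / 0.21         # Z score, about -0.81
left_tail = NormalDist().cdf(z)  # about 0.209
p_value = 2 * left_tail          # both tails, about 0.418
```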
This p-value is relatively large (larger than \(\alpha = 0.05\)), so we should not reject H 0 . That is, if H 0 is true, it would not be very unusual to see a sample mean this far from 7 hours simply due to sampling variation. Thus, we do not have sufficient evidence to conclude that the mean is different than 7 hours.
Example 4.37 It is never okay to change two-sided tests to one-sided tests after observing the data. In this example we explore the consequences of ignoring this advice. Using \(\alpha = 0.05\), we show that freely switching from two-sided tests to one-sided tests will cause us to make twice as many Type 1 Errors as intended.
Suppose the sample mean was larger than the null value, \(\mu_0\) (e.g. \(\mu_0\) would represent 7 if H 0 : \(\mu\) = 7). Then if we can flip to a one-sided test, we would use H A : \(\mu > \mu_0\). Now if we obtain any observation with a Z score greater than 1.65, we would reject H 0 . If the null hypothesis is true, we incorrectly reject the null hypothesis about 5% of the time when the sample mean is above the null value, as shown in Figure 4.19.
Suppose the sample mean was smaller than the null value. Then if we change to a one-sided test, we would use H A : \(\mu < \mu_0\). If \(\bar {x}\) had a Z score smaller than -1.65, we would reject H 0 . If the null hypothesis is true, then we would observe such a case about 5% of the time.
By examining these two scenarios, we can determine that we will make a Type 1 Error 5% + 5% = 10% of the time if we are allowed to swap to the "best" one-sided test for the data. This is twice the error rate we prescribed with our significance level: \(\alpha = 0.05\) (!).
Caution: One-sided hypotheses are allowed only before seeing data
After observing data, it is tempting to turn a two-sided test into a one-sided test. Avoid this temptation. Hypotheses must be set up before observing the data. If they are not, the test must be two-sided.
Choosing a significance level for a test is important in many contexts, and the traditional level is 0.05. However, it is often helpful to adjust the significance level based on the application. We may select a level that is smaller or larger than 0.05 depending on the consequences of any conclusions reached from the test.
Significance levels should reflect consequences of errors
The significance level selected for a test should reflect the consequences associated with Type 1 and Type 2 Errors.
Example 4.38
A car manufacturer is considering a higher quality but more expensive supplier for window parts in its vehicles. They sample a number of parts from their current supplier and also parts from the new supplier. They decide that if the high quality parts will last more than 12% longer, it makes financial sense to switch to this more expensive supplier. Is there good reason to modify the significance level in such a hypothesis test?
The null hypothesis is that the more expensive parts last no more than 12% longer while the alternative is that they do last more than 12% longer. This decision is just one of the many regular factors that have a marginal impact on the car and company. A significance level of 0.05 seems reasonable since neither a Type 1 nor a Type 2 Error should be dangerous or (relatively) much more expensive.
Example 4.39
The same car manufacturer is considering a slightly more expensive supplier for parts related to safety, not windows. If the durability of these safety components is shown to be better than the current supplier, they will switch manufacturers. Is there good reason to modify the significance level in such an evaluation?
The null hypothesis would be that the suppliers' parts are equally reliable. Because safety is involved, the car company should be eager to switch to the slightly more expensive manufacturer (reject H 0 ) even if the evidence of increased safety is only moderately strong. A slightly larger significance level, such as \(\alpha = 0.10\), might be appropriate.
Exercise 4.40
A part inside of a machine is very expensive to replace. However, the machine usually functions properly even if this part is broken, so the part is replaced only if we are extremely certain it is broken based on a series of measurements. Identify appropriate hypotheses for this test (in plain language) and suggest an appropriate significance level. 32
When we do testing we end up with two outcomes.
1) We reject null hypothesis
2) We fail to reject null hypothesis.
We do not talk about accepting alternative hypotheses. If we do not talk about accepting alternative hypothesis, why do we need to have alternative hypothesis at all?
Here is an update. Could somebody give me two examples:
1) a case where rejecting the null hypothesis is equivalent to accepting the alternative hypothesis
2) a case where rejecting the null hypothesis is not equivalent to accepting the alternative hypothesis
There was, historically, disagreement about whether an alternative hypothesis was necessary. Let me explain this point of disagreement by considering the opinions of Fisher and Neyman, within the context of frequentist statistics, and a Bayesian answer.
Fisher - We do not need an alternative hypothesis; we can simply test a null hypothesis using a goodness-of-fit test. The outcome is a $p$ -value, providing a measure of evidence against the null hypothesis.
Neyman - We must perform a hypothesis test between a null and an alternative. The test is such that it would result in type-1 errors at a fixed, pre-specified rate, $\alpha$ . The outcome is a decision - to reject or not reject the null hypothesis at the level $\alpha$ .
We need an alternative from a decision theoretic perspective - we are making a choice between two courses of action - and because we should report the power of the test $$ 1 - p\left(\textrm{Accept $H_0$} \, \middle|\, H_1\right) $$ We should seek the most powerful tests possible to have the best chance of rejecting $H_0$ when the alternative is true.
To satisfy both these points, the alternative hypothesis cannot be the vague 'not $H_0$ ' one.
Bayesian - We must consider at least two models and update their relative plausibility with data. With only a single model, we simply have $$ p(H_0) = 1 $$ no matter what data we collect. To make calculations in this framework, the alternative hypothesis (or model as it would be known in this context) cannot be the ill-defined 'not $H_0$ ' one. I call it ill-defined since we cannot write the model $p(\text{data}|\text{not }H_0)$ .
I will focus on "If we do not talk about accepting alternative hypothesis, why do we need to have alternative hypothesis at all?"
Because it helps us to choose a meaningful test statistic and design our study to have high power---a high chance of rejecting the null when the alternative is true. Without an alternative, we have no concept of power.
Imagine we only have a null hypothesis and no alternative. Then there's no guidance on how to choose a test statistic that will have high power. All we can say is, "Reject the null whenever you observe a test statistic whose value is unlikely under the null." We can pick something arbitrary: we could draw Uniform(0,1) random numbers and reject the null when they are below 0.05. This happens under the null "rarely," no more than 5% of the time---yet it's also just as rare when the null is false. So this is technically a statistical test, but it's meaningless as evidence for or against anything.
Instead, usually we have some scientifically-plausible alternative hypothesis ("There is a positive difference in outcomes between the treatment and control groups in my experiment"). We'd like to defend it against potential critics who would bring up the null hypothesis as devil's advocates ("I'm not convinced yet---maybe your treatment actually hurts, or has no effect at all , and any apparent difference in the data is due only to sampling variation").
With these two hypotheses in mind, now we can set up a powerful test, by choosing a test statistic whose typical values under the alternative are unlikely under the null. (A positive 2-sample t-statistic far from 0 would be unsurprising if the alternative is true, but surprising if the null is true.) Then we figure out the test statistic's sampling distribution under the null, so we can calculate p-values---and interpret them. When we observe a test statistic that's unlikely under the null, especially if the study design, sample size, etc. were chosen to have high power, this provides some evidence for the alternative.
So, why don't we talk about "accepting" the alternative hypothesis? Because even a high-powered study doesn't provide completely rigorous proof that the null is wrong. It's still a kind of evidence, but weaker than some other kinds of evidence.
I am not 100% sure if this is a formal requirement, but typically the null hypothesis and alternative hypothesis are: 1) mutually exclusive and 2) exhaustive. That is: 1) they cannot both be true at the same time; 2) if one is not true, the other must be true.
Consider a simple test of heights between girls and boys. A typical null hypothesis in this case is that $height_{boys} = height_{girls}$ . An alternative hypothesis would be $height_{boys} \ne height_{girls}$ . So if the null is not true, the alternative must be true.
Why do we need to have alternative hypothesis at all?
In a classical hypothesis test, the only mathematical role played by the alternative hypothesis is that it affects the ordering of the evidence through the chosen test statistic. The alternative hypothesis is used to determine the appropriate test statistic for the test, which is equivalent to setting an ordinal ranking of all possible data outcomes from those most conducive to the null hypothesis (against the stated alternative) to those least conducive to the null hypothesis (against the stated alternative). Once you have formed this ordinal ranking of the possible data outcomes, the alternative hypothesis plays no further mathematical role in the test.
You can find a related answer on this question here which gives a schematic diagram of the classical hypothesis test and how the alternative hypothesis enters into the test. This is a useful supplement to the present answer.
Formal explanation: In any classical hypothesis test with $n$ observable data values $\mathbf{x} = (x_1,...,x_n)$ you have some test statistic $T: \mathbb{R}^n \rightarrow \mathbb{R}$ that maps every possible outcome of the data onto an ordinal scale that measures whether it is more conducive to the null or alternative hypothesis. (Without loss of generality we will assume that lower values are more conducive to the null hypothesis and higher values are more conducive to the alternative hypothesis. We sometimes say that higher values of the test statistic are "more extreme" insofar as they constitute more extreme evidence for the alternative hypothesis.) The p-value of the test is then given by:
$$p(\mathbf{x}) \equiv p_T(\mathbf{x}) \equiv \mathbb{P}( T(\mathbf{X}) \geqslant T(\mathbf{x}) | H_0).$$
This p-value function fully determines the evidence in the test for any data vector. When combined with a chosen significance level, it determines the outcome of the test for any data vector. (We have described this for a fixed number of data points $n$ but this can easily be extended to allow for arbitrary $n$ .) It is important to note that the p-value is affected by the test statistic only through the ordinal scale it induces , so if you apply a monotonically increasing transformation to the test statistics, this makes no difference to the hypothesis test (i.e., it is the same test). This mathematical property merely reflects the fact that the sole purpose of the test statistic is to induce an ordinal scale on the space of all possible data vectors, to show which are more conducive to the null/alternative.
The alternative hypothesis affects this measurement only through the function $T$ , which is chosen based on the stated null and alternative hypotheses within the overall model. Hence, we can regard the test statistic function as being a function $T \equiv g (\mathcal{M}, H_0, H_A)$ of the overall model $\mathcal{M}$ and the two hypotheses. For example, for a likelihood-ratio-test the test statistic is formed by taking a ratio (or logarithm of a ratio) of supremums of the likelihood function over parameter ranges relating to the null and alternative hypotheses.
What does this mean if we compare tests with different alternatives? Suppose you have a fixed model $\mathcal{M}$ and you want to do two different hypothesis tests comparing the same null hypothesis $H_0$ against two different alternatives $H_A$ and $H_A'$ . In this case you will have two different test statistic functions:
$$T = g (\mathcal{M}, H_0, H_A) \quad \quad \quad \quad \quad T' = g (\mathcal{M}, H_0, H_A'),$$
leading to the corresponding p-value functions:
$$p(\mathbf{x}) = \mathbb{P}( T(\mathbf{X}) \geqslant T(\mathbf{x}) | H_0) \quad \quad \quad \quad \quad p'(\mathbf{x}) = \mathbb{P}( T'(\mathbf{X}) \geqslant T'(\mathbf{x}) | H_0).$$
It is important to note that if $T$ and $T'$ are monotonic increasing transformations of one another then the p-value functions $p$ and $p'$ are identical, so both tests are the same test. If the functions $T$ and $T'$ are not monotonic increasing transformations of one another then we have two genuinely different hypothesis tests.
The reason I wouldn't think of accepting the alternative hypothesis is because that's not what we are testing. Null hypothesis significance testing (NHST) calculates the probability of observing data as extreme as observed (or more) given that the null hypothesis is true; in other words, NHST calculates a probability value that is conditioned on the null hypothesis being true, $P(data|H_0)$ . So it is the probability of the data assuming that the null hypothesis is true. It never uses or gives the probability of a hypothesis (neither null nor alternative). Therefore when you observe a small p-value, all you know is that the data you observed appear to be unlikely under $H_0$ , so you are collecting evidence against the null and in favour of whatever your alternative explanation is.
Before you run the experiment, you can decide on a cut-off level ( $\alpha$ ) that deems your result significant, meaning if your p-value falls below that level, you conclude that the evidence against the null is so overwhelmingly high that the data must have originated from some other data generating process, and you reject the null hypothesis based on that evidence. If the p-value is above that level, you fail to reject the null hypothesis since your evidence is not substantial enough to believe that your sample came from a different data generating process.
The reason why you formulate an alternative hypothesis is because you likely had an experiment in mind before you started sampling. Formulating an alternative hypothesis can also determine whether you use a one-tailed or two-tailed test, hence giving you more statistical power (in the one-tailed scenario). But technically, in order to run the test you don't need to formulate an alternative hypothesis; you just need data.
In this post, you’ll learn how to perform t-tests in Python using the popular SciPy library . T-tests are used to test for statistical significance and can be hugely advantageous when working with smaller sample sizes.
By the end of this tutorial, you’ll have learned the following:
The t-test, often referred to as Student's t-test, dates back to the early 20th century. William Sealy Gosset, an English statistician working for the Guinness Brewery in Dublin, introduced the concept. Because the brewery was working with small sample sizes and was under strict orders of confidentiality, Gosset published his findings under the pseudonym "Student". His seminal work, "The Probable Error of a Mean," laid the groundwork for what we now know as Student's t-test.
This leads us to one of the primary benefits of the t-test: the t-test is able to make reliable inferences about a population using a small sample size . Let’s explore how this works by discussing the theory behind the t-test in the following section.
Statistical tests are used to make assumptions about some population parameters. For example, it lets us test whether or not the average test score for any given group of students is 70%. The T-Test works in two different ways:
Let’s explore these in a little more depth.
The one-sample t-test is used to test the null hypothesis that the population mean inferred from a sample is equal to some given value. It can be described as below:
There are actually three different alternative hypotheses:
We can use the following formula to calculate our test statistic: \(t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}\), where \(\bar{x}\) is the sample mean, \(\mu_0\) is the hypothesized population mean, \(s\) is the sample standard deviation, and \(n\) is the sample size.
We then need to calculate the p-value using degrees of freedom equal to n – 1. If the p-value is less than your chosen significance level, we can reject the null hypothesis and say that the means differ.
The two-sample t-test is used to test whether two population means are equal (or if they differ in a significant way). In this case, the null hypothesis assumes that the two population means are equal.
When we sample two different groups, we are almost guaranteed that their sample means will differ. But the t-test allows us to test whether or not this difference is different in a statistically significant way.
Similar to the one-sample t-test, there are three different alternative hypotheses:
The formula for the two-sample t-test (assuming equal variances, which is SciPy's default) can be written as: \(t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\), where \(s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\) is the pooled standard deviation.
We then need to calculate the p-value using degrees of freedom equal to \(n_1 + n_2 - 2\). If the p-value is less than your chosen significance level, we can reject the null hypothesis and say that the means differ.
Both types of t-tests follow a key set of assumptions, including:
It’s easy to test for these assumptions using Python (and I have included links to tutorials covering how to do this). Let’s take a look at example walkthroughs of how to conduct both of these tests in Python.
In this section, you’ll learn how to conduct a one-sample t-test in Python. Suppose you are a teacher and have just given a test. You know that the population mean for this test is 85% and you want to see whether the score of the class is significantly different from this population mean.
Let’s start by importing our required function, ttest_1samp() from SciPy and defining our data:
In the code block above, we first imported our required library. We then defined our sample as a list of values and defined our population mean as its own variable.
We can now pass these values into the function, as shown below:
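A minimal sketch, continuing with the hypothetical scores from above:

```python
from scipy.stats import ttest_1samp

# Hypothetical exam scores (made up for illustration)
scores = [88, 92, 79, 85, 90, 83, 87, 81, 86, 84]
population_mean = 85

t_stat, p_value = ttest_1samp(scores, population_mean)
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")
```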
The function returns a test statistic and the corresponding p-value. We can print these values out using f-strings to simplify the labeling , as shown above.
Finally, we can write a simple if-else statement to evaluate whether or not our sample mean is significantly different from the population mean:
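A sketch of that decision logic, again with the hypothetical scores:

```python
from scipy.stats import ttest_1samp

scores = [88, 92, 79, 85, 90, 83, 87, 81, 86, 84]  # hypothetical data
t_stat, p_value = ttest_1samp(scores, 85)

alpha = 0.05
if p_value < alpha:
    print("The sample mean is significantly different from the population mean.")
else:
    print("There is no significant difference from the population mean.")
```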
By running this if-else statement, we can see that our test indicates there is no significant difference in the exam scores.
In order to calculate the different one-sample t-test alternative hypotheses, we can use the alternative= parameter:
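For example (a sketch; the alternative= parameter requires SciPy 1.6 or newer):

```python
from scipy.stats import ttest_1samp

scores = [88, 92, 79, 85, 90, 83, 87, 81, 86, 84]  # hypothetical data

# 'two-sided' is the default; 'less' and 'greater' give one-sided tests
res_two_sided = ttest_1samp(scores, 85, alternative='two-sided')
res_less = ttest_1samp(scores, 85, alternative='less')        # mean < 85?
res_greater = ttest_1samp(scores, 85, alternative='greater')  # mean > 85?
```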
Now that you have a strong understanding of how to perform a one-sample t-test, let’s dive into the exciting world of two-sample t-tests!
A two-sample t-test is used to test whether the means of two samples are equal. The test requires that both samples be normally distributed, have similar variances, and be independent of one another.
Imagine that we want to compare the test scores of two different classes. This is the perfect example of when to use a t-test. Let’s begin by running a two-tailed test, which only evaluates whether or not the two means are equal. It begins with the null hypothesis, which states that the two means are equal.
Let’s take a look at how we can run a two-tailed t-test in Python:
We can see that the ttest_ind() function returns both a test statistic and a p-value. We can run a simple if-else statement to check whether or not we can reject or fail to reject the null hypothesis:
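Continuing with the hypothetical class scores:

```python
from scipy.stats import ttest_ind

class_1 = [72, 75, 78, 70, 74, 76, 73, 71, 77, 75]  # hypothetical data
class_2 = [85, 88, 83, 86, 89, 84, 87, 90, 82, 86]  # hypothetical data
t_stat, p_value = ttest_ind(class_1, class_2)

if p_value < 0.05:
    print("There is a significant difference between the two classes.")
else:
    print("There is no significant difference between the two classes.")
```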
We can see that there is a significant difference between the two sets of scores. However, the two-tailed test doesn’t tell us in which direction.
In order to do this, we need to use a right- or left-tailed two-sample t-test. To do this in SciPy, we use the alternative= parameter. By default, this is set to 'two-sided' . However, we can modify this to either 'less' or 'greater' , if we want to evaluate whether or not the mean for one sample is less than or greater than another.
Let’s see how we can check if the mean of class 2 is significantly higher than that of class 1:
Because our p-value is less than our defined value of 0.05, we can say that the mean of class 2 is higher with statistical significance.
In conclusion, this comprehensive guide has equipped you with the knowledge and practical skills to perform t-tests in Python using the SciPy library. T-tests are invaluable tools for assessing statistical significance, particularly when working with smaller sample sizes.
Throughout this tutorial, you’ve gained insights into:
Remember that t-tests come with certain assumptions, and it’s crucial to validate them before applying these tests to your data. Python provides tools to check these assumptions, ensuring the robustness and reliability of your statistical analyses.
To learn more about these functions, check out the official documentation for the one-sample t-test and for the two-sample t-test in SciPy.
Nik is the author of datagy.io and has over a decade of experience working with data analytics, data science, and Python. He specializes in teaching developers how to use Python for data science using hands-on tutorials.
A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method . Many describe it as an "educated guess" based on prior knowledge and observation. While this is true, a hypothesis is more informed than a guess. While an "educated guess" suggests a random prediction based on a person's expertise, developing a hypothesis requires active observation and background research.
The basic idea of a hypothesis is that there is no predetermined outcome. For a solution to be termed a scientific hypothesis, it has to be an idea that can be supported or refuted through carefully crafted experimentation or observation. This concept, called falsifiability and testability, was advanced in the mid-20th century by Austrian-British philosopher Karl Popper in his famous book "The Logic of Scientific Discovery" (Routledge, 1959).
A key function of a hypothesis is to derive predictions about the results of future experiments and then perform those experiments to see whether they support the predictions.
A hypothesis is usually written in the form of an if-then statement, which gives a possibility (if) and explains what may happen because of the possibility (then). The statement could also include "may," according to California State University, Bakersfield .
A classic example of such a statement: "If a plant receives more sunlight, then it will grow taller."
A useful hypothesis should be testable and falsifiable. That means that it should be possible to prove it wrong. A theory that can't be proved wrong is nonscientific, according to Karl Popper's 1963 book " Conjectures and Refutations ."
An example of an untestable statement is, "Dogs are better than cats." That's because the definition of "better" is vague and subjective. However, an untestable statement can be reworded to make it testable. For example, the previous statement could be changed to this: "Owning a dog is associated with higher levels of physical fitness than owning a cat." With this statement, the researcher can take measures of physical fitness from dog and cat owners and compare the two.
In an experiment, researchers generally state their hypotheses in two ways. The null hypothesis predicts that there will be no relationship between the variables tested, or no difference between the experimental groups. The alternative hypothesis predicts the opposite: that there will be a difference between the experimental groups. This is usually the hypothesis scientists are most interested in, according to the University of Miami .
For example, a null hypothesis might state, "There will be no difference in the rate of muscle growth between people who take a protein supplement and people who don't." The alternative hypothesis would state, "There will be a difference in the rate of muscle growth between people who take a protein supplement and people who don't."
If the results of the experiment show a relationship between the variables, then the null hypothesis has been rejected in favor of the alternative hypothesis, according to the book " Research Methods in Psychology " (BCcampus, 2015).
There are other ways to describe an alternative hypothesis. The alternative hypothesis above does not specify a direction of the effect, only that there will be a difference between the two groups. That type of prediction is called a two-tailed hypothesis. If a hypothesis specifies a certain direction — for example, that people who take a protein supplement will gain more muscle than people who don't — it is called a one-tailed hypothesis, according to William M. K. Trochim , a professor of Policy Analysis and Management at Cornell University.
Sometimes, errors take place during an experiment. These errors can happen in one of two ways. A type I error is when the null hypothesis is rejected when it is true. This is also known as a false positive. A type II error occurs when the null hypothesis is not rejected when it is false. This is also known as a false negative, according to the University of California, Berkeley .
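The Type I error rate can be made concrete with a short simulation (an illustrative sketch, not taken from the article). When the null hypothesis is true, a test run at significance level 0.05 should falsely reject it about 5% of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 2000
rejections = 0

for _ in range(n_trials):
    # Both groups come from the SAME distribution, so the null is true
    # and every rejection below is a Type I error (false positive).
    a = rng.normal(loc=0.0, scale=1.0, size=40)
    b = rng.normal(loc=0.0, scale=1.0, size=40)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

print(f"Observed Type I error rate: {rejections / n_trials:.3f}")  # near 0.05
```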
A hypothesis can be rejected or modified, but it can never be proved correct 100% of the time. For example, a scientist can form a hypothesis stating that if a certain type of tomato has a gene for red pigment, that type of tomato will be red. During research, the scientist then finds that each tomato of this type is red. Though the findings support the hypothesis, there may be a tomato of that type somewhere in the world that isn't red. Thus, the hypothesis is supported by the evidence, but it can never be proved with absolute certainty.
The best hypotheses are simple. They deal with a relatively narrow set of phenomena. But theories are broader; they generally combine multiple hypotheses into a general explanation for a wide range of phenomena, according to the University of California, Berkeley . For example, a hypothesis might state, "If animals adapt to suit their environments, then birds that live on islands with lots of seeds to eat will have differently shaped beaks than birds that live on islands with lots of insects to eat." After testing many hypotheses like these, Charles Darwin formulated an overarching theory: the theory of evolution by natural selection.
"Theories are the ways that we make sense of what we observe in the natural world," Tanner said. "Theories are structures of ideas that explain and interpret facts."
Encyclopedia Britannica, "Scientific Hypothesis," Jan. 13, 2022. https://www.britannica.com/science/scientific-hypothesis
Karl Popper, "The Logic of Scientific Discovery," Routledge, 1959.
California State University, Bakersfield, "Formatting a testable hypothesis." https://www.csub.edu/~ddodenhoff/Bio100/Bio100sp04/formattingahypothesis.htm
Karl Popper, "Conjectures and Refutations," Routledge, 1963.
Price, P., Jhangiani, R., & Chiang, I., "Research Methods in Psychology — 2nd Canadian Edition," BCcampus, 2015.
University of Miami, "The Scientific Method." http://www.bio.miami.edu/dana/161/evolution/161app1_scimethod.pdf
William M. K. Trochim, "Research Methods Knowledge Base." https://conjointly.com/kb/hypotheses-explained/
University of California, Berkeley, "Multiple Hypothesis Testing and False Discovery Rate." https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf
University of California, Berkeley, "Science at multiple levels." https://undsci.berkeley.edu/article/0_0_0/howscienceworks_19
Title: Null hypothesis Bayes factor estimates can be biased in (some) common factorial designs: A simulation study
Abstract: Bayes factor null hypothesis tests provide a viable alternative to frequentist measures of evidence quantification. Bayes factors for realistic interesting models cannot be calculated exactly, but have to be estimated, which involves approximations to complex integrals. Crucially, the accuracy of these estimates, i.e., whether an estimated Bayes factor corresponds to the true Bayes factor, is unknown, and may depend on data, prior, and likelihood. We have recently developed a novel statistical procedure, namely simulation-based calibration (SBC) for Bayes factors, to test for a given analysis, whether the computed Bayes factors are accurate. Here, we use SBC for Bayes factors to test for some common cognitive designs, whether Bayes factors are estimated accurately. We use the bridgesampling/brms packages as well as the BayesFactor package in R. We find that Bayes factor estimates are accurate and exhibit only little bias in Latin square designs with (a) random effects for subjects only and (b) for crossed random effects for subjects and items, but a single fixed-factor. However, Bayes factor estimates turn out biased and liberal in a 2x2 design with crossed random effects for subjects and items. These results suggest that researchers should test for their individual analysis, whether Bayes factor estimates are accurate. Moreover, future research is needed to determine the boundary conditions under which Bayes factor estimates are accurate or biased, as well as software development to improve estimation accuracy.
Subjects: Methodology (stat.ME)
In order to test the different one-sample t-test alternative hypotheses, we can use the alternative= parameter: alternative='two-sided' is the default value, testing a two-sided alternative hypothesis; alternative='less' tests whether the population mean is less than the hypothesized value; and alternative='greater' tests whether it is greater.
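A minimal sketch of the three options on made-up data (the sample and the hypothesized mean of 5.0 are purely illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=4.6, scale=1.0, size=40)

# Two-sided: is the population mean different from 5.0?
_, p_two = stats.ttest_1samp(sample, popmean=5.0, alternative='two-sided')
# One-sided: is the population mean less than 5.0?
_, p_less = stats.ttest_1samp(sample, popmean=5.0, alternative='less')
# One-sided: is the population mean greater than 5.0?
_, p_greater = stats.ttest_1samp(sample, popmean=5.0, alternative='greater')

print(f"two-sided: {p_two:.4f}  less: {p_less:.4f}  greater: {p_greater:.4f}")
```

Note that the two one-sided p-values sum to one, and the two-sided p-value is twice the smaller of them.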
Formula sheet for Chapter 10, Statistical Inference Concerning Two Populations (BSTAT 3321). Test Statistics, Part 1-1: hypothesis test for the difference between two means when the population variances (σ₁² and σ₂²) are known (z-Test: Two-Sample for Means in Excel):

z = ((x̄₁ − x̄₂) − d₀) / √(σ₁²/n₁ + σ₂²/n₂)
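The statistic is straightforward to compute directly. The sketch below (input numbers are made up for illustration) mirrors the z-statistic for two means with known population variances term by term:

```python
import math
from scipy import stats

def two_sample_z(x1_bar, x2_bar, d0, sigma1, sigma2, n1, n2):
    """z-test for the difference of two means with known population variances."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error of the difference
    z = ((x1_bar - x2_bar) - d0) / se
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the standard normal
    return z, p

# Hypothetical inputs: sample means 5.2 and 4.8, testing d0 = 0 (no difference)
z, p = two_sample_z(x1_bar=5.2, x2_bar=4.8, d0=0.0,
                    sigma1=1.0, sigma2=1.2, n1=50, n2=60)
print(f"z = {z:.3f}, p = {p:.4f}")  # z ≈ 1.907
```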