Statology

Statistics Made Easy

The Complete Guide: Hypothesis Testing in Excel

In statistics, a hypothesis test is used to test some assumption about a population parameter.

There are many different types of hypothesis tests you can perform depending on the type of data you’re working with and the goal of your analysis.

This tutorial explains how to perform the following types of hypothesis tests in Excel:

  • One sample t-test
  • Two sample t-test
  • Paired samples t-test
  • One proportion z-test
  • Two proportion z-test

Let’s jump in!

Example 1: One Sample t-test in Excel

A one sample t-test is used to test whether or not the mean of a population is equal to some value.

For example, suppose a botanist wants to know if the mean height of a certain species of plant is equal to 15 inches.

To test this, she collects a random sample of 12 plants and records each of their heights in inches.

She would write the hypotheses for this particular one sample t-test as follows:

  • H0: µ = 15
  • HA: µ ≠ 15

Refer to this tutorial for a step-by-step explanation of how to perform this hypothesis test in Excel.
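
As a rough cross-check outside Excel, the one sample t statistic can be sketched in Python. The twelve heights below are invented for illustration (the original data are not given here); only the hypothesized mean of 15 inches comes from the example.

```python
import math
import statistics

# Hypothetical plant heights in inches (invented for illustration, not from the source).
heights = [14, 14, 16, 13, 12, 17, 15, 14, 15, 13, 15, 14]
mu0 = 15  # hypothesized population mean from H0

n = len(heights)
sample_mean = statistics.mean(heights)
sample_sd = statistics.stdev(heights)  # sample standard deviation (n - 1 denominator)

# One sample t statistic: (sample mean - hypothesized mean) / standard error
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
df = n - 1  # degrees of freedom

print(f"t = {t_stat:.4f}, df = {df}")
```

The t statistic is then compared against a t distribution with n - 1 degrees of freedom to obtain the p-value.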

Example 2: Two Sample t-test in Excel

A two sample t-test is used to test whether or not the means of two populations are equal.

For example, suppose researchers want to know whether or not two different species of plants have the same mean height.

To test this, they collect a random sample of 20 plants from each species and measure their heights.

The researchers would write the hypotheses for this particular two sample t-test as follows:

  • H0: µ1 = µ2
  • HA: µ1 ≠ µ2

Example 3: Paired Samples t-test in Excel

A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.

For example, suppose we want to know whether a certain study program significantly impacts student performance on a particular exam.

To test this, we have 20 students in a class take a pre-test. Then, we have each of the students participate in the study program for two weeks. Then, the students retake a post-test of similar difficulty.

We would write the hypotheses for this particular paired samples t-test as follows:

  • H0: µpre = µpost
  • HA: µpre ≠ µpost
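
A paired test reduces to a one sample test on the per-student differences. The sketch below illustrates this in Python with five invented score pairs (the example describes 20 students, whose scores are not given here).

```python
import math
import statistics

# Hypothetical pre-test and post-test scores (invented for illustration).
pre = [80, 75, 90, 60, 70]
post = [85, 78, 92, 66, 71]

# A paired test works on the difference within each pair.
diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences

# Paired t statistic: mean difference / standard error of the differences
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

print(f"t = {t_stat:.4f}, df = {df}")
```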

Example 4: One Proportion z-test in Excel

A one proportion z-test is used to compare an observed proportion to a theoretical one.

For example, suppose a phone company claims that 90% of its customers are satisfied with their service.

To test this claim, an independent researcher gathered a simple random sample of 200 customers and asked them if they are satisfied with their service.

The researcher would write the hypotheses for this particular one proportion z-test as follows:

  • H0: p = 0.90
  • HA: p ≠ 0.90
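
A minimal Python sketch of the one proportion z-test follows. The sample size (200) and claimed proportion (0.90) come from the example; the count of satisfied customers is a hypothetical value chosen for illustration.

```python
import math

n = 200          # sample size from the example
p0 = 0.90        # proportion claimed under H0
satisfied = 170  # hypothetical count of satisfied customers (not from the source)

p_hat = satisfied / n  # observed sample proportion

# z statistic: (observed - hypothesized) / standard error under H0
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.4f}, p = {p_value:.4f}")
```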

Example 5: Two Proportion z-test in Excel

A two proportion z-test is used to test for a difference between two population proportions.

For example, suppose a superintendent of a school district claims that the percentage of students who prefer chocolate milk over regular milk in school cafeterias is the same for school 1 and school 2.

To test this claim, an independent researcher obtains a simple random sample of 100 students from each school and surveys them about their preferences.

  • H0: p1 = p2
  • HA: p1 ≠ p2
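
The two proportion z-test can be sketched in Python using a pooled proportion. The sample sizes (100 per school) come from the example; the counts of students preferring chocolate milk are hypothetical.

```python
import math

n1, n2 = 100, 100  # sample sizes from the example
x1, x2 = 70, 60    # hypothetical counts preferring chocolate milk (not from the source)

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 = p2

# Standard error of the difference under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Two-tailed p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.4f}, p = {p_value:.4f}")
```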


Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.


ExcelDemy

How to Do a T Test in Excel (2 Ways with Interpretation of Results)

Md. Meraz al Nahian

The article will show you how to do a T Test in Excel. T-Tests are hypothesis tests that evaluate one or two groups’ means. Hypothesis tests employ sample data to infer population traits. In this lesson, we will look at the different types of T-Tests, and how to run T-Tests in Excel. We’ll go over both paired and two sample T-Tests, with detailed instructions on how to prepare your data, run the test, and interpret the findings.

Understanding how to use the T.TEST function in Excel will improve your ability to draw meaningful insights and make data-driven decisions, whether you’re a student, researcher, business analyst, or anybody else who works with data. Say you’re doing education research to compare the efficacy of traditional and new teaching approaches. A t-test compares the mean scores of the students taught with each approach, so that you can make a decision based on the students’ performance.

Download Practice Workbook

T Test.xlsx

T Test Type

Depending on the number of tails, there are two kinds of t-tests. They are:

  • One-tailed t-test
  • Two-tailed t-test

Each of them has 3 types. They are:

  • Paired
  • Two sample equal variance
  • Two sample unequal variance

We will show you the application of some of these types. The procedure for getting the results is the same for all types of t-tests in Excel. Let’s dig into some details and see how it can be done.

How to Do a T Test in Excel: 2 Effective Ways

1. Using Excel T.TEST or TTEST Function to Do T Test

Here, we are going to show you how to determine the T Test result by using formulas. Excel has the T.TEST and TTEST functions to run a t-test on two ranges of data. Both functions work similarly and return the p-value of the test. First, we will cover how to determine the p-value for two samples with equal variance.

1.1 Two Sample Equal Variance T Test

In the dataset, you will see the prices of different laptops and smartphones. Here is a formula that performs a T Test on the prices of these products and returns the p-value of the test.

=T.TEST(B5:B14,C5:C14,2,2)

Calculating Two Sample T-Test Result by Formula

We set the 3rd argument of the function to 2 as we are doing a two tailed t-test on the dataset. The 4th argument should be 2 for a two sample equal variance t-test.
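
To keep the meaning of the 3rd and 4th arguments straight, the small Python lookup below may help. It is only an illustrative sketch: `describe_t_test` is a hypothetical helper name, and the mappings follow the argument values described for T.TEST (tails: 1 or 2; type: 1 paired, 2 two sample equal variance, 3 two sample unequal variance).

```python
# Meaning of the 3rd (tails) and 4th (type) arguments of Excel's T.TEST.
TAILS = {1: "one-tailed", 2: "two-tailed"}
TYPES = {
    1: "paired",
    2: "two-sample equal variance",
    3: "two-sample unequal variance",
}


def describe_t_test(tails: int, test_type: int) -> str:
    """Return a plain-language description of a T.TEST argument pair (hypothetical helper)."""
    if tails not in TAILS or test_type not in TYPES:
        raise ValueError("tails must be 1 or 2; type must be 1, 2, or 3")
    return f"{TAILS[tails]}, {TYPES[test_type]}"


# =T.TEST(B5:B14,C5:C14,2,2) corresponds to:
print(describe_t_test(2, 2))
```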

1.2 Paired T Test

Now, we are going to apply another formula to calculate the Paired T-Test . The following dataset shows the performance mark of some employees in two different criteria.

=T.TEST(C5:C13,D5:D13,2,1)

Calculating Paired T-Test Result by Formula

Note: The explanation of the results is described in the following sections.

2. Using Analysis Toolpak

The above tasks can be done with the Analysis Toolpak Add-in too. The Analysis Toolpak Add-in is not available in the ribbon by default. To initiate it,

  • Go to the Options window first.
  • Next, select Add-ins and click on the Go button beside the Manage section.
  • After that, click OK .

Initiating Analysis Toolpak Add-in

  • Thereafter, the Add-ins window will appear. Select Analysis Toolpak >> click OK again.

Adding Analysis Toolpak Add-in

This Add-in will be added to the ribbon of the Data tab.

2.1 Two Sample Equal Variance T Test

We will do a two sample equal variance t-test using the Analysis Toolpak here. We used the dataset that contains the prices of laptops and smartphones. For this purpose,

  • Click on the Data Analysis button from the ribbon of the Data tab.
  • The Data Analysis features will appear. Select t-Test: Two Sample Assuming Equal Variances and click OK .

Opening Two Sample T Test by Analysis Toolpak

  • After that, you need to set up the parameters for the t-test operation. Insert the Laptop and Smartphone prices as Variable 1 Range and Variable 2 Range. Include the headings in the range and check Labels.
  • Next, set the value of Hypothesized Mean Difference to 0 .
  • Finally, select an Output option of your preference and click OK .

Setting up Parameters for Two Sample T-Test

As we have chosen a New Worksheet for the outputs, we will see the results in a new sheet.

Showing T-Test Result for Two Sample Test

Now, let’s get to the discussion on the results.

Comments on Results

The output shows that the mean values for Laptops and Smartphones are 1608.85 and 1409.164 respectively. We can see from the Variances row that they are not precisely equal, but they are close enough to be assumed to have equal variances. The most relevant metric is the p-value .

The difference between means is statistically significant if the p-value is less than your significance level. Excel calculates p-values for one- and two-tailed T Tests .

One-tailed T Tests can detect a difference between means in only one direction. A one-tailed test, for example, might only evaluate whether Smartphones have higher prices than Laptops. Two-tailed tests can detect a difference in either direction, whether larger or smaller. There are some other disadvantages to utilizing one-tailed testing, so I’ll continue with the conventional two-tailed results.

For our results, we’ll utilize P(T<=t) two-tail, which is the p-value for the t-test’s two-tailed version. We cannot reject the null hypothesis because our p-value (0.095639932) is greater than the conventional significance level of 0.05. The hypothesis that the population means differ is therefore not supported by our sample data. The sample mean price of Laptops is greater than the sample mean price of Smartphones, but the difference is not statistically significant.

The Analysis Toolpak operation also returns results for the one-tailed t-test. Here, the one-tailed critical value (t Critical one-tail) of the two sample equal variance t-test is 1.734.

2.2 Paired T Test

Similarly, you can find out the Paired t-Test result for the dataset containing employee performances. Just select t-Test: Paired Two Sample for Means when you open the Data Analysis window.

Showing T-Test Result for Paired Test

The result shows that the mean for the Workpace is 104 and the mean for the Efficiency is 96.56 .

The difference between means is statistically significant if the p-value is less than your significance level. For our results, we’ll utilize P(T<=t) two-tail, which is the p-value for the t-test’s two-tailed version. We cannot reject the null hypothesis because our p-value (0.188) is greater than the conventional significance level of 0.05. The hypothesis that the population means differ is therefore not supported by our sample data. The Workpace sample mean exceeds the Efficiency sample mean, but the difference is not statistically significant.

How to Interpret t-Test Results in Excel

Although we explained the results of the t-Test earlier, we didn’t show the proper interpretation. So here, I’ll show you the interpretation of the two sample equal variance t-test.

Let’s bring out the results again first.

Two Sample Equal Variance t-Test Interpretation

i. Mean

  • The mean of laptop prices = 1608.85
  • The mean of smartphone prices = 1409.164

ii. Variance

  • The variance of laptop prices = 77622.597
  • The variance of smartphone prices = 51313.7904

iii. Observations

The number of observations for both laptops and smartphones is 10.

iv. Pooled Variance

The samples’ average variance, calculated by pooling the variances of each sample.

The mathematical formula for this parameter is:

((No of observations of Sample 1 – 1)*(Variance of Sample 1) + (No of observations of Sample 2 – 1)*(Variance of Sample 2))/(No of observations of Sample 1 + No of observations of Sample 2 – 2)

So it becomes: ((10-1)*77622.59676+(10-1)*51313.7904)/(10+10-2) = 64468.19358
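
The same arithmetic can be reproduced in a few lines of Python as a sanity check. The function name `pooled_variance` is a hypothetical helper; the inputs are the variances and observation counts quoted above.

```python
def pooled_variance(n1: int, var1: float, n2: int, var2: float) -> float:
    """Weighted average of two sample variances, weighted by their degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


# Values from the laptop and smartphone output above
pooled = pooled_variance(10, 77622.59676, 10, 51313.7904)
print(round(pooled, 5))
```

Because both samples have 10 observations, the pooled variance here is simply the average of the two sample variances.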

v. Hypothesized Mean Difference

This is the difference between the two population means that we “hypothesize” under the null hypothesis. In this situation, we chose 0 because we want to see if the difference between the means of the two populations is zero.

vi. df

It indicates the value of the Degrees of Freedom. The formula for this parameter is:

No of observations of Sample 1 + No of observations of Sample 2 – 2 = 10 + 10 – 2 = 18

vii. t Stat

The test statistic value of the t-Test operation.

The formula for this parameter is given below.

(Mean of Sample 1 – Mean of Sample 2)/(Square root of (Pooled Variance * (1/No of observations of Sample 1 + 1/No of observations of Sample 2)))

So it becomes: (1608.85 – 1409.164)/Sqrt(64468.19358 * (1/10 + 1/10)) = 1.758570846
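
As a quick check of this calculation, the t statistic can be recomputed in Python from the means, pooled variance, and observation counts quoted above. This is a sketch of the formula only, not of Excel's internal implementation.

```python
import math

# Values taken from the output above
mean1, mean2 = 1608.85, 1409.164
n1 = n2 = 10
pooled_var = 64468.19358

# Pooled two sample t statistic: difference in means / pooled standard error
t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

print(f"t = {t_stat:.6f}, df = {df}")
```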

viii. P(T<=t) two-tail

A two-tailed t-test’s p-value. This value can be found by entering t = 1.758570846 with 18 degrees of freedom into any T Score to P Value Calculator.

In this situation, the value of p is 0.095639932 . Because this is greater than 0.05 , we cannot reject the null hypothesis. This suggests that we lack adequate evidence to conclude that the two population means differ.
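
Instead of an online calculator, the two-tailed p-value can be approximated in plain Python by numerically integrating the t density. The sketch below uses Simpson's rule and is an illustrative approach, not the method Excel uses; `t_pdf` and `two_tailed_p` are hypothetical helper names.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3       # P(0 <= T <= |t|)
    return 2 * (0.5 - area)    # probability in both tails


p_value = two_tailed_p(1.758570846, 18)
print(f"p = {p_value:.6f}")  # should be close to the 0.0956 reported by Excel
```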

ix. t Critical two-tail

This is the test’s critical value. A t Critical value Calculator with 18 degrees of freedom and a 95% confidence level can be used to calculate this number.

In this instance, the critical value is 2.10092204 . We cannot reject the null hypothesis because our test statistic t is less than this number. Again, we lack adequate information to conclude that the two population means are distinct.

Things to Remember

  • Excel demands that your data be arranged in columns, with data from each group in a separate column. The first row should have labels or headers.
  • Clearly state your null hypothesis (usually that there is no significant difference between the group means) and your alternative hypothesis (the opposite of the null hypothesis).
  • As a result of the t-test, Excel returns the p-value. A small p-value (less than the specified alpha level) indicates that the null hypothesis may be rejected and that there is a statistically significant difference between the group means.

Frequently Asked Questions

1. Can I perform a t-test on unequal sample sizes in Excel?

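Answer: Yes, you can use the T.TEST function to do a two sample t-test (type 2 or 3) on unequal sample sizes. When calculating the test statistic, Excel automatically accounts for the unequal sample sizes. A paired t-test (type 1) requires both ranges to contain the same number of values.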

2. What is the difference between a one-tailed and a two-tailed t-test?

Answer: A one-tailed t-test determines if the means of the two groups differ substantially in a given direction (e.g., greater or smaller). A two-tailed t-test looks for any significant difference, regardless of direction.

3. Can I calculate effect size in Excel for t-tests?

Answer: While there is no built-in tool in Excel to calculate effect size, you may manually compute Cohen’s d for independent t-tests and paired sample correlations for paired t-tests using Excel’s basic mathematical operations.
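
As an illustration of the manual calculation for independent samples, Cohen's d can be computed as the difference in means divided by the pooled standard deviation. The Python sketch below reuses the laptop and smartphone figures from the interpretation section; it shows one common definition of Cohen's d, not a built-in Excel feature.

```python
import math

# Values from the two sample equal variance output in this article
mean1, mean2 = 1608.85, 1409.164
pooled_var = 64468.19358

# Cohen's d: difference in means / pooled standard deviation
cohens_d = (mean1 - mean2) / math.sqrt(pooled_var)
print(f"d = {cohens_d:.4f}")
```

In Excel, the same calculation is a single cell formula that divides the difference of the two means by the square root of the pooled variance.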

In this article, you learned some basic ideas on how to do a t Test in Excel. If you have any questions or feedback regarding this article, please share them in the comment section. Your valuable ideas will enrich my Excel expertise and hence the content of my upcoming articles.


Meraz Al Nahian

Md. Meraz Al Nahian has worked with the ExcelDemy project for over 1.5 years. He wrote 140+ articles for ExcelDemy. He also solved a lot of user problems and worked on dashboards. He is interested in data analysis, advanced Excel, statistics, and dashboards. He also likes to explore various Excel and VBA applications. He completed his graduation in Electrical & Electronic Engineering from Bangladesh University of Engineering & Technology (BUET). He enjoys exploring Excel-related features to gain efficiency... Read Full Bio



Excel Dashboards

Excel Tutorial: How To Do A Hypothesis Test In Excel

Introduction

Welcome to our Excel tutorial on how to conduct a hypothesis test using Excel. Hypothesis testing is a crucial component of statistical analysis, allowing us to make inferences about a population based on sample data. Using Excel for hypothesis testing offers several advantages, including its familiarity, ease of use, and the ability to perform complex statistical calculations with just a few clicks.

Key Takeaways

  • Hypothesis testing is essential for making inferences about a population based on sample data.
  • Using Excel for hypothesis testing offers familiarity, ease of use, and the ability to perform complex statistical calculations.
  • Organizing and formatting data correctly in Excel is crucial for hypothesis testing.
  • Understanding the different types of hypothesis tests and selecting the appropriate test is important for accurate analysis.
  • Interpreting the results of the hypothesis test and avoiding common mistakes is essential for making valid conclusions.

Setting up the data in Excel

When conducting a hypothesis test in Excel, it is crucial to properly organize and format your data in a spreadsheet. This will ensure accurate and reliable results.

  • Start by opening a new Excel spreadsheet and entering your raw data into the cells. It is important to have a clear understanding of the variables you are working with and how they relate to each other.
  • Label each column with a clear and descriptive header to identify the variables being tested. This will help you keep track of the data and make it easier to analyze.
  • Arrange the data in a logical and organized manner, such as grouping similar data together and using separate columns for different variables.
  • Check that the data is formatted correctly, especially if it includes dates, currency, or percentages. Use the appropriate formatting options in Excel to ensure the data is displayed accurately.
  • Remove any unnecessary formatting, such as extra spaces or special characters, to avoid errors in the analysis process.
  • Double-check for any missing or erroneous data entries, and make sure that the data is complete and accurate before proceeding with the hypothesis test.

Choosing the Appropriate Test in Excel

When conducting a hypothesis test in Excel, it's crucial to choose the right test for your specific scenario. Understanding the different types of hypothesis tests and how to select the appropriate one is essential for accurate and meaningful results.

  • Parametric tests
  • Nonparametric tests
  • One-sample, two-sample, and paired tests
  • Goodness-of-fit tests
  • Chi-square tests

Choosing the right hypothesis test in Excel requires careful consideration of the nature of the data and the specific research question. Here are some key factors to consider when selecting the appropriate test:

  • Understanding the Data: Determine whether the data is continuous or categorical, and whether it follows a specific distribution.
  • Research Question: Clearly define the research question and the type of comparison or relationship being investigated.
  • Sample Size: Consider the size of the sample and whether it meets the assumptions of the chosen test.
  • Dependent or Independent Variables: Determine whether the variables are independent or related in some way, as this will impact the choice of test.
  • Assumptions: Ensure that the chosen test aligns with any specific assumptions or conditions required for accurate results.

Conducting the hypothesis test

When it comes to conducting a hypothesis test in Excel, there are a few key steps to follow in order to ensure accurate results. These steps include using the Data Analysis Toolpak and inputting the necessary parameters for the test.

The Data Analysis Toolpak is a powerful add-in for Excel that provides a variety of data analysis tools, including the ability to conduct hypothesis tests. To access the Toolpak, simply go to the "Data" tab, click on "Data Analysis" in the Analysis group, and select "t-Test: Two-Sample Assuming Equal Variances" for a two-sample t-test, or "t-Test: Paired Two Sample for Means" for a paired t-test.

Once the dialog for the chosen test is open, you will need to input the necessary parameters for the hypothesis test. This includes selecting the input ranges for the two variables, entering the hypothesized mean difference, and specifying the significance level (Alpha). The output reports both one-tailed and two-tailed results, so you decide which to read based on your alternative hypothesis. It is important to carefully review and input the correct parameters to ensure the accuracy of the test results.

By using the Data Analysis Toolpak in Excel and inputting the necessary parameters for the hypothesis test, you can effectively conduct hypothesis tests and analyze your data with confidence.

Interpreting the results

After performing a hypothesis test in Excel, it is important to understand how to interpret the results and make conclusions based on the data.

  • Identify the test statistic
  • Look at the p-value
  • Consider the confidence interval
  • Check for statistical significance
  • Reject or fail to reject the null hypothesis
  • Consider the practical significance
  • Communicate the findings

Common mistakes to avoid

When conducting a hypothesis test in Excel, there are some common mistakes that researchers often make. By being aware of these pitfalls, you can ensure that your results are accurate and reliable.

One of the most common mistakes when doing a hypothesis test in Excel is misinterpreting the results. It's important to carefully analyze the output of the test and understand what it is telling you. Avoid jumping to conclusions without thoroughly examining the data and the significance level.

Another mistake to avoid is using the wrong test for the hypothesis you are trying to test. Excel offers a variety of hypothesis tests, such as t-tests, F-tests, and chi-squared tests, among others. It's crucial to select the appropriate test for your specific research question and data set. Using the wrong test can lead to inaccurate results and conclusions.

In conclusion, hypothesis testing in Excel is a crucial tool for making data-driven decisions in various fields, from business to science. By using Excel, we can effectively analyze data and draw meaningful conclusions about our hypotheses.

As with any skill, practice makes perfect. So, I encourage you to continue exploring and practicing hypothesis testing in Excel. There are numerous resources available online that provide additional guidance and examples to help you master this valuable technique.



Statistical Hypothesis Testing with Microsoft ® Office Excel ®

  • © 2022
  • Robert Hirsch, Overland Park, USA

  • Explains the frequentist (classical) interpretation of hypothesis tests
  • Describes the Bayesian interpretation of hypothesis tests
  • Discusses two approaches to interpretation of hypothesis tests when they are performed on accumulating data

Part of the book series: Synthesis Lectures on Mathematics & Statistics (SLMS)


Table of contents (5 chapters)

  • Front Matter
  • Logic of Hypothesis Testing
  • Basic Statistical Methods
  • Interim Analysis
  • Planning the Sample’s Size
  • Back Matter

  • Statistical hypothesis testing
  • Frequentist hypothesis testing
  • Bayesian hypothesis testing
  • P-value interpretation
  • Null hypothesis
  • Alternative hypothesis
  • Hypothesis testing logic
  • Sequential analysis
  • Stochastic curtailment
  • Multiple hypothesis tests
  • Errors in hypothesis testing

About this book

This book provides a comprehensive treatment of the logic behind hypothesis testing. Readers will learn to understand statistical hypothesis testing and how to interpret P -values under a variety of conditions including a single hypothesis test, a collection of hypothesis tests, and tests performed on accumulating data. The author explains how a hypothesis test can be interpreted to draw conclusions, and descriptions of the logic behind frequentist (classical) and Bayesian approaches to interpret the results of a statistical hypothesis test are provided. Both approaches have their own strengths and challenges, and a special challenge presents itself when hypothesis tests are repeatedly performed on accumulating data. Possible pitfalls and methods to interpret hypothesis tests when accumulating data are also analyzed. This book will be of interest to researchers, graduate students, and anyone who has to interpret the results of statistical analyses.

Bibliographic Information

Book Title : Statistical Hypothesis Testing with Microsoft ® Office Excel ®

Authors : Robert Hirsch

Series Title : Synthesis Lectures on Mathematics & Statistics

DOI : https://doi.org/10.1007/978-3-031-04202-7

Publisher : Springer Cham

eBook Packages : Synthesis Collection of Technology (R0) , eBColl Synthesis Collection 11

Copyright Information : The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

Hardcover ISBN : 978-3-031-04201-0 Published: 15 July 2022

Softcover ISBN : 978-3-031-04204-1 Published: 16 July 2023

eBook ISBN : 978-3-031-04202-7 Published: 14 July 2022

Series ISSN : 1938-1743

Series E-ISSN : 1938-1751

Edition Number : 1

Number of Pages : X, 87

Number of Illustrations : 18 b/w illustrations, 7 illustrations in colour

Topics : Statistics, general , Statistical Theory and Methods , Bayesian Inference


QI Macros for Excel

Six Sigma & SPC Excel Add-in


Struggling with Hypothesis Testing in Excel?

QI Macros makes hypothesis testing easy, even if you don't know anything about statistics.

Run Any Hypothesis Test using QI Macros

  • Select your data.
  • Click on QI Macros menu > Statistical Tools > the test you want
  • QI Macros will do the math and analysis for you.

What is a Hypothesis Test?

A hypothesis test helps identify ways to reduce costs and improve quality. Hypothesis testing asks the question: are two or more sets of data statistically the same or different?

For companies working to improve operations, hypothesis tests help identify differences between machines, formulas, raw materials, etc. and whether the differences are statistically significant or not. Without such testing, teams can run around changing machine settings, formulas and so on causing more variation. These knee-jerk responses can amplify variation and cause more problems than doing nothing at all.

Three Types of Hypothesis Tests

  • Classical Method - comparing a test statistic to a critical value
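  • p Value Method - comparing the p value (the probability, assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed) to the significance level
  • Confidence Interval Method - checking whether the hypothesized value falls inside or outside the confidence interval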
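As a rough illustration of how the three methods relate, the sketch below runs a two-sided z-test all three ways and shows that they lead to the same decision. The numbers (hypothesized mean of 100, sample mean of 103, standard deviation of 10, 50 observations) and the helper name `z_test_three_ways` are assumptions made up for this example; this is not how QI Macros itself is implemented.

```python
from math import sqrt
from statistics import NormalDist


def z_test_three_ways(sample_mean, hypothesized_mean, sd, n, alpha=0.05):
    """Decide a two-sided z-test by critical value, p value, and confidence interval."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    std_error = sd / sqrt(n)
    z = (sample_mean - hypothesized_mean) / std_error

    # Classical method: compare |z| to the critical value.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    reject_classical = abs(z) > z_crit

    # p value method: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_p_value = p_value < alpha

    # Confidence interval method: is the hypothesized mean outside the interval?
    lower = sample_mean - z_crit * std_error
    upper = sample_mean + z_crit * std_error
    reject_interval = not (lower <= hypothesized_mean <= upper)

    return {
        "z": z,
        "p_value": p_value,
        "interval": (lower, upper),
        "reject_classical": reject_classical,
        "reject_p_value": reject_p_value,
        "reject_interval": reject_interval,
    }


# Hypothetical data, for illustration only.
result = z_test_three_ways(sample_mean=103, hypothesized_mean=100, sd=10, n=50)
print(result)
```

For a two-sided test at the same significance level, the three decisions always agree, because each one is a different view of the same distance between the sample mean and the hypothesized mean.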

How to Conduct a Hypothesis Test

  • Define the null (H0) and alternative (Ha) hypotheses.
  • Conduct the test.
  • Calculate the test statistic and the critical value (t-Test, F-test, z-Test, ANOVA, etc.).
  • Calculate a p value and compare it to a significance level (α) or confidence level (1-α).
  • Interpret the results to determine if you "cannot reject the null hypothesis (accept null hypothesis)" or "reject the null hypothesis."


QI Macros for Excel Makes Hypothesis Testing as Easy as 1-2-3!


QI Macros adds a new tab to Excel's menu:

  • Just input your data into an Excel spreadsheet and select it.
  • Click on QI Macros menu, Statistical Tools and the test you want to run (t test, f test, z test, ANOVA, etc.). If you are not sure which test to run, QI Macros Stat Wizard will analyze your data and run the possible tests for you.
  • QI Macros performs all of the calculations AND interprets the results for you:


QI Macros Will Also Draw Charts to Help You Visualize the Differences in Your Data Sets


Excel for Hypothesis Testing: A Practical Approach for Students

Angela O'Brien

Hypothesis testing lies at the heart of statistical inference, serving as a cornerstone for drawing meaningful conclusions from data. It's a methodical process used to evaluate assumptions about a population parameter, typically based on sample data. The fundamental idea behind hypothesis testing is to assess whether observed differences or relationships in the sample are statistically significant enough to warrant generalizations to the larger population. This process involves formulating null and alternative hypotheses, selecting an appropriate statistical test, collecting sample data, and interpreting the results to make informed decisions.

In the realm of statistical software, SAS stands out as a robust and widely used tool for data analysis in various fields such as academia, industry, and research. Its extensive capabilities make it particularly favored for complex analyses, large datasets, and advanced modeling techniques. However, despite its versatility and power, SAS can have a steep learning curve, especially for students who are just beginning their journey into statistics. The intricacies of programming syntax, data manipulation, and interpreting output may pose challenges for novice users, potentially hindering their understanding of statistical concepts like hypothesis testing. If you need assistance with your Excel homework, understanding hypothesis testing is essential for performing statistical analyses and drawing meaningful conclusions from data using Excel's built-in functions and tools.

Excel for Hypothesis Testing

Enter Excel, a ubiquitous spreadsheet software that most students are already familiar with to some extent. While Excel may not offer the same level of sophistication as SAS in terms of advanced statistical procedures, it remains a valuable tool, particularly for introductory and intermediate-level analyses. Its intuitive interface, user-friendly features, and widespread accessibility make it an attractive option for students seeking a practical approach to learning statistics. By leveraging Excel's built-in functions, data visualization tools, and straightforward formulas, students can gain hands-on experience with hypothesis testing in a familiar environment. In this blog post, we aim to bridge the gap between theoretical concepts and practical application by demonstrating how Excel can serve as a valuable companion for students tackling hypothesis testing problems, including those typically encountered in SAS assignments. We will focus on demystifying the process of hypothesis testing, breaking it down into manageable steps, and showcasing Excel's capabilities for conducting various tests commonly encountered in introductory statistics courses.

Understanding the Basics

Hypothesis testing is a fundamental concept in statistics that allows researchers to draw conclusions about a population based on sample data. At its core, hypothesis testing involves making a decision about whether a statement regarding a population parameter is likely to be true. This decision is based on the analysis of sample data and is guided by two competing hypotheses: the null hypothesis (H0) and the alternative hypothesis (Ha). The null hypothesis represents the status quo or the absence of an effect. It suggests that any observed differences or relationships in the sample data are due to random variation or chance. On the other hand, the alternative hypothesis contradicts the null hypothesis and suggests the presence of an effect or difference in the population. It reflects the researcher's belief or the hypothesis they aim to support with their analysis.

Formulating Hypotheses

In Excel, students can state their hypotheses and compute the quantities needed to test them using simple formulas. For instance, suppose a researcher wants to test whether the mean of a population is equal to a specified value. They can use the AVERAGE function in Excel to calculate the sample mean and then compare it to the specified value. A direct equality check with "=" is not enough, however: a sample mean will almost never match the hypothesized value exactly, so the choice between the null and alternative hypotheses should rest on a test statistic and its p-value rather than on whether the two numbers are identical.

Excel's flexibility allows students to customize their hypotheses based on the specific parameters they are testing. Whether it's comparing means, proportions, variances, or other population parameters, Excel provides a user-friendly interface for formulating hypotheses and conducting statistical analysis.

Selecting the Appropriate Test

Excel offers a plethora of functions and tools for conducting various types of hypothesis tests, including t-tests, z-tests, chi-square tests, and ANOVA (analysis of variance). However, selecting the appropriate test requires careful consideration of the assumptions and conditions associated with each test. Students should familiarize themselves with the assumptions underlying each hypothesis test and assess whether their data meets those assumptions. For example, t-tests assume that the data follow a normal distribution, while chi-square tests require categorical data and independence between observations.

Furthermore, students should consider the nature of their research question and the type of data they are analyzing. Are they comparing means of two independent groups or assessing the association between categorical variables? By understanding the characteristics of their data and the requirements of each test, students can confidently choose the appropriate hypothesis test in Excel.

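T-tests are statistical tests commonly used to compare the means of two samples or to compare the mean of a single sample to a known value. These tests are valuable in various fields, including psychology, biology, economics, and more. In Excel, students can employ the T.TEST function to conduct two-sample t-tests (paired, equal-variance, or unequal-variance, selected through its type argument), providing them with a practical and accessible way to analyze their data and draw conclusions about population parameters based on sample statistics. T.TEST does not cover the one-sample case directly; that test can be built from the sample mean, the sample standard deviation, and the T.DIST.2T function.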

Independent Samples T-Test

The independent samples t-test, also known as the unpaired t-test, is utilized when comparing the means of two independent groups. This test is often employed in experimental and observational studies to assess whether there is a significant difference between the means of the two groups. In Excel, students can organize their data into separate columns representing the two groups, calculate the sample means and standard deviations for each group as a descriptive check, and then pass the two columns to the T.TEST function to obtain the p-value. The p-value obtained from the T.TEST function represents the probability of observing a difference between the sample means at least as large as the one obtained if the null hypothesis, which typically states that there is no difference between the means of the two groups, is true.

A small p-value (typically less than the chosen significance level, commonly 0.05) indicates that there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis, suggesting a significant difference between the group means. By conducting an independent samples t-test in Excel, students can not only assess the significance of differences between two groups but also gain valuable experience in data analysis and hypothesis testing, which are essential skills in various academic and professional settings.
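To make the calculation behind an independent samples t-test concrete, here is a minimal sketch of the unequal-variance (Welch) version in plain Python. It is an illustration under stated assumptions rather than a reproduction of Excel's internals: the group data are hypothetical, the helper names `t_pdf`, `two_sided_p`, and `welch_t_test` are invented for this example, and the p-value is obtained by numerically integrating the t density with Simpson's rule. The result should be close to what `T.TEST` returns with two tails and type 3.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density from 0 to |t|."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite df, two-sided p-value)."""
    se2_a = variance(a) / len(a)  # squared standard error of each mean
    se2_b = variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t_stat, df, two_sided_p(t_stat, df)


# Hypothetical measurements for two independent groups.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df, p_value = welch_t_test(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}, p = {p_value:.4f}")
```

With these made-up numbers the p-value lands above 0.05, so the null hypothesis of equal means would not be rejected at that level.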

Paired Samples T-Test

The paired samples t-test, also known as the dependent t-test or matched pairs t-test, is employed when comparing the means of two related groups. This test is often used in studies where participants are measured before and after an intervention or when each observation in one group is matched or paired with a specific observation in the other group. Examples include comparing pre-test and post-test scores, analyzing the performance of individuals under different conditions, and assessing the effectiveness of a treatment or intervention. In Excel, students can perform a paired samples t-test by first calculating the differences between paired observations (e.g., subtracting the before-measurement from the after-measurement). Excel has no dedicated one-sample t-test function, so the next step is to run a one-sample test on those differences by hand: divide their mean by their standard error to get the t statistic, then convert it to a p-value with T.DIST.2T. Alternatively, the two original columns can be passed to T.TEST with the type argument set to 1 (paired). Either approach allows students to determine whether the mean difference between paired observations is statistically significant, indicating whether there is a meaningful change or effect between the two related groups.

Interpreting the results of a paired samples t-test involves assessing the obtained p-value in relation to the chosen significance level. A small p-value suggests that there is sufficient evidence to reject the null hypothesis, indicating a significant difference between the paired observations. This information can help students draw meaningful conclusions from their data and make informed decisions based on statistical evidence. By conducting paired samples t-tests in Excel, students can not only analyze the relationship between related groups but also develop critical thinking skills and gain practical experience in hypothesis testing, which are valuable assets in both academic and professional contexts. Additionally, mastering the application of statistical tests in Excel can enhance students' data analysis skills and prepare them for future research endeavors and real-world challenges.
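The difference-based calculation for a paired test can be sketched in a few lines. The scores below are hypothetical and the function name `paired_t_statistic` is an assumption for illustration; the sketch stops at the t statistic and degrees of freedom, which can then be converted to a p-value (for example with `T.DIST.2T` in Excel).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the per-pair differences (after minus before).
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)  # standard error of the mean difference
    return mean(diffs) / std_error, n - 1


# Hypothetical pre-test and post-test scores for five students.
pre_scores = [70, 65, 80, 75, 60]
post_scores = [74, 68, 85, 76, 62]
t_stat, df = paired_t_statistic(pre_scores, post_scores)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

For 4 degrees of freedom the two-sided 5% critical value is about 2.776, so a t statistic of this size would lead to rejecting the null hypothesis of no mean change for this invented data set.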

Chi-Square Test

The chi-square test is a versatile statistical tool used to assess the association between two categorical variables. In essence, it helps determine whether the observed frequencies in a dataset significantly deviate from what would be expected under certain assumptions. Excel provides a straightforward means to perform chi-square tests using the CHISQ.TEST function, which calculates the probability associated with the chi-square statistic.

Goodness-of-Fit Test

One application of the chi-square test is the goodness-of-fit test, which evaluates how well the observed frequencies in a single categorical variable align with the expected frequencies dictated by a theoretical distribution. This test is particularly useful when researchers wish to ascertain whether their data conforms to a specific probability distribution. In Excel, students can organize their data into a frequency table, listing the categories of the variable of interest along with their corresponding observed frequencies. They can then specify the expected frequencies based on the theoretical distribution they are testing against. For example, if analyzing the outcomes of a six-sided die roll, where each face is expected to occur with equal probability, the expected frequency for each category would be the total number of observations divided by six.

Once the observed and expected frequencies are determined, students can employ the CHISQ.TEST function in Excel, which takes the observed and expected ranges and returns the p-value directly. The chi-square statistic itself can be computed separately by summing (observed − expected)² / expected across the categories. The p-value represents the probability of obtaining a chi-square statistic as extreme or more extreme than the observed value under the assumption that the null hypothesis is true (i.e., the observed frequencies match the expected frequencies). Interpreting the results of the goodness-of-fit test involves comparing the calculated p-value to a predetermined significance level (commonly denoted as α). If the p-value is less than α (e.g., α = 0.05), there is sufficient evidence to reject the null hypothesis, indicating that the observed frequencies significantly differ from the expected frequencies specified by the theoretical distribution. Conversely, if the p-value is greater than α, there is insufficient evidence to reject the null hypothesis, meaning that the observed frequencies are consistent with the expected frequencies.
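The die example can be worked through as a short sketch. The observed counts are hypothetical, the helper name `chi_square_statistic` is invented for illustration, and the decision uses the tabulated 5% critical value for 5 degrees of freedom (about 11.070) instead of computing a p-value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1..6 from 60 rolls of a die.
observed_rolls = [8, 12, 9, 11, 10, 10]
total_rolls = sum(observed_rolls)

# Under the null hypothesis of a fair die, each face is expected equally often.
expected_rolls = [total_rolls / 6] * 6

statistic = chi_square_statistic(observed_rolls, expected_rolls)

# Tabulated critical value for alpha = 0.05 with 6 - 1 = 5 degrees of freedom.
CRITICAL_VALUE_5PCT_DF5 = 11.070
reject_null = statistic > CRITICAL_VALUE_5PCT_DF5

print(f"chi-square = {statistic:.3f}, reject fair-die hypothesis: {reject_null}")
```

Because the statistic is far below the critical value, these invented counts give no reason to doubt that the die is fair.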

Test of Independence

Another important application of the chi-square test in Excel is the test of independence, which evaluates whether there is a significant association between two categorical variables in a contingency table. This test is employed when researchers seek to determine whether the occurrence of one variable is related to the occurrence of another. To conduct a test of independence in Excel, students first create a contingency table that cross-tabulates the two categorical variables of interest. Each cell in the table represents the frequency of occurrences for a specific combination of categories from the two variables.

Similar to the goodness-of-fit test, students then calculate the expected frequencies for each cell under the assumption of independence between the variables: each expected count is the row total multiplied by the column total, divided by the grand total. Passing the observed and expected ranges to the CHISQ.TEST function in Excel returns the p-value for the chi-square statistic. The interpretation of the test results follows a similar procedure to that of the goodness-of-fit test, with the p-value indicating whether there is sufficient evidence to reject the null hypothesis of independence between the two variables.
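The expected-count step is the part students most often get wrong, so a small sketch may help. The 2×2 table is hypothetical and the helper names `expected_counts` and `chi_square_independence` are assumptions for this example. The p-value line relies on a shortcut that holds only for 1 degree of freedom, where the chi-square statistic is the square of a standard normal value.

```python
from math import sqrt
from statistics import NormalDist


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Hypothetical contingency table: rows are groups, columns are outcomes.
observed_table = [[20, 30], [30, 20]]
statistic, df, expected = chi_square_independence(observed_table)

# Shortcut valid only when df == 1: chi-square is a squared standard normal.
p_value = 2 * (1 - NormalDist().cdf(sqrt(statistic)))

print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
```

For larger tables the statistic and degrees of freedom are computed the same way, but the p-value needs the chi-square distribution itself, which is what `CHISQ.TEST` provides in Excel.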

Excel, despite being commonly associated with spreadsheet tasks, offers a plethora of features that make it a versatile and powerful tool for statistical analysis, especially for students diving into the intricacies of hypothesis testing. Its widespread availability and user-friendly interface make it accessible to students at various levels of statistical proficiency. However, the true value of Excel lies not just in its accessibility but also in its ability to facilitate a hands-on learning experience that reinforces theoretical concepts.

At the core of utilizing Excel for hypothesis testing is a solid understanding of the fundamental principles of statistical inference. Students need to grasp concepts such as the null and alternative hypotheses, significance levels, p-values, and test statistics. Excel provides a practical platform for students to apply these concepts in a real-world context. Through hands-on experimentation with sample datasets, students can observe how changes in data inputs and statistical parameters affect the outcome of hypothesis tests, thus deepening their understanding of statistical theory.


Hypothesis Testing


So, a hypothesis is just a statement of theory.  It may or may not be true.  A drug company can claim that a new drug is better at decreasing blood pressure.   You may claim that the diet plan you created helps people lose more weight than a nationally known diet plan.  All these things are just statements – just hypotheses.

The hypothesis is the starting point.  From there, we have to test the hypothesis and reach a decision if the hypothesis is probably true or probably false.  Note the word “probably.”  There is always variation – so there is always a chance for you to make the wrong decision.  This month’s publication takes a look at the five steps involved in conducting a hypothesis test.

In this issue:

  • The problem
  • A brief pause for the standard normal distribution
  • Formulate the null hypothesis and the alternative hypothesis
  • Determine the significance level
  • Collect the data and calculate the sample statistics
  • Calculate the p value for the hypothesis test
  • Compare the p value to the desired significance level


The Problem


The average coating thickness is 5 mil.  You want to be sure that the coating thickness remains the same before you will approve the process change.

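The team wants to perform a hypothesis test to check whether the average coating thickness will change.  The team will go through the basic five steps of hypothesis testing:

  • Formulate the null hypothesis and the alternative hypothesis
  • Determine the significance level
  • Collect the data and calculate the sample statistics
  • Calculate the p value for the hypothesis test
  • Compare the p value to the desired significance level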

The details of the five steps are shown below.  However, before those steps are covered, a review of the standard normal distribution is needed.  This will be required when we do some calculations.

A Brief Pause for the Standard Normal Distribution

We need to digress a moment here because we will need to make use of a special case of the normal distribution – when the average = 0 and the standard deviation = 1. This special case is called the standard normal distribution and is shown in Figure 1.

Figure 1: Standard Normal Distribution


For this distribution, the area under the curve from -∞ to +∞ is equal to 1.0. In addition, the area under the curve is proportional to the fraction of measurements that fall in that region. These two facts can be used to help determine the fraction of measurements that fall above some value (such as a specification limit), below some value, or between two values.

z = (x − μ)/σ

where x is some value, μ is the average, and σ is the standard deviation of the x values.  The value of z (the z score) is simply how many standard deviations a value, x, is from the average.

For example, suppose x is 1.5 standard deviations below the average.  In this case, z = -1.5.  The area below z = -1.5 is the percentage of x values that are more than 1.5 standard deviations below the average.  For z = -1.5, that area is 6.68% as is shown in Figure 1.   If z = 1.5, then the area above z = 1.5 is the percentage of x values that are more than 1.5 standard deviations above the average.  This area is also 6.68%.

To find the percentage of data within z = -1.5 and z = 1.5, you simply use the fact that the area under the curve is 100%, so the percentage of data between the two z values is 100 – 6.68 – 6.68 = 86.64%.  You can determine these percentages from a table of z values (see our publication on the normal distribution) or by using Excel’s NORMSDIST function (NORM.S.DIST in current versions of Excel).

These percentages can also be viewed as probabilities, e.g., the probability of getting a result that is more than 1.5 standard deviations below the average is 0.0668.  We will make use of this knowledge below.  Now back to the steps in hypothesis testing.
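The areas quoted above can be checked with a few lines of Python. This is only a sketch that uses the standard library's `NormalDist` class in place of a z table or Excel's NORMSDIST; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # average = 0, standard deviation = 1

# Area under the curve below z = -1.5 (left tail).
area_below = std_normal.cdf(-1.5)

# Area above z = 1.5 (right tail); equal to the left tail by symmetry.
area_above = 1 - std_normal.cdf(1.5)

# Area between z = -1.5 and z = 1.5.
area_between = std_normal.cdf(1.5) - std_normal.cdf(-1.5)

print(f"below -1.5: {area_below:.4f}")
print(f"above +1.5: {area_above:.4f}")
print(f"between:    {area_between:.4f}")
```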

Step 1: Formulate the Null Hypothesis and Alternative Hypothesis


So the null hypothesis (H0) is that the process change will not impact the average coating thickness; the average coating thickness (μ) will remain at 5.  This is usually written as:

H0: μ = 5

Now for the alternative hypothesis, which is denoted by H1.  The alternative hypothesis is that the process change will have an effect on the average coating thickness and the average coating thickness will not equal 5.  This is usually written as:

H1: μ ≠ 5

This is called a two-sided hypothesis test since you are only interested if the mean is not equal to 5.  You can have one-sided tests where you want the mean to be greater than or less than some value.

Step 2: Determine the Significance Level You Want

The significance level is important in hypothesis testing.  It is the probability of rejecting the null hypothesis when it is true.  This probability is denoted by α.  Typical values of α include 0.05 and 0.01.  You decide that you want α to be 0.05.  This means that there is only a 5% chance of rejecting the null hypothesis when it is actually true.

Step 3: Collect the Data and Calculate the Sample Statistics

X̄ = average coating thickness = 5.06

s = standard deviation of the coating thickness = 0.20

We have our statistics.  How do you decide to accept or reject the null hypothesis?  The way you do this is to assume that the null hypothesis is true and then determine the probability (p value) of getting a sample average at least this far from 5.  If the p value is large, it means that there is a large probability of getting an average thickness of 5.06 with a standard deviation of 0.20 when the null hypothesis is true and you will accept that the null hypothesis is probably true.  But if the probability of getting these statistics is small, you will assume that the null hypothesis is probably not true and reject it in favor of the alternative hypothesis.

Step 4: Calculate the p Value

To determine this probability, you will need to consider your sampling distribution.    The distribution of sample averages tends to be normal when the sample size is large enough.  We will use this assumption here.  So, your sampling distribution is represented by all the possible sample averages of sample size 25 from the population of coating thicknesses.  This normal distribution is shown in Figure 2.

Figure 2: Normal Distribution for Sample Averages


The highest point on the curve is the average.  The population average of the sample averages (μ_X̄) is equal to the population average, μ, so we have just used μ in Figure 2.  The standard deviation of the sample averages is denoted by σ_X̄.

To be able to draw your sampling distribution, you need to know μ_X̄ and σ_X̄.  Since you assumed that the null hypothesis is true, μ_X̄ = 5.0.  The standard deviation of the sample averages is given by:

σ_X̄ = σ/√n

where σ is the population standard deviation and n is the sample size.

You don’t know what the population standard deviation is, but you have an estimate from the sample statistics.  The standard deviation of the 25 samples was 0.2.  You can use this as an estimate of the population standard deviation.

σ_X̄ = σ/√n ≈ s/√n = 0.2/√25 = 0.04

Now you can draw the sampling distribution and add the sample average as shown in Figure 3.

Figure 3: Sampling Distribution


Now we return to the z score.  Remember, the z score is a measure of how many standard deviations the sample average (X̄) is from the population average (μ).  For this example, the z value is calculated as:

z = (X̄ − μ)/σ_X̄ = (5.06 − 5)/0.04 = 0.06/0.04 = 1.5

So, 5.06 is 1.5 standard deviations above the hypothesized average.  As shown above, the probability of getting a result that is more than 1.5 standard deviations above the average is 0.0668.  Remember, this is a two-sided test, so you don’t care whether the difference is above or below the average.  So, the probability of getting an average that is more than 1.5 standard deviations away from the average in either direction is 2(0.0668) = 0.1336 or 13.36%.  This is the p value:

p value = 0.1336

Remember what the p value represents.  You assumed that the null hypothesis is true.  The p value is the probability of getting this result (or a more extreme result) if the null hypothesis is true.
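The calculation in this step can be reproduced with a short script. This is a sketch using the statistics quoted above (average 5.06, standard deviation 0.20, sample size 25) and, like the text, it assumes the normal distribution applies; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

hypothesized_mean = 5.0  # mu under the null hypothesis
sample_mean = 5.06       # average coating thickness from the sample
sample_sd = 0.20         # standard deviation from the sample
n = 25                   # sample size

# Standard deviation of the sample averages, estimated as s / sqrt(n).
std_error = sample_sd / sqrt(n)

# Number of standard errors between the sample average and mu.
z = (sample_mean - hypothesized_mean) / std_error

# Two-sided p value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error = {std_error:.2f}")
print(f"z = {z:.2f}")
print(f"p value = {p_value:.4f}")
```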

Step 5:  Compare the p value to the Desired Significance Level

In step 2, we set the significance level at 0.05.  Since our p value (0.1336) is greater than this, we do not reject the null hypothesis: the data give no evidence that the coating thickness was impacted by the process change.  We accept the null hypothesis as probably being true.  If the p value had been less than 0.05, we would have rejected the null hypothesis and said that the process change did impact the coating thickness.

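This newsletter has taken a look at how to perform hypothesis testing.  The five steps are:

  • Formulate the null hypothesis and the alternative hypothesis
  • Determine the significance level you want
  • Collect the data and calculate the sample statistics
  • Calculate the p value
  • Compare the p value to the desired significance level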

The normal distribution was used to demonstrate how hypothesis testing is done.  You will not always be dealing with the normal distribution but the process is essentially the same.  One item that is still to be discussed is how to select the sample size.  This will be the subject of a later publication.


Thanks so much for reading our SPC Knowledge Base. We hope you find it informative and useful. Happy charting and may the data always support your position.

Dr. Bill McNeese BPI Consulting, LLC


Appendix A: Statistical Considerations

At a glance

  • This appendix provides guidance on epidemiologic and descriptive statistical methods to assess cancer occurrences.
  • The standardized incidence ratio (SIR) is often used to assess if there is an excess number of cancer cases.
  • Alpha, beta, and statistical power relate to the types of errors that can occur during hypothesis testing.


This section provides general guidance regarding epidemiologic and descriptive statistical methods most commonly used to assess occurrences of cancer. Frequencies, proportions, rates, and other descriptive statistics are useful first steps in evaluating the suspected unusual pattern of cancer. These statistics can be calculated by geographical location (e.g., census tracts) and by demographic variables such as age category, race, ethnicity, and sex. Comparisons can then be made across different stratifications using statistical summaries such as ratios.

Standardized incidence ratio

The standardized incidence ratio (SIR) is often used to assess whether there is an excess number of cancer cases, considering what is “expected” to occur within an area over time given existing knowledge of the type of cancer and the local population at risk. The SIR is a ratio of the number of observed cancer cases in the study population compared to the number that would be expected if the study population experienced the same cancer rates as a selected reference population. Typically, the state as a whole is used as a reference population. The equation is as follows:

SIR = observed number of cases / expected number of cases

Adjusting for factors

The SIR can be adjusted for factors such as age, sex, race, or ethnicity, but it is most commonly adjusted for differences in age between two populations. In cancer analyses, adjusting for age is important because age is a risk factor for many cancers, and the population in an area of interest could be, on average, younger or older than the reference population [1, 2]. In these instances, comparing the crude counts or rates would present a biased comparison.

For more guidance, this measure is explained in many epidemiologic textbooks, sometimes under standardized mortality ratio, which uses the same method but measures mortality instead of incidence rates [3-9]. Two ways are generally used to adjust via standardization, an indirect and a direct method. An example of one method is shown below, but a discussion of other methods is provided in several epidemiologic textbooks [3] and reference manuals [10].

An example is provided in the table below, adjusting for age groups. The second column, denoted with an "O," is the observed number of cases in the area of interest, which in this example is a particular county within the state. The third column shows the population totals for each age group within the county of interest, designated as "A." The state age-specific cancer rates are shown in the fourth column, denoted as "B." To get the expected number of cases in the fifth column, multiply A by B for each row. The observed cases and the expected cases are then each summed across the age groups.

*Number of cases in a specified time frame. † Number of cases in the state divided by the state population for the specified time frame. Rates are typically expressed per 100,000 or 1,000,000 population.

The number of observed cancer cases can then be compared to the expected. The SIR is calculated using the formula below.

Standardized Incidence Ratio Example
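The table arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the age groups, counts, populations, and rates below are hypothetical values chosen to keep the numbers simple, not the figures from the original table, and the function name is an assumption of this sketch.

```python
# Hypothetical age-stratified data (illustrative only, not from the original table).
# Each row: (age group, observed cases O, county population A, state rate B per person)
rows = [
    ("0-39", 2, 40_000, 10 / 100_000),
    ("40-64", 12, 25_000, 40 / 100_000),
    ("65+", 16, 10_000, 100 / 100_000),
]


def standardized_incidence_ratio(rows):
    """Return (total observed, total expected, SIR) for age-stratified rows."""
    total_observed = sum(o for _, o, _, _ in rows)
    # Expected cases per stratum = county population (A) x state rate (B).
    total_expected = sum(a * b for _, _, a, b in rows)
    return total_observed, total_expected, total_observed / total_expected


observed, expected, sir = standardized_incidence_ratio(rows)
print(f"Observed = {observed}, Expected = {expected:.1f}, SIR = {sir:.2f}")
```

With these made-up inputs, 30 cases are observed against 24 expected, giving an SIR of 1.25. Rates are stored per person here; a rate quoted per 100,000 has to be divided by 100,000 before it is multiplied by the population.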

Confidence intervals

A confidence interval (CI) is one of the most important statistics to be calculated, as it helps to provide understanding of both statistical significance and precision of the estimate. The narrower the confidence interval, the more precise the estimate 4 .

A common way of calculating confidence intervals for the SIR is shown below 4 :

Confidence Intervals

Using the example above produces this result:

Confidence Intervals Example
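As a rough sketch, two widely used approximations for a 95% confidence interval around the SIR are shown below. Both treat the observed count as a Poisson variable and the expected count as fixed. They may not match the formula pictured in the original, and the inputs (30 observed, 24 expected) are hypothetical. The function names are assumptions of this sketch.

```python
import math


def sir_ci_normal(observed, expected, z=1.96):
    """Normal approximation: SIR +/- z * sqrt(O) / E."""
    sir = observed / expected
    half_width = z * math.sqrt(observed) / expected
    # A ratio cannot be negative, so the lower limit is floored at zero.
    return max(0.0, sir - half_width), sir + half_width


def sir_ci_byar(observed, expected, z=1.96):
    """Byar's approximation to the exact Poisson limits (better for small counts)."""
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    # Limits for the count are divided by E to give limits for the SIR.
    return lower / expected, upper / expected


# Hypothetical inputs: 30 observed cases, 24 expected cases (SIR = 1.25).
lo_n, hi_n = sir_ci_normal(30, 24)
lo_b, hi_b = sir_ci_byar(30, 24)
print(f"Normal approximation: ({lo_n:.2f}, {hi_n:.2f})")
print(f"Byar approximation:   ({lo_b:.2f}, {hi_b:.2f})")
```

For these hypothetical inputs both intervals include 1.0, so the elevated SIR of 1.25 would not be considered statistically significant. The two methods diverge more as the observed count gets smaller, which is one reason exact or near-exact limits are often preferred for small case counts.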

If the confidence interval for the SIR includes 1.0, the SIR is not considered statistically significant. However, there are many considerations when using the SIR. Because the SIR can be affected by small case counts, the proportion of the population within an area of interest, and other factors, its statistical significance should not be used as the sole metric to determine further assessment in the investigation of unusual patterns of cancer. Additionally, when the sample is small, exact statistical methods, which are calculated directly from data probabilities (such as a chi-square or Fisher’s exact test), can be considered. These calculations can be performed using software such as R, Microsoft Excel, SAS, and Stata 9. A few additional topics regarding the SIR are summarized below.

Reference population

Decisions about the reference population should be made prior to calculating the SIR. The reference population used for the SIR could be people in the surrounding census tracts, other counties in the state, or the entire state. The appropriate reference population depends on the hypothesis being tested, and it should be large enough to provide relatively stable reference rates. One issue to consider is the size of the study population relative to the reference population. If the study population is small relative to the overall state population, including the study population in the reference population calculation will not yield substantially different results. However, excluding the study population from the reference population may reduce bias. If the reference population is smaller than the state as a whole (such as another county), the reference population should be “similar” to the study population in terms of factors that could be confounders (like age distribution, socioeconomic status, and environmental exposures other than the exposure of interest). However, the reference population should not be selected to be similar to the study population in terms of the exposure of interest. Appropriate comparisons may also better address issues of environmental justice and health equity. Ultimately, careful consideration of the reference population is necessary since the choice can impact appropriate interpretation of findings and can introduce biases resulting in a decrease in estimate precision.

Limitations and further considerations for the SIR

One difficulty in community cancer investigations is that the population under study is generally a community or part of a community, leading to a relatively small number of individuals comprising the total population (e.g., small denominator for rate calculations). Small denominators frequently yield wide confidence intervals, meaning that estimates like the SIR may be imprecise 5 . Other methods, such as qualitative analyses or geospatial/spatial statistics methods, can provide further examination of the cancer and area of concern to better discern associations. Further epidemiologic studies may help calculate other statistics, such as logistic regression or Poisson regression. These methods are described in Appendix B. Other resources can provide additional guidance on use of p-values, confidence intervals, and statistical tests 3 4 9 11 12 .

Alpha, beta, and statistical power

Another important consideration in community cancer investigations is the types of errors that can occur during hypothesis testing and the related alpha, beta, and statistical power for the investigation. A type I error occurs when the null hypothesis (H₀) is rejected but is actually true (e.g., concluding that there is a difference in cancer rates between the study population and the reference population when there is actually no difference). The probability of a type I error is often referred to as alpha or α 13.

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$$

A type II error occurs when the null hypothesis is not rejected and it should have been (e.g., concluding that there is no difference in cancer rates when there actually is a difference). The probability of a type II error is often referred to as beta or β.

$$\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$$

Power is the probability of rejecting the null hypothesis when the null hypothesis is actually false (e.g., concluding there is a difference in cancer rates between the study population and reference population when there actually is a difference). Power is equal to 1-beta. Power is related to the sample size of the study—the larger the sample size, the larger the power. Power is also related to several other factors including the following:

  • The size of the effect (e.g., rate ratio or rate difference) to be detected
  • The probability of incorrectly rejecting the null hypothesis (alpha)
  • Other features related to the study design, such as the distribution and variability of the outcome measure

As with other epidemiologic analyses, in community cancer investigations, a power analysis can be conducted to estimate the minimum number of people (sample size) needed in a study for detection of an effect (e.g., rate ratio or rate difference) of a given size with a specified level of power (1-beta) and a specified probability of rejecting the null hypothesis when the null hypothesis is true (alpha), given an assumed distribution for the outcome. Typically, a power value of 0.8 (equivalent to a beta value of 0.2) and an alpha value of 0.05 are used. An alpha value of 0.05 corresponds to a 95% confidence interval. Selection of an alpha value larger than 0.05 (e.g., 0.10: 90% confidence interval) can increase the possibility of concluding that there is a difference when there is actually no difference (Type I error). Selection of a smaller alpha value (e.g., 0.01: 99% confidence interval) can decrease the possibility of that risk and is sometimes considered when many SIRs are computed. The rationale for doing this is that one would expect to see some statistically significant apparent associations just by chance. As the number of SIRs examined increases, the number of SIRs that will be statistically significant by chance alone also increases (if alpha is 0.05, then 5% of the results are expected to be statistically significant by chance alone). However, one may consider this fact when interpreting results, rather than using a lower alpha value 14 . Decreasing the alpha value used will also decrease power for detection of differences between the population of interest and the reference population.
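The multiple-comparison point above can be made concrete with a small sketch. It assumes every null hypothesis is true and, for the second function, that the tests are independent. That independence assumption is a simplification, since SIRs for neighboring areas or related cancer types are often correlated. The function names are illustrative.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of significant results when every null hypothesis is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


for m in (1, 20, 100):
    print(
        f"{m:>3} SIRs: expected false positives = "
        f"{expected_false_positives(m):.1f}, "
        f"P(at least one) = {prob_at_least_one_false_positive(m):.2f}"
    )
```

Under these assumptions, examining 20 SIRs at an alpha of 0.05 would be expected to produce about one statistically significant result by chance alone, with roughly a 64% chance of seeing at least one.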

In many investigations of suspected unusual patterns of cancer, the number of people in the study population is determined by factors that may prevent the selection of a sample size sufficient to detect statistically significant differences. In these situations, a power analysis can be used to estimate the power of the study for detecting a difference in rates of a given magnitude. This information can be used to decide if or what type of statistical analysis is appropriate. Therefore, the results of a power calculation can be informative regarding how best to move forward.
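When the population size is fixed, one possible way to estimate power is sketched below. It assumes the observed count follows a Poisson distribution, uses a one-sided exact test for an excess of cases, and takes the expected count under the null hypothesis as known. These are assumptions of this sketch rather than a method prescribed in the text, and the expected count of 10 and the true SIR values are hypothetical.

```python
import math


def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean); suitable for small to moderate means."""
    lower_tail = sum(
        math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k)
    )
    return 1.0 - lower_tail


def critical_count(expected, alpha=0.05):
    """Smallest count c such that P(O >= c | null is true) <= alpha."""
    c = 0
    while poisson_upper_tail(c, expected) > alpha:
        c += 1
    return c


def power_for_sir(expected, true_sir, alpha=0.05):
    """Probability of reaching the critical count when the true SIR is true_sir."""
    c = critical_count(expected, alpha)
    return poisson_upper_tail(c, expected * true_sir)


# Hypothetical study area with 10 expected cases over the study period.
for true_sir in (1.5, 2.0, 3.0):
    print(f"True SIR {true_sir}: power = {power_for_sir(10, true_sir):.2f}")
```

In this hypothetical setting at least 16 cases must be observed to reject the null hypothesis, so a true doubling of risk would be detected with power somewhat above 0.8, while a 50% increase would be detected less than half the time. A calculation like this can indicate which effect sizes a fixed study population is realistically able to detect.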

Additional Contributing Authors:

Andrea Winquist, Angela Werner

  1. Waller LA, Gotway CA. Applied spatial statistics for public health data. New York: John Wiley and Sons; 2004.
  2. National Cancer Institute. Cancer Incidence Statistics [Internet]. 2021 [cited 2022 Jan 7]. Available from: https://surveillance.cancer.gov/statistics/types/incidence.html
  3. Gordis L. Epidemiology. Philadelphia, PA: Elsevier Saunders; 2014.
  4. Merrill R. Environmental Epidemiology, Principles and Methods. Sudbury, MA: Jones and Bartlett Publishers, Inc.; 2008.
  5. Kelsey JL, Whittemore AS, Evans AS, Thompson WD. Methods in observational epidemiology. 2nd ed. New York, NY: Oxford University Press; 1996.
  6. Sahai H, Khurshid A. Statistics in epidemiology: methods, techniques, and applications. Boca Raton: CRC; 1996.
  7. Selvin S. Statistical analysis of epidemiologic data. New York, NY: Oxford University Press; 1996.
  8. Breslow NE, Day NE. Statistical methods in cancer research. Volume I – The analysis of case-control studies. IARC Sci Publ. 1980;(32):5–338.
  9. Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.
  10. United Kingdom and Ireland Association of Cancer Registries. Standard Operating Procedure: Investigating and Analysing Small-Area Cancer Clusters [Internet]. 2015. Available from: Cancer Cluster SOP_0.pdf (ukiacr.org)
  11. Greenland S, Senn S, Rothman KJ, Carlin J, Poole C, Goodman S, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31(4):337–50.
  12. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567:305–7.
  13. Pagano M, Gauvreau K. Principles of Biostatistics. 2nd ed. Pacific Grove, CA: Duxbury Thomson Learning; 2000.
  14. Rothman KJ. No Adjustments Are Needed for Multiple Comparisons. Vol. 1. 1990.


