Grad Coach

How To Write The Results/Findings Chapter

For quantitative studies (dissertations & theses).

By: Derek Jansen (MBA) | Expert Reviewed By: Kerryn Warren (PhD) | July 2021

So, you’ve completed your quantitative data analysis and it’s time to report on your findings. But where do you start? In this post, we’ll walk you through the results chapter (also called the findings or analysis chapter), step by step, so that you can craft this section of your dissertation or thesis with confidence. If you’re looking for information regarding the results chapter for qualitative studies, you can find that here.

Overview: Quantitative Results Chapter

  • What exactly the results chapter is
  • What you need to include in your chapter
  • How to structure the chapter
  • Tips and tricks for writing a top-notch chapter
  • Free results chapter template

What exactly is the results chapter?

The results chapter (also referred to as the findings or analysis chapter) is one of the most important chapters of your dissertation or thesis because it shows the reader what you’ve found in terms of the quantitative data you’ve collected. It presents the data using a clear text narrative, supported by tables, graphs and charts. In doing so, it also highlights any potential issues (such as outliers or unusual findings) you’ve come across.

But how’s that different from the discussion chapter?

Well, in the results chapter, you only present your statistical findings. Only the numbers, so to speak – no more, no less. In contrast, in the discussion chapter, you interpret your findings and link them to prior research (i.e. your literature review), as well as your research objectives and research questions. In other words, the results chapter presents and describes the data, while the discussion chapter interprets the data.

Let’s look at an example.

In your results chapter, you may have a chart that shows how survey respondents answered: the number of respondents per category, for instance. You may also state whether this supports a hypothesis by using a p-value from a statistical test. But it is only in the discussion chapter that you will say why this is relevant or how it compares with the literature or the broader picture. So, in your results chapter, make sure that you don’t present anything other than the hard facts – this is not the place for subjectivity.

It’s worth mentioning that some universities prefer you to combine the results and discussion chapters. Even so, it is good practice to separate the results and discussion elements within the chapter, as this ensures your findings are fully described. Typically, though, the results and discussion chapters are split up in quantitative studies. If you’re unsure, chat with your research supervisor or chair to find out what their preference is.


What should you include in the results chapter?

Following your analysis, it’s likely you’ll have far more data than are necessary to include in your chapter. In all likelihood, you’ll have a mountain of SPSS or R output data, and it’s your job to decide what’s most relevant. You’ll need to cut through the noise and focus on the data that matters.

This doesn’t mean that those analyses were a waste of time – on the contrary, those analyses ensure that you have a good understanding of your dataset and how to interpret it. However, that doesn’t mean your reader or examiner needs to see the 165 histograms you created! Relevance is key.

How do I decide what’s relevant?

At this point, it can be difficult to strike a balance between what is and isn’t important. But the most important thing is to ensure your results reflect and align with the purpose of your study. So, you need to revisit your research aims, objectives and research questions and use these as a litmus test for relevance. Make sure that you refer back to these constantly when writing up your chapter so that you stay on track.


As a general guide, your results chapter will typically include the following:

  • Some demographic data about your sample
  • Reliability tests (if you used measurement scales)
  • Descriptive statistics
  • Inferential statistics (if your research objectives and questions require these)
  • Hypothesis tests (again, if your research objectives and questions require these)

We’ll discuss each of these points in more detail in the next section.

Importantly, your results chapter needs to lay the foundation for your discussion chapter . This means that, in your results chapter, you need to include all the data that you will use as the basis for your interpretation in the discussion chapter.

For example, if you plan to highlight the strong relationship between Variable X and Variable Y in your discussion chapter, you need to present the respective analysis in your results chapter – perhaps a correlation or regression analysis.


How do I write the results chapter?

There are multiple steps involved in writing up the results chapter for your quantitative research. The exact number of steps applicable to you will vary from study to study and will depend on the nature of the research aims, objectives and research questions . However, we’ll outline the generic steps below.

Step 1 – Revisit your research questions

The first step in writing your results chapter is to revisit your research objectives and research questions . These will be (or at least, should be!) the driving force behind your results and discussion chapters, so you need to review them and then ask yourself which statistical analyses and tests (from your mountain of data) would specifically help you address these . For each research objective and research question, list the specific piece (or pieces) of analysis that address it.

At this stage, it’s also useful to think about the key points that you want to raise in your discussion chapter and note these down so that you have a clear reminder of which data points and analyses you want to highlight in the results chapter. Again, list your points and then list the specific piece of analysis that addresses each point. 

Next, you should draw up a rough outline of how you plan to structure your chapter . Which analyses and statistical tests will you present and in what order? We’ll discuss the “standard structure” in more detail later, but it’s worth mentioning now that it’s always useful to draw up a rough outline before you start writing (this advice applies to any chapter).

Step 2 – Craft an overview introduction

As with all chapters in your dissertation or thesis, you should start your quantitative results chapter by providing a brief overview of what you’ll do in the chapter and why . For example, you’d explain that you will start by presenting demographic data to understand the representativeness of the sample, before moving onto X, Y and Z.

This section shouldn’t be lengthy – a paragraph or two maximum. Also, it’s a good idea to weave the research questions into this section so that there’s a golden thread that runs through the document.


Step 3 – Present the sample demographic data

The first set of data that you’ll present is an overview of the sample demographics – in other words, the demographics of your respondents.

For example:

  • What age range are they?
  • How is gender distributed?
  • How is ethnicity distributed?
  • What areas do the participants live in?

The purpose of this is to assess how representative the sample is of the broader population. This is important for the sake of the generalisability of the results. If your sample is not representative of the population, you will not be able to generalise your findings. This is not necessarily the end of the world, but it is a limitation you’ll need to acknowledge.

Of course, to make this representativeness assessment, you’ll need to have a clear view of the demographics of the population. So, make sure that you design your survey to capture the correct demographic information that you will compare your sample to.
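If you’re working in Python, a quick way to produce these per-category percentages is with pandas. This is just a minimal sketch – the column name and respondent data below are entirely hypothetical:

```python
import pandas as pd

# Hypothetical respondent data – replace with your own survey export
sample = pd.DataFrame({"age_band": ["18-34", "35-54", "55+", "35-54", "18-34",
                                    "55+", "35-54", "18-34", "35-54", "55+"]})

# Percentage of respondents per category, ready to compare against population figures
print(sample["age_band"].value_counts(normalize=True).mul(100).round(1))
```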

But what if I’m not interested in generalisability?

Well, even if your purpose is not necessarily to extrapolate your findings to the broader population, understanding your sample will allow you to interpret your findings appropriately, considering who responded. In other words, it will help you contextualise your findings . For example, if 80% of your sample was aged over 65, this may be a significant contextual factor to consider when interpreting the data. Therefore, it’s important to understand and present the demographic data.

Step 4 – Review composite measures and the data “shape”

Before you undertake any statistical analysis, you’ll need to do some checks to ensure that your data are suitable for the analysis methods and techniques you plan to use. If you try to analyse data that doesn’t meet the assumptions of a specific statistical technique, your results will be largely meaningless. Therefore, you may need to show that the methods and techniques you’ll use are “allowed”.

Most commonly, there are two areas you need to pay attention to:

#1: Composite measures

The first is when you have multiple scale-based measures that combine to capture one construct – this is called a composite measure. For example, you may have four Likert scale-based measures that (should) all measure the same thing, but in different ways. In other words, in a survey, these four scales should all receive similar ratings. This is called “internal consistency”.

Internal consistency is not guaranteed though (especially if you developed the measures yourself), so you need to assess the reliability of each composite measure using a statistical test. Cronbach’s alpha is the most commonly used test of internal consistency – i.e., it shows whether the items you’re combining are more or less saying the same thing. A high alpha score means that your measure is internally consistent. A low alpha score means you may need to consider scrapping one or more of the measures.
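If you’d like to compute Cronbach’s alpha yourself, here’s a minimal Python sketch. It assumes your scale items sit in a pandas DataFrame (one column per item); the item names and simulated data are hypothetical:

```python
import numpy as np
import pandas as pd

def cronbachs_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a composite measure (one column per scale item)."""
    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical example: four Likert items that (should) measure one construct
rng = np.random.default_rng(42)
base = rng.integers(1, 6, size=100)
df = pd.DataFrame({f"item_{i}": np.clip(base + rng.integers(-1, 2, size=100), 1, 5)
                   for i in range(1, 5)})
print(f"alpha = {cronbachs_alpha(df):.2f}")
```

As a rough convention, alpha values above about 0.7 are usually treated as acceptable, but check the norms in your field.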

#2: Data shape

The second matter that you should address early on in your results chapter is data shape. In other words, you need to assess whether the data in your set are symmetrical (i.e. normally distributed) or not, as this will directly impact what type of analyses you can use. For many common inferential tests such as t-tests or ANOVAs (we’ll discuss these a bit later), your data need to be normally distributed. If they’re not, you’ll need to adjust your strategy and use alternative tests.

To assess the shape of the data, you’ll usually assess a variety of descriptive statistics (such as the mean, median and skewness), which is what we’ll look at next.
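If you want to run a quick shape check yourself, here’s a minimal Python sketch using SciPy – the variable name and simulated data are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=200)   # hypothetical survey variable

print(f"mean = {scores.mean():.2f}, median = {np.median(scores):.2f}")
print(f"skewness = {stats.skew(scores):.2f}, "
      f"excess kurtosis = {stats.kurtosis(scores):.2f}")  # normal distribution ~ 0

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
stat, p = stats.shapiro(scores)
print("normality plausible" if p > 0.05 else "evidence of non-normality")
```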


Step 5 – Present the descriptive statistics

Now that you’ve laid the foundation by discussing the representativeness of your sample, as well as the reliability of your measures and the shape of your data, you can get started with the actual statistical analysis. The first step is to present the descriptive statistics for your variables.

For scaled data, this usually includes statistics such as:

  • The mean – this is simply the mathematical average of a range of numbers.
  • The median – this is the midpoint in a range of numbers when the numbers are arranged in order.
  • The mode – this is the most commonly repeated number in the data set.
  • Standard deviation – this metric indicates how dispersed a range of numbers is. In other words, how close all the numbers are to the mean (the average).
  • Skewness – this indicates how symmetrical a range of numbers is. In other words, do the values cluster into a smooth bell curve shape in the middle of the graph (a normal distribution), or do they lean to the left or right (a skewed, non-normal distribution)?
  • Kurtosis – this metric indicates whether the data are heavy- or light-tailed, relative to the normal distribution. In other words, how peaked or flat the distribution is.

A large table that indicates all the above for multiple variables can be a very effective way to present your data economically. You can also use colour coding to help make the data more easily digestible.
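To make this concrete, here’s a hedged pandas sketch that builds such a summary table in one line once your variables sit in a DataFrame (the variable names and data are invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.normal(40, 12, 150),            # hypothetical variables
    "satisfaction": rng.normal(3.5, 0.8, 150),
})

# One row per variable: mean, median, standard deviation, skewness, kurtosis
summary = df.agg(["mean", "median", "std", "skew", "kurtosis"]).T.round(2)
print(summary)
```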

For categorical data – where you show the percentage or number of people who fit into each category, for instance – you can either describe the figures in the text or use graphs and charts (such as bar graphs and pie charts) to present your data in this section of the chapter.

When using figures, make sure that you label them simply and clearly , so that your reader can easily understand them. There’s nothing more frustrating than a graph that’s missing axis labels! Keep in mind that although you’ll be presenting charts and graphs, your text content needs to present a clear narrative that can stand on its own. In other words, don’t rely purely on your figures and tables to convey your key points: highlight the crucial trends and values in the text. Figures and tables should complement the writing, not carry it .

Depending on your research aims, objectives and research questions, you may stop your analysis at this point (i.e. descriptive statistics). However, if your study requires inferential statistics, then it’s time to dive into those.


Step 6 – Present the inferential statistics

Inferential statistics are used to make generalisations about a population, whereas descriptive statistics focus purely on the sample. Inferential statistical techniques, broadly speaking, can be broken down into two groups.

First, there are those that compare measurements between groups , such as t-tests (which measure differences between two groups) and ANOVAs (which measure differences between multiple groups). Second, there are techniques that assess the relationships between variables , such as correlation analysis and regression analysis. Within each of these, some tests can be used for normally distributed (parametric) data and some tests are designed specifically for use on non-parametric data.
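To make the two groups of techniques concrete, here’s an illustrative Python sketch using SciPy – the data are simulated and the specific function choices are just common defaults, not the only options:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(70, 8, 40)   # hypothetical scores for three groups
group_b = rng.normal(74, 8, 45)
group_c = rng.normal(69, 8, 38)

# Comparing measurements between groups
t, p_t = stats.ttest_ind(group_a, group_b)           # two groups (parametric)
u, p_u = stats.mannwhitneyu(group_a, group_b)        # non-parametric counterpart
f, p_f = stats.f_oneway(group_a, group_b, group_c)   # three or more groups (ANOVA)

# Assessing relationships between variables
x = rng.normal(0, 1, 100)
y = 0.6 * x + rng.normal(0, 1, 100)
r, p_r = stats.pearsonr(x, y)      # parametric correlation
rho, p_s = stats.spearmanr(x, y)   # non-parametric (rank-based) counterpart
```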

There are a seemingly endless number of tests that you can use to crunch your data, so it’s easy to run down a rabbit hole and end up with piles of test data. Ultimately, the most important thing is to make sure that you adopt the tests and techniques that allow you to achieve your research objectives and answer your research questions .

In this section of the results chapter, you should try to make use of figures and visual components as effectively as possible. For example, if you present a correlation table, use colour coding to highlight the significance of the correlation values, or scatterplots to visually demonstrate what the trend is. The easier you make it for your reader to digest your findings, the more effectively you’ll be able to make your arguments in the next chapter.


Step 7 – Test your hypotheses

If your study requires it, the next stage is hypothesis testing. A hypothesis is a statement , often indicating a difference between groups or relationship between variables, that can be supported or rejected by a statistical test. However, not all studies will involve hypotheses (again, it depends on the research objectives), so don’t feel like you “must” present and test hypotheses just because you’re undertaking quantitative research.

The basic process for hypothesis testing is as follows (a worked example follows the list):

  • Specify your null hypothesis (for example, “The chemical psilocybin has no effect on time perception”).
  • Specify your alternative hypothesis (e.g., “The chemical psilocybin has an effect on time perception”).
  • Set your significance level (this is usually 0.05)
  • Calculate your statistics and find your p-value (e.g., p=0.01)
  • Draw your conclusions (e.g., “The chemical psilocybin does have an effect on time perception”)
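Here’s what that process might look like in Python, using the psilocybin example with entirely simulated, hypothetical data and an independent-samples t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical data: perceived duration (in seconds) of a fixed 60-second interval
control = rng.normal(60, 5, 30)   # placebo group
treated = rng.normal(64, 5, 30)   # psilocybin group

alpha = 0.05                      # significance level, set in advance
t_stat, p_value = stats.ttest_ind(treated, control)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```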

Finally, if the aim of your study is to develop and test a conceptual framework , this is the time to present it, following the testing of your hypotheses. While you don’t need to develop or discuss these findings further in the results chapter, indicating whether the tests (and their p-values) support or reject the hypotheses is crucial.

Step 8 – Provide a chapter summary

To wrap up your results chapter and transition to the discussion chapter, you should provide a brief summary of the key findings . “Brief” is the keyword here – much like the chapter introduction, this shouldn’t be lengthy – a paragraph or two maximum. Highlight the findings most relevant to your research objectives and research questions, and wrap it up.

Some final thoughts, tips and tricks

Now that you’ve got the essentials down, here are a few tips and tricks to make your quantitative results chapter shine:

  • When writing your results chapter, report your findings in the past tense . You’re talking about what you’ve found in your data, not what you are currently looking for or trying to find.
  • Structure your results chapter systematically and sequentially . If you had two experiments where findings from the one generated inputs into the other, report on them in order.
  • Make your own tables and graphs rather than copying and pasting them from statistical analysis programmes like SPSS. Check out the r/DataIsBeautiful subreddit for some inspiration.
  • Once you’re done writing, review your work to make sure that you have provided enough information to answer your research questions , but also that you didn’t include superfluous information.

If you’ve got any questions about writing up the quantitative results chapter, please leave a comment below. If you’d like 1-on-1 assistance with your quantitative analysis and discussion, check out our hands-on coaching service , or book a free consultation with a friendly coach.


Statistical Methods in Theses: Guidelines and Explanations

Signed August 2018: Naseem Al-Aidroos, PhD; Christopher Fiacconi, PhD; Deborah Powell, PhD; Harvey Marmurek, PhD; Ian Newby-Clark, PhD; Jeffrey Spence, PhD; David Stanley, PhD; Lana Trick, PhD

Version:  2.00

This document is an organizational aid and workbook for students. We encourage students to take this document to meetings with their advisor and committee. This guide should enhance a committee’s ability to assess key areas of a student’s work.

In recent years a number of well-known and apparently well-established findings have failed to replicate, resulting in what is commonly referred to as the replication crisis. The APA Publication Manual 6th Edition notes that “The essence of the scientific method involves observations that can be repeated and verified by others” (p. 12). However, a systematic investigation of the replicability of psychology findings published in Science revealed that over half of psychology findings do not replicate (see a related commentary in Nature). Even more disturbing, a Bayesian reanalysis of the reproducibility project showed that 64% of studies had sample sizes so small that strong evidence for or against the null or alternative hypotheses did not exist. Indeed, Morey and Lakens (2016) concluded that most of psychology is statistically unfalsifiable due to small sample sizes and correspondingly low power (see article). Our discipline’s reputation is suffering. News of the replication crisis has reached the popular press (e.g., The Atlantic, The Economist, Slate, Last Week Tonight).

An increasing number of psychologists have responded by promoting new research standards that involve open science and the elimination of  Questionable Research Practices . The open science perspective is made manifest in the  Transparency and Openness Promotion (TOP) guidelines  for journal publications. These guidelines were adopted some time ago by the  Association for Psychological Science . More recently, the guidelines were adopted by American Psychological Association journals ( see details ) and journals published by Elsevier ( see details ). It appears likely that, in the very near future, most journals in psychology will be using an open science approach. We strongly advise readers to take a moment to inspect the  TOP Guidelines Summary Table . 

A key aspect of open science and the TOP guidelines is the sharing of data associated with published research (with respect to medical research, see point #35 in the  World Medical Association Declaration of Helsinki ). This practice is viewed widely as highly important. Indeed, open science is recommended by  all G7 science ministers . All Tri-Agency grants must include a data-management plan that includes plans for sharing: “ research data resulting from agency funding should normally be preserved in a publicly accessible, secure and curated repository or other platform for discovery and reuse by others.”  Moreover, a 2017 editorial published in the  New England Journal of Medicine announced that the  International Committee of Medical Journal Editors believes there is  “an ethical obligation to responsibly share data.”  As of this writing,  60% of highly ranked psychology journals require or encourage data sharing .

The increasing importance of demonstrating that findings are replicable is reflected in calls to make replication a requirement for the promotion of faculty (see details in Nature), and experts in open science are now refereeing applications for tenure and promotion (see details at the Center for Open Science and this article). Most dramatically, in one instance, a paper resulting from a dissertation was retracted due to misleading findings attributable to Questionable Research Practices. Subsequent to the retraction, the Ohio State University’s Board of Trustees unanimously revoked the PhD of the graduate student who wrote the dissertation (see details). Thus, the academic environment is changing, and it is important to work toward using new best practices in lieu of older practices – many of which are synonymous with Questionable Research Practices. Doing so should help you avoid later career regrets and subsequent public mea culpas. One way to achieve your research objectives in this new academic environment is to incorporate replications into your research. Replications are becoming more common, and there are even websites dedicated to helping students conduct replications (e.g., the Psychological Science Accelerator) and indexing the success of replications (e.g., Curate Science). You might even consider conducting a replication for your thesis (subject to committee approval).

As early-career researchers, it is important to be aware of the changing academic environment. Senior principal investigators may be  reluctant to engage in open science  (see this student perspective in a  blog post  and  podcast ) and research on resistance to data sharing indicates that one of the barriers to sharing data is that researchers do not feel that they have knowledge of  how to share data online . This document is an educational aid and resource to provide students with introductory knowledge of how to participate in open science and online data sharing to start their education on these subjects. 

Guidelines and Explanations

In light of the changes in psychology, faculty members who teach statistics/methods have reviewed the literature and generated this guide for graduate students. The guide is intended to enhance the quality of student theses by facilitating their engagement in open and transparent research practices and by helping them avoid Questionable Research Practices, many of which are now deemed unethical and covered in the ethics section of textbooks.

This document is an informational tool.

How to Start

To follow best practices, some first steps need to be taken. Here is a list of things to do:

  • Get an Open Science account. Registration at  osf.io  is easy!
  • If conducting confirmatory hypothesis testing for your thesis, pre-register your hypotheses (see Section 1 – Hypothesizing). The Open Science Framework website has helpful tutorials and guides to get you going.
  • Also, pre-register your data analysis plan. Pre-registration typically includes how and when you will stop collecting data, how you will deal with violations of statistical assumptions and points of influence (“outliers”), the specific measures you will use, and the analyses you will use to test each hypothesis, possibly including the analysis script. Again, there is a lot of help available for this. 

Exploratory and Confirmatory Research Are Both of Value, But Do Not Confuse the Two

We note that this document largely concerns confirmatory research (i.e., testing hypotheses). We by no means intend to devalue exploratory research. Indeed, it is one of the primary ways that hypotheses are generated for (possible) confirmation. Instead, we emphasize that it is important to clearly indicate which parts of your research are exploratory and which are confirmatory. Be clear in your writing and in your preregistration plan. You should explicitly indicate which of your analyses are exploratory and which are confirmatory. Please note also that if you are engaged in exploratory research, then Null Hypothesis Significance Testing (NHST) should probably be avoided (see rationale in Gigerenzer (2004) and Wagenmakers et al. (2012)).

This document is structured around the stages of thesis work:  hypothesizing, design, data collection, analyses, and reporting – consistent with the headings used by Wicherts et al. (2016). We also list the Questionable Research Practices associated with each stage and provide suggestions for avoiding them. We strongly advise going through all of these sections during thesis/dissertation proposal meetings because a priori decisions need to be made prior to data collection (including analysis decisions). 

To help to ensure that the student has informed the committee about key decisions at each stage, there are check boxes at the end of each section.

How to Use This Document in a Proposal Meeting

  • Print off a copy of this document and take it to the proposal meeting.
  • During the meeting, use the document to seek assistance from faculty to address potential problems.
  • Revisit responses to issues raised by this document (especially the Analysis and Reporting Stages) when you are seeking approval to proceed to defense.

Consultation and Help Line

Note that the Center for Open Science now has a help line (for individual researchers and labs) you can call for help with open science issues. They also have training workshops. Please see their  website  for details.



How to collect data for your thesis


After choosing a topic for your thesis , you’ll need to start gathering data. In this article, we focus on how to effectively collect theoretical and empirical data.

Empirical data : unique research that may be quantitative, qualitative, or mixed.

Theoretical data : secondary, scholarly sources like books and journal articles that provide theoretical context for your research.

Thesis : the culminating, multi-chapter project for a bachelor’s, master’s, or doctoral degree.

Qualitative data : info that cannot be measured, like observations and interviews .

Quantitative data : info that can be measured and written with numbers.

At this point in your academic life, you are already acquainted with the ways of finding potential references. Some obvious sources of theoretical material are:

  • edited volumes
  • conference proceedings
  • online databases like Google Scholar , ERIC , or Scopus

You can also take a look at the top list of academic search engines .

Looking at other theses on your topic can help you see what approaches have been taken and what aspects other writers have focused on. Pay close attention to the list of references and follow the bread-crumbs back to the original theories and specialized authors.

Another method for gathering theoretical data is to read through content-sharing platforms. Many people share their papers and writings on these sites. You can hunt for sources, get inspiration for your own work, or discover new angles on your topic.

Some popular content-sharing sites are:

  • Medium
  • Issuu
  • Slideshare

With these sites, you have to check the credibility of the sources. You can usually rely on the content, but we recommend double-checking just to be sure. Take a look at our guide on credible sources.

The more you know, the better. The guide, " How to undertake a literature search and review for dissertations and final year projects ," will give you all the tools needed for finding literature .

In order to successfully collect empirical data, you first have to choose what type of data you want as an outcome. There are essentially two options: qualitative or quantitative data. Many people mistake one term for the other, so it’s important to understand the differences between qualitative and quantitative research.

Boiled down, qualitative data means words and quantitative means numbers. Both types are considered primary sources. Whichever type best suits your research will define the methodology you carry out, so choose wisely.

In the end, keeping in mind what type of outcome you intend and how much time you have available will lead you to the best type of empirical data for your research. For a detailed description of each methodology type mentioned above, read more about collecting data.

Once you gather enough theoretical and empirical data, you will need to start writing. But before the actual writing part, you have to structure your thesis to avoid getting lost in the sea of information. Take a look at our guide on how to structure your thesis for some tips and tricks.

The key to knowing what type of data you should collect for your thesis is knowing in advance the type of outcome you intend to have and the amount of time you have available.

Some obvious sources of theoretical material are journals, libraries and online databases like Google Scholar , ERIC or Scopus , or take a look at the top list of academic search engines . You can also search for theses on your topic or read content sharing platforms, like Medium , Issuu , or Slideshare .

To gather empirical data, you have to choose first what type of data you want. There are two options, qualitative or quantitative data. You can gather data through observations, interviews, focus groups, or with surveys, tests, and existing databases.

Qualitative data means words, information that cannot be measured. It may involve multimedia material or non-textual data. This type of data claims to be detailed, nuanced and contextual.

Quantitative data means numbers, information that can be measured and written with numbers. This type of data claims to be credible, scientific and exact.


Indian J Anaesth. 2016 Sep; 60(9)

Basic statistical tools in research and data analysis

Zulfiqar Ali

Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India

S Bala Bhaskar

Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India

Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

INTRODUCTION

Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]

Variable is a characteristic that varies from one individual member of a population to another.[3] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[3] [Figure 1].

[Figure 1: Classification of variables]

Quantitative variables

Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.

A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].

Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), it is called dichotomous (or binary) data. The various causes of re-intubation in an intensive care unit due to upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment are examples of categorical variables.

Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.

Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.

Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. For example, the system of centimetres is an example of a ratio scale. There is a true zero point and the value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.

STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS

Descriptive statistics[4] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[4] use a random sample of data taken from a population to describe and make inferences about the whole population. They are valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1.

[Table 1: Example of descriptive and inferential statistics]

Descriptive statistics

The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.

Measures of central tendency

The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. Mean may be influenced profoundly by the extreme variables. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. The extreme values are called outliers. The formula for the mean is

$$\bar{x} = \frac{\sum x}{n}$$

where x = each observation and n = number of observations. Median[6] is defined as the middle of a distribution in ranked data (with half of the variables in the sample above and half below the median value), while mode is the most frequently occurring variable in a distribution. Range defines the spread, or variability, of a sample.[7] It is described by the minimum and maximum values of the variables. If we rank the data and, after ranking, group the observations into percentiles, we can get better information on the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe 25%, 50%, 75% or any other percentile amount. The median is the 50th percentile. The interquartile range will be the observations in the middle 50% of the observations about the median (25th–75th percentile). Variance[7] is a measure of how spread out the distribution is. It gives an indication of how closely an individual observation clusters about the mean value. The variance of a population is defined by the following formula:

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

where $\sigma^2$ is the population variance, $\bar{X}$ is the population mean, $X_i$ is the $i$th element from the population and $N$ is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the $i$th element from the sample and $n$ is the number of elements in the sample. The formula for the variance of a population has the value $N$ as the denominator. The expression $n - 1$ is known as the degrees of freedom and is one less than the number of observations. Each observation is free to vary, except the last one, which must be a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of the variance is used. The square root of the variance is the standard deviation (SD).[8] The SD of a population is defined by the following formula:

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

where $\sigma$ is the population SD, $\bar{X}$ is the population mean, $X_i$ is the $i$th element from the population and $N$ is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

where $s$ is the sample SD, $\bar{x}$ is the sample mean, $x_i$ is the $i$th element from the sample and $n$ is the number of elements in the sample. An example of the calculation of variance and SD is illustrated in Table 2.

[Table 2: Example of mean, variance and standard deviation]
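As a brief illustration of the $N$ versus $n - 1$ denominators, the following Python snippet (using an arbitrary five-element sample) computes both versions via NumPy's ddof argument:

```python
import numpy as np

x = np.array([4.0, 8.0, 6.0, 5.0, 7.0])   # arbitrary sample, mean = 6

pop_var = x.var(ddof=0)    # divides by N (population formula): 2.0
samp_var = x.var(ddof=1)   # divides by n - 1 (sample formula): 2.5
samp_sd = x.std(ddof=1)    # square root of the sample variance: ~1.58
print(pop_var, samp_var, samp_sd)
```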

Normal distribution or Gaussian distribution

Most of the biological variables usually cluster around a central value, with symmetrical positive and negative deviations about this point.[1] The standard normal distribution curve is a symmetrical bell-shaped curve. In a normal distribution curve, about 68% of the scores are within 1 SD of the mean. Around 95% of the scores are within 2 SDs of the mean and 99% within 3 SDs of the mean [Figure 2].

[Figure 2: Normal distribution curve]
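These percentages can be verified numerically from the standard normal cumulative distribution function, for example with SciPy:

```python
from scipy.stats import norm

# Probability mass within k standard deviations of the mean
for k in (1, 2, 3):
    print(f"within {k} SD: {norm.cdf(k) - norm.cdf(-k):.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
```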

Skewed distribution

It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [Figure 3], the mass of the distribution is concentrated on the right, leading to a longer left tail. In a positively skewed distribution [Figure 3], the mass of the distribution is concentrated on the left, leading to a longer right tail.

[Figure 3: Curves showing negatively skewed and positively skewed distributions]

Inferential statistics

In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.

Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).

In inferential statistics, the term ‘null hypothesis’ ($H_0$, ‘H-naught’, ‘H-null’) denotes that there is no relationship (difference) between the population variables in question.[9]

The alternative hypothesis ($H_1$ or $H_a$) denotes that a statement between the variables is expected to be true.[9]

The P value (or the calculated probability) is the probability of the event occurring by chance if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [Table 3].

[Table 3: P values with interpretation]

If the P value is less than the arbitrarily chosen value (known as α, or the significance level), the null hypothesis ($H_0$) is rejected [Table 4]. However, if the null hypothesis ($H_0$) is incorrectly rejected, this is known as a Type I error.[11] Further details regarding the alpha error, beta error and sample size calculation, and the factors influencing them, are dealt with in another section of this issue by Das S et al.[12]

[Table 4: Illustration for null hypothesis]

PARAMETRIC AND NON-PARAMETRIC TESTS

Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]

Two most basic prerequisites for parametric statistical analysis are:

  • The assumption of normality which specifies that the means of the sample group are normally distributed
  • The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.

However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.

Parametric tests

The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t-test, analysis of variance (ANOVA) and repeated measures ANOVA.

Student's t-test

Student's t-test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:

  • To test if a sample mean (as an estimate of the population mean) differs significantly from a given population mean (the one-sample t-test). The formula is:

$$t = \frac{\bar{X} - \mu}{SE}$$

where $\bar{X}$ = sample mean, $\mu$ = population mean and SE = standard error of the mean.

  • To test if the population means estimated by two independent samples differ significantly (the unpaired t-test). The formula is:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$$

where $\bar{X}_1 - \bar{X}_2$ is the difference between the means of the two groups and SE denotes the standard error of the difference.

  • To test if the population means estimated by two dependent samples differ significantly (the paired t-test). A usual setting for the paired t-test is when measurements are made on the same subjects before and after a treatment.

The formula for the paired t-test is:

$$t = \frac{\bar{d}}{SE(\bar{d})}$$

where $\bar{d}$ is the mean difference and SE denotes the standard error of this difference.

The group variances can be compared using the F-test. The F-test is the ratio of variances (var1/var2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.
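As an illustration, all three t-test variants, and the F ratio referred to the F distribution, can be run in Python with SciPy; the data below are simulated purely for demonstration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group1 = rng.normal(100, 10, 30)       # hypothetical independent groups
group2 = rng.normal(105, 12, 30)
before = rng.normal(120, 10, 25)       # hypothetical paired measurements
after = before - rng.normal(5, 3, 25)  # same subjects after a treatment

t1, p1 = stats.ttest_1samp(group1, popmean=100)   # one-sample t-test
t2, p2 = stats.ttest_ind(group1, group2)          # unpaired t-test
t3, p3 = stats.ttest_rel(before, after)           # paired t-test

# F-test as a ratio of sample variances, referred to the F distribution
F = group1.var(ddof=1) / group2.var(ddof=1)
df1, df2 = len(group1) - 1, len(group2) - 1
p_F = 2 * min(stats.f.sf(F, df1, df2), stats.f.cdf(F, df1, df2))  # two-sided
```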

Analysis of variance

The Student's t-test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.

In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.

However, the between-group variability (or effect variance) is the result of our treatment. These two estimates of variance are compared using the F-test.

A simplified formula for the F statistic is:

$$F = \frac{MS_b}{MS_w}$$

where $MS_b$ is the mean squares between the groups and $MS_w$ is the mean squares within groups.
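In practice, the F statistic and its P value for a one-way ANOVA can be obtained directly, for example with SciPy's f_oneway (simulated data shown):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1, g2, g3 = (rng.normal(m, 2.0, 20) for m in (10.0, 11.0, 13.0))

F, p = stats.f_oneway(g1, g2, g3)   # F = MS_between / MS_within
print(f"F = {F:.2f}, p = {p:.4f}")
```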

Repeated measures analysis of variance

As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, repeated measures ANOVA is used when all variables of a sample are measured under different conditions or at different points in time.

As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.

Non-parametric tests

When the assumptions of normality are not met, and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[15] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.

As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .

[Table 5: Analogue of parametric and non-parametric tests]

Median test for one sample: The sign test and Wilcoxon's signed rank test

The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.

This test examines the hypothesis about the median θ0 of a population. It tests the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked as a + sign. If the observed value is smaller than the reference value, it is marked as a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.

If the null hypothesis is true, there will be an equal number of + signs and − signs.

The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.

Wilcoxon's signed rank test

There is a major limitation of sign test as we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.

Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.
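A sketch of both one-sample tests in Python follows (SciPy ≥ 1.7 is assumed for binomtest; the sample and reference median are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([62, 58, 71, 55, 66, 60, 73, 59, 64, 57])  # hypothetical sample
theta0 = 60                                              # reference median

# Sign test: drop ties, count + signs, and use a binomial test with p = 0.5
diffs = x - theta0
n_pos = int((diffs > 0).sum())
n = int((diffs != 0).sum())
print("sign test p =", stats.binomtest(n_pos, n, p=0.5).pvalue)

# Wilcoxon's signed rank test also uses the relative sizes of the differences
stat, p = stats.wilcoxon(diffs[diffs != 0])
print("Wilcoxon signed rank p =", p)
```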

Mann-Whitney test

It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.

The Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P(xi > yi). The null hypothesis states that P(xi > yi) = P(xi < yi) = 1/2, while the alternative hypothesis states that P(xi > yi) ≠ 1/2.

Kolmogorov-Smirnov test

The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.
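Both of these two-sample tests are available in SciPy; the following simulated example is illustrative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, 40)   # hypothetical sample from group X
y = rng.normal(0.5, 1.0, 45)   # hypothetical sample from group Y

u, p_u = stats.mannwhitneyu(x, y)   # do values in one group tend to be larger?
d, p_ks = stats.ks_2samp(x, y)      # are the samples from the same distribution?
print(f"Mann-Whitney p = {p_u:.3f}; KS distance = {d:.2f}, p = {p_ks:.3f}")
```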

Kruskal-Wallis test

The Kruskal–Wallis test is a non-parametric test to analyse the variance.[14] It analyses if there is any difference in the median values of three or more independent samples. The data values are ranked in increasing order, and the rank sums are calculated, followed by calculation of the test statistic.

Jonckheere test

In contrast to the Kruskal–Wallis test, the Jonckheere test has an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[14]

Friedman test

The Friedman test is a non-parametric test for testing the difference between several related samples. The Friedman test is an alternative to repeated measures ANOVA, used when the same parameter has been measured under different conditions on the same subjects.[13]
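For illustration, both rank-based tests can be run with SciPy; note that the Friedman test expects the same subjects measured under each condition (the data below are simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a, b, c = (rng.normal(m, 1.0, 15) for m in (0.0, 0.4, 1.0))

h, p_kw = stats.kruskal(a, b, c)               # three independent samples
chi2, p_fr = stats.friedmanchisquare(a, b, c)  # three related samples (same 15 subjects)
print(f"Kruskal-Wallis p = {p_kw:.3f}; Friedman p = {p_fr:.3f}")
```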

Tests to analyse the categorical data

The Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares the frequencies and tests whether the observed data differ significantly from the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated as the sum of the squared difference between observed (O) and expected (E) data (or the deviation, d) divided by the expected data, by the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine if there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired-dependent samples. It is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel-Haenszel Chi-square test is a multivariate test as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, then logistic regression is used.
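As a sketch, the Chi-square and Fisher's exact tests for a 2 × 2 table can be computed with SciPy (the counts below are invented for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 contingency table: rows = treatment, columns = outcome
table = np.array([[18, 7],
                  [11, 14]])

chi2, p, dof, expected = stats.chi2_contingency(table)   # Chi-square test
odds_ratio, p_exact = stats.fisher_exact(table)          # Fisher's exact test
print(f"chi-square p = {p:.3f}; Fisher's exact p = {p_exact:.3f}")
```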

SOFTWARE AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

Numerous statistical software systems are available currently. The commonly used systems are Statistical Package for the Social Sciences (SPSS – manufactured by IBM Corporation), Statistical Analysis System (SAS – developed by SAS Institute, North Carolina, United States of America), R (developed by Ross Ihaka and Robert Gentleman of the R Core Team), Minitab (developed by Minitab Inc.), Stata (developed by StataCorp) and MS Excel (developed by Microsoft).

There are a number of web resources which are related to statistical power analyses. A few are:

  • StatPages.net – provides links to a number of online power calculators
  • G*Power – provides a downloadable power analysis program that runs under DOS
  • Power analysis for ANOVA designs – an interactive site that calculates power or the sample size needed to attain a given power for one effect in a factorial ANOVA design
  • SPSS makes a program called SamplePower. It gives an output of a complete report on the computer screen which can be cut and pasted into another document.

It is important that a researcher knows the concepts of the basic statistical methods used for conduct of a research study. This will help to conduct an appropriately well-designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, an adequate knowledge of statistics and the appropriate use of statistical tests are important. An appropriate knowledge about the basic statistical methods will go a long way in improving the research designs and producing quality medical research which can be utilised for formulating the evidence-based guidelines.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.

Digital Commons @ University of South Florida


Mathematics and Statistics Theses and Dissertations

Theses/Dissertations from 2023

Classification of Finite Topological Quandles and Shelves via Posets , Hitakshi Lahrani

Applied Analysis for Learning Architectures , Himanshu Singh

Rational Functions of Degree Five That Permute the Projective Line Over a Finite Field , Christopher Sze

Theses/Dissertations from 2022

New Developments in Statistical Optimal Designs for Physical and Computer Experiments , Damola M. Akinlana

Advances and Applications of Optimal Polynomial Approximants , Raymond Centner

Data-Driven Analytical Predictive Modeling for Pancreatic Cancer, Financial & Social Systems , Aditya Chakraborty

On Simultaneous Similarity of d-tuples of Commuting Square Matrices , Corey Connelly

Symbolic Computation of Lump Solutions to a Combined (2+1)-dimensional Nonlinear Evolution Equation , Jingwei He

Boundary behavior of analytic functions and Approximation Theory , Spyros Pasias

Stability Analysis of Delay-Driven Coupled Cantilevers Using the Lambert W-Function , Daniel Siebel-Cortopassi

A Functional Optimization Approach to Stochastic Process Sampling , Ryan Matthew Thurman

Theses/Dissertations from 2021

Riemann-Hilbert Problems for Nonlocal Reverse-Time Nonlinear Second-order and Fourth-order AKNS Systems of Multiple Components and Exact Soliton Solutions , Alle Adjiri

Zeros of Harmonic Polynomials and Related Applications , Azizah Alrajhi

Combination of Time Series Analysis and Sentiment Analysis for Stock Market Forecasting , Hsiao-Chuan Chou

Uncertainty Quantification in Deep and Statistical Learning with applications in Bio-Medical Image Analysis , K. Ruwani M. Fernando

Data-Driven Analytical Modeling of Multiple Myeloma Cancer, U.S. Crop Production and Monitoring Process , Lohuwa Mamudu

Long-time Asymptotics for mKdV Type Reduced Equations of the AKNS Hierarchy in Weighted L² Sobolev Spaces , Fudong Wang

Online and Adjusted Human Activities Recognition with Statistical Learning , Yanjia Zhang

Theses/Dissertations from 2020

Bayesian Reliability Analysis of The Power Law Process and Statistical Modeling of Computer and Network Vulnerabilities with Cybersecurity Application , Freeh N. Alenezi

Discrete Models and Algorithms for Analyzing DNA Rearrangements , Jasper Braun

Bayesian Reliability Analysis for Optical Media Using Accelerated Degradation Test Data , Kun Bu

On the p(x)-Laplace equation in Carnot groups , Robert D. Freeman

Clustering methods for gene expression data of Oxytricha trifallax , Kyle Houfek

Gradient Boosting for Survival Analysis with Applications in Oncology , Nam Phuong Nguyen

Global and Stochastic Dynamics of Diffusive Hindmarsh-Rose Equations in Neurodynamics , Chi Phan

Restricted Isometric Projections for Differentiable Manifolds and Applications , Vasile Pop

On Some Problems on Polynomial Interpolation in Several Variables , Brian Jon Tuesink

Numerical Study of Gap Distributions in Determinantal Point Process on Low Dimensional Spheres: L-Ensemble of O(n) Model Type for n = 2 and n = 3 , Xiankui Yang

Non-Associative Algebraic Structures in Knot Theory , Emanuele Zappala

Theses/Dissertations from 2019

Field Quantization for Radiative Decay of Plasmons in Finite and Infinite Geometries , Maryam Bagherian

Probabilistic Modeling of Democracy, Corruption, Hemophilia A and Prediabetes Data , A. K. M. Raquibul Bashar

Generalized Derivations of Ternary Lie Algebras and n-BiHom-Lie Algebras , Amine Ben Abdeljelil

Fractional Random Weighted Bootstrapping for Classification on Imbalanced Data with Ensemble Decision Tree Methods , Sean Charles Carter

Hierarchical Self-Assembly and Substitution Rules , Daniel Alejandro Cruz

Statistical Learning of Biomedical Non-Stationary Signals and Quality of Life Modeling , Mahdi Goudarzi

Probabilistic and Statistical Prediction Models for Alzheimer’s Disease and Statistical Analysis of Global Warming , Maryam Ibrahim Habadi

Essays on Time Series and Machine Learning Techniques for Risk Management , Michael Kotarinos

The Systems of Post and Post Algebras: A Demonstration of an Obvious Fact , Daviel Leyva

Reconstruction of Radar Images by Using Spherical Mean and Regular Radon Transforms , Ozan Pirbudak

Analyses of Unorthodox Overlapping Gene Segments in Oxytricha Trifallax , Shannon Stich

An Optimal Medium-Strength Regularity Algorithm for 3-uniform Hypergraphs , John Theado

Power Graphs of Quasigroups , DayVon L. Walker

Theses/Dissertations from 2018

Groups Generated by Automata Arising from Transformations of the Boundaries of Rooted Trees , Elsayed Ahmed

Non-equilibrium Phase Transitions in Interacting Diffusions , Wael Al-Sawai

A Hybrid Dynamic Modeling of Time-to-event Processes and Applications , Emmanuel A. Appiah

Lump Solutions and Riemann-Hilbert Approach to Soliton Equations , Sumayah A. Batwa

Developing a Model to Predict Prevalence of Compulsive Behavior in Individuals with OCD , Lindsay D. Fields

Generalizations of Quandles and their cohomologies , Matthew J. Green

Hamiltonian structures and Riemann-Hilbert problems of integrable systems , Xiang Gu

Optimal Latin Hypercube Designs for Computer Experiments Based on Multiple Objectives , Ruizhe Hou

Human Activity Recognition Based on Transfer Learning , Jinyong Pang

Signal Detection of Adverse Drug Reaction using the Adverse Event Reporting System: Literature Review and Novel Methods , Minh H. Pham

Statistical Analysis and Modeling of Cyber Security and Health Sciences , Nawa Raj Pokhrel

Machine Learning Methods for Network Intrusion Detection and Intrusion Prevention Systems , Zheni Svetoslavova Stefanova

Orthogonal Polynomials With Respect to the Measure Supported Over the Whole Complex Plane , Meng Yang

Theses/Dissertations from 2017

Modeling in Finance and Insurance With Lévy-Itô Driven Dynamic Processes under Semi Markov-type Switching Regimes and Time Domains , Patrick Armand Assonken Tonfack

Prevalence of Typical Images in High School Geometry Textbooks , Megan N. Cannon

On Extending Hansel's Theorem to Hypergraphs , Gregory Sutton Churchill

Contributions to Quandle Theory: A Study of f-Quandles, Extensions, and Cohomology , Indu Rasika U. Churchill

Linear Extremal Problems in the Hardy Space H^p for 0 < p < 1 , Robert Christopher Connelly

Statistical Analysis and Modeling of Ovarian and Breast Cancer , Muditha V. Devamitta Perera

Statistical Analysis and Modeling of Stomach Cancer Data , Chao Gao

Structural Analysis of Poloidal and Toroidal Plasmons and Fields of Multilayer Nanorings , Kumar Vijay Garapati

Dynamics of Multicultural Social Networks , Kristina B. Hilton

Cybersecurity: Stochastic Analysis and Modelling of Vulnerabilities to Determine the Network Security and Attackers Behavior , Pubudu Kalpani Kaluarachchi

Generalized D-Kaup-Newell integrable systems and their integrable couplings and Darboux transformations , Morgan Ashley McAnally

Patterns in Words Related to DNA Rearrangements , Lukas Nabergall

Time Series Online Empirical Bayesian Kernel Density Segmentation: Applications in Real Time Activity Recognition Using Smartphone Accelerometer , Shuang Na

Schreier Graphs of Thompson's Group T , Allen Pennington

Cybersecurity: Probabilistic Behavior of Vulnerability and Life Cycle , Sasith Maduranga Rajasooriya

Bayesian Artificial Neural Networks in Health and Cybersecurity , Hansapani Sarasepa Rodrigo

Real-time Classification of Biomedical Signals, Parkinson’s Analytical Model , Abolfazl Saghafi

Lump, complexiton and algebro-geometric solutions to soliton equations , Yuan Zhou

Theses/Dissertations from 2016

A Statistical Analysis of Hurricanes in the Atlantic Basin and Sinkholes in Florida , Joy Marie D'andrea

Statistical Analysis of a Risk Factor in Finance and Environmental Models for Belize , Sherlene Enriquez-Savery

Putnam's Inequality and Analytic Content in the Bergman Space , Matthew Fleeman

On the Number of Colors in Quandle Knot Colorings , Jeremy William Kerr

Statistical Modeling of Carbon Dioxide and Cluster Analysis of Time Dependent Information: Lag Target Time Series Clustering, Multi-Factor Time Series Clustering, and Multi-Level Time Series Clustering , Doo Young Kim

Some Results Concerning Permutation Polynomials over Finite Fields , Stephen Lappano

Hamiltonian Formulations and Symmetry Constraints of Soliton Hierarchies of (1+1)-Dimensional Nonlinear Evolution Equations , Solomon Manukure

Modeling and Survival Analysis of Breast Cancer: A Statistical, Artificial Neural Network, and Decision Tree Approach , Venkateswara Rao Mudunuru

Generalized Phase Retrieval: Isometries in Vector Spaces , Josiah Park

Leonard Systems and their Friends , Jonathan Spiewak

Resonant Solutions to (3+1)-dimensional Bilinear Differential Equations , Yue Sun

Statistical Analysis and Modeling Health Data: A Longitudinal Study , Bhikhari Prasad Tharu

Global Attractors and Random Attractors of Reaction-Diffusion Systems , Junyi Tu

Time Dependent Kernel Density Estimation: A New Parameter Estimation Algorithm, Applications in Time Series Classification and Clustering , Xing Wang

On Spectral Properties of Single Layer Potentials , Seyed Zoalroshd

Theses/Dissertations from 2015

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach , Wei Chen

Active Tile Self-assembly and Simulations of Computational Systems , Daria Karpenko

Nearest Neighbor Foreign Exchange Rate Forecasting with Mahalanobis Distance , Vindya Kumari Pathirana

Statistical Learning with Artificial Neural Network Applied to Health and Environmental Data , Taysseer Sharaf

Radial Versus Orthogonal and Minimal Projections onto Hyperplanes in l_4^3 , Richard Alan Warner

Ensemble Learning Method on Machine Maintenance Data , Xiaochuang Zhao

Theses/Dissertations from 2014

Properties of Graphs Used to Model DNA Recombination , Ryan Arredondo

Recursive Methods in Number Theory, Combinatorial Graph Theory, and Probability , Jonathan Burns

On the Classification of Groups Generated by Automata with 4 States over a 2-Letter Alphabet , Louis Caponi

Statistical Analysis, Modeling, and Algorithms for Pharmaceutical and Cancer Systems , Bong-Jin Choi

Topological Data Analysis of Properties of Four-Regular Rigid Vertex Graphs , Grant Mcneil Conine

Trend Analysis and Modeling of Health and Environmental Data: Joinpoint and Functional Approach , Ram C. Kafle

The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organisations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organise and summarise the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalise your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarise your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results
  • Frequently asked questions about statistics

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test. In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalise your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.

In theory, for highly generalisable findings, you should use a probability sampling method. Random selection reduces sampling bias and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be biased, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalising your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalise your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialised, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalised in your discussion section .

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or by using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is usually needed.

To use these calculators, you have to understand and input these key components (a scripted sketch follows the list):

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power : the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size : a standardised indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
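
As a sketch of how these four components fit together, the following solves for the sample size of an independent-samples t test with statsmodels; the expected effect size of 0.5 is an assumed input, not a recommendation.

```python
# Hedged sketch: solve for the per-group sample size given alpha, power
# and an assumed effect size (leaving nobs1 unspecified solves for it).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,   # expected effect size (Cohen's d), assumed
    alpha=0.05,        # significance level
    power=0.80,        # desired statistical power
)
print(f"Required sample size per group: {n_per_group:.1f}")
```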

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarise them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organising data from each variable in frequency distribution tables .
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualising the relationship between two variables using a scatter plot .

By visualising your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

(Figure: mean, median, mode, and standard deviation in a normal distribution.)

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode: the most popular response or value in the data set.
  • Median: the value in the exact middle of the data set when ordered from low to high.
  • Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
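
A quick sketch of all three measures on a small, made-up data set, using Python's standard library:

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 7, 9]  # illustrative values only

print(statistics.mode(data))    # 5     - most frequent value
print(statistics.median(data))  # 5     - middle value when sorted
print(statistics.mean(data))    # ~4.78 - sum divided by count
```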

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range: the highest value minus the lowest value of the data set.
  • Interquartile range: the range of the middle half of the data set.
  • Standard deviation: the average distance between each value in your data set and the mean.
  • Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
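
The same data can illustrate the four variability measures; this short sketch uses NumPy, with ddof=1 for the sample (rather than population) versions.

```python
import numpy as np

data = np.array([2, 3, 3, 4, 5, 5, 5, 7, 9])  # illustrative values only

data_range = data.max() - data.min()        # range
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                               # interquartile range
sd = data.std(ddof=1)                       # sample standard deviation
variance = data.var(ddof=1)                 # sample variance (sd squared)
print(data_range, iqr, sd, variance)
```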

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

Example: Descriptive statistics (experimental study)
From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

A number that describes a sample is called a statistic , while a number describing a population is called a parameter . Using inferential statistics , you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate : a value that represents your best guess of the exact parameter.
  • An interval estimate : a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
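
As a minimal sketch, this is what that calculation looks like for a 95% confidence interval around a sample mean; the data are invented for illustration.

```python
import numpy as np
from scipy import stats

sample = np.array([101, 96, 104, 99, 110, 93, 107, 98, 102, 95])

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean
z = stats.norm.ppf(0.975)                       # z score for 95% confidence

lower, upper = mean - z * se, mean + z * se
print(f"Point estimate: {mean:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```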

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test .
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test .
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test .
  • If you expect a difference between groups in a specific direction, use a one-tailed test .
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test .

The only parametric correlation test is Pearson’s r . The correlation coefficient ( r ) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

Example: Hypothesis testing (experimental study)
You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you (see the code sketch after this list):

  • a t value (test statistic) of 3.00
  • a p value of 0.0028
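
A runnable sketch of this kind of test is below. The pretest/posttest scores are invented, so they will not reproduce the exact t and p values quoted above.

```python
import numpy as np
from scipy import stats

pretest  = np.array([62, 70, 68, 75, 58, 66, 73, 61, 69, 64])  # illustrative
posttest = np.array([66, 73, 72, 76, 63, 70, 74, 66, 72, 69])  # illustrative

# alternative='greater' makes this a one-tailed test of posttest > pretest.
t_stat, p_value = stats.ttest_rel(posttest, pretest, alternative='greater')
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```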

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size.

Example: Correlation test (correlational study)
Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you (see the code sketch after this list):

  • a t value of 3.08
  • a p value of 0.001
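
The sketch below shows both routes: SciPy's built-in significance test for Pearson's r, and the manual t formula it corresponds to. The income and GPA values are invented for illustration.

```python
import numpy as np
from scipy import stats

income = np.array([40, 55, 32, 71, 48, 60, 38, 66, 52, 45])   # in $1000s
gpa    = np.array([3.0, 3.4, 2.8, 3.7, 3.1, 3.5, 2.9, 3.6, 3.2, 3.0])

# One-tailed test of a positive correlation (the alternative argument
# requires SciPy >= 1.9).
r, p = stats.pearsonr(income, gpa, alternative='greater')

# The equivalent t statistic computed by hand: t = r * sqrt(n-2) / sqrt(1-r^2)
n = len(income)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
print(f"r = {r:.3f}, t = {t:.2f}, p = {p:.4f}")
```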

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
You compare your p value of 0.0028 to your significance threshold of 0.05. Since the p value falls below the threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper .

Example: Effect size (experimental study)
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
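
For reference, Cohen's d for two groups is usually computed as the difference in means divided by the pooled standard deviation. Here is a minimal sketch with illustrative data.

```python
import numpy as np

group1 = np.array([66, 73, 72, 76, 63, 70, 74, 66, 72, 69])  # illustrative
group2 = np.array([62, 70, 68, 75, 58, 66, 73, 61, 69, 64])  # illustrative

n1, n2 = len(group1), len(group2)
pooled_sd = np.sqrt(((n1 - 1) * group1.var(ddof=1) +
                     (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))

d = (group1.mean() - group2.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")
```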

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimise the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasises null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

The research methods you use depend on the type of data you need to answer your research question .

  • If you want to measure something or test a hypothesis , use quantitative methods . If you want to explore ideas, thoughts, and meanings, use qualitative methods .
  • If you want to analyse a large amount of readily available data, use secondary data. If you want data specific to your purposes with control over how they are generated, collect primary data.
  • If you want to establish cause-and-effect relationships between variables , use experimental methods. If you want to understand the characteristics of a research subject, use descriptive methods.

Statistical analysis is the main method for analysing quantitative research data. It uses probabilities and models to test predictions about a population from sample data.

Department of Statistics


Undergraduate

Students concentrating in Statistics acquire the conceptual, computational, and mathematical tools for quantifying uncertainty and making sense of complex data arising from many applications.

Statistics provides quantitative methods for analyzing data, making rational decisions under uncertainty, designing experiments, and modeling randomness and variability in the world. Statistics has a theoretical core surrounded by a vast number of domains of application in fields such as anthropology, astronomy, biology, business, chemistry, computer science, economics, education, engineering, environmental sciences, epidemiology, finance, government, history, law, linguistics, medicine, physics, psychology, sociology, and many others. A New York Times article pointed out the increasing demand for statisticians under the headline "For Today's Graduate, Just One Word: Statistics".


Undergraduate News


Harvard Statistics Congratulates 2024 Taliesin Prize and 7 Hoopes Winners 

With commencement and award season upon us, we are pleased to announce Harvard prizes that our concentrators have received this year, including our seven 2024 Hoopes Prize winners and one Taliesin Prize for Distinction in the Art of Learning.  

We celebrate the accomplishments of all our seniors, including 36 who took on the challenging feat of writing a thesis.  Supported by the estate of Thomas T. Hoopes (class of 1919), the ...


William Nickols and Kevin Luo Receive 2024 Department of Statistics Concurrent Masters Prize at Commencement

We are thrilled to announce that the recipients of the 2024 Department of Statistics Concurrent Masters Prize are William Nickols (a Joint Concentrator in Chemical & Physical Biology and Statistics) and Kevin Luo (a Joint Concentrator in Statistics and Mathematics).  During the Statistics Department Commencement Celebration on May 23rd, Will and Kevin received award certificates acknowledging their accomplishments.  Congratulations to Will and Kevin!


Skyler Wu Receives 2024 Department of Statistics Senior Concentrator Prize at Commencement

The Department awarded the 2024 Department of Statistics Senior Concentrator Prize to Skyler Wu at our Commencement Celebration on May 23rd.  Skyler, who is graduating with an A.B. in Statistics and Mathematics (Joint) and S.M. in Applied Mathematics, was selected based on the high quality of his coursework and thesis.  Congratulations on receiving this recognition for your hard work, Skyler!

The Department of Statistics Senior Concentrator Prize (originally named the Department of Statistics Undergraduate Prize) was founded in...


Rachel Li Receives 2023 Department of Statistics Senior Concentrator Prize

This year, concentrator alum Rachel Li received the May 2023 Department of Statistics Senior Concentrator Prize for her superb coursework in the concentration and for an outstanding thesis.  Originally named the Department of Statistics Undergraduate Prize, the award was founded in 2020 and is given annually to the graduating senior concentrator who has the best overall performance and has contributed significantly to the department.  In an interview (edited and excerpted below), Rachel shared her most...


Jason Zhou Receives 2023 Department of Statistics Concurrent Masters Prize

Statistics concentrator and master’s alum Jason Zhou was awarded the inaugural Department of Statistics Concurrent Masters Prize in May 2023 (along with alum Virginia Ma).  This prize is given annually to the graduating concurrent master’s student who has the best overall performance (as indicated by coursework results), has demonstrated achievements in statistics outside of coursework, and has contributed significantly to the department.  To celebrate his award and learn about his thesis...


Dinan Hamdi Elsyad Receives Best Video Presentation at eUSR 2023

The Statistics Department congratulates undergraduate Dinan Hamdi Elsyad (a junior statistics concentrator) on receiving the best video presentation at the  2023 Electronic Undergraduate Statistics Research Conference  for her summer research work with Niels Korsgaard (Harvard), Kate Hu (Harvard), Grayson White (Michigan State), Josh Yamamoto, and George Gaines (research scientist...


Lucy Tu Receives AAAS Mass Media Fellowship from ASA

The Department of Statistics would like to congratulate Harvard undergraduate Lucy Tu, who is studying sociology, neuroscience, and the history of science, on her American Association for the Advancement of Science (AAAS) Mass Media Fellowship from the American Statistical Association.  The purpose of the AAAS Mass Media Fellowship is to support scholars on projects related to increasing public understanding of science and technology.  

Concentrators James Celi Kitch, Kavya Mehul Shah, and Jason Zhou Receive Awards

The Department of Statistics would like to congratulate our two concentrator recipients of the  Sophia Freund Prize , Kavya Mehul Shah ('23 AB) and Jason Zhou ('23 AB/AM), and our two concentrators who have been awarded the  Thomas T....

Rachel Li Receives 2023 Department of Statistics Senior Concentrator Prize at Commencement

The Department has awarded the 2023 Department of Statistics Senior Concentrator Prize to Rachel Li for her superb coursework in the concentration and for an outstanding thesis. Congratulations on your achievement, Rachel!

The Department of Statistics Senior Concentrator Prize (originally named the Department of Statistics Undergraduate Prize) was founded in 2020.  It is given annually to the graduating senior concentrator who has the best overall performance (as indicated by coursework results and thesis) and who has contributed...

Virginia Linqian Ma and Jason Zhou Receive 2023 Department of Statistics Concurrent Masters Prize

We are pleased to announce that Virginia (Ginnie) Linqian Ma, '23 AB in Mathematics and AM in Statistics, and Jason Zhou, '23 AB in Mathematics and AB/AM in Statistics, have received the inaugural 2023 Department of Statistics Concurrent Masters Prize.  This prize will be awarded annually to the graduating student having completed the Concurrent Masters program in Statistics who has the best overall performance (as indicated by coursework results), who has demonstrated achievements in Statistics outside of coursework, and who has contributed significantly to the...


Dr. Kelly McConville is Awarded 2023 Alpha-Iota Prize for Excellence in Teaching


Harvard Statistics Hosts Inaugural Florence Nightingale Outreach Day

On Saturday, October 22nd, the halls of the Harvard Science Center were buzzing with middle and high school students at Harvard’s inaugural Florence Nightingale Day (FND).  Introduced by the American Statistical Association and the Caucus for Women in Statistics (CWS) in 2018, FND is part of an...


Asteria Chilambo and Jing Shang Receive Best Video Presentation at eUSR Conference


Department of Statistics Launches 2022 Newsletter


Profile of Alumna Michele Zemplenyi, Fellow at the U.S. Department of Energy

After graduating from Harvard with an A.B. in Statistics in 2013, alumna Michele Zemplenyi had a plethora of career options to consider. Dr. Zemplenyi was not only a statistics concentrator but also completed premed requirements and a thesis in conjunction with the Stem Cell and Regenerative Biology Department. Choice is great, but it can also be intimidating (as our senior concentrators probably know!).

Fortunately, Michele shared with us how she carved out a career path, starting with statistics in undergrad to a PhD in biostats to a current Fellow position in the Federal...

Professors Janson and Kakade Introduce Undergraduate Reinforcement Learning Course

It’s the start of the fall semester and time to map out a course schedule, but what will work best for you? You love math, so maybe you should stock up on quantitative courses; on the other hand, it might be a good idea to get some course requirements out of the way. You’re a night owl, but maybe you should take those morning classes so that you can play badminton later in the day. While the new course...


Professor Kelly McConville Reimagines Harvard’s Stat 100 Course

“The exciting challenge of revamping Stat 100, which is our most general audience intro stats class, was an opportunity that I couldn't say no to,” reflects Professor Kelly McConville, a Senior Lecturer in the Harvard Statistics Department, on her decision to join the Department.  A survey statistician with interests in machine learning and statistics and data science education, Professor McConville arrived in January 2022 from Reed College, where she was an Associate Professor of...


Professor Kelly McConville's Undergraduate Research Group Conducts US Forest Service Projects

Professor Kelly McConville and her summer undergraduate research group are collaborating on survey statistics projects for the US Forest Service.  More specifically, they are working on several different projects, including an exploration of zero-inflation models for estimating forest parameters over small regions, the creation of a watersheds estimation dashboard to inform new precision targets for forest inventory, small area estimates, and the construction of a tool to share carbon accounting estimates and their standard errors.  During their busy day of research, they...


Undergraduate Thesis Prize Profile: Yash Nair

“Every weekend I go down to the river and try to learn the first trick that everyone does.  When you see it, it doesn’t look impressive but it’s very scary.  I need all the knee pads and elbow pads before I am comfortable doing it.”

Graduating stats concentrator Yash Nair is not afraid to tackle new challenges, whether it’s landing that first skateboarding trick or officially becoming a stats concentrator only a few months ago (it helped that he took many relevant classes along the way!).  Yash capped off his undergraduate experience by receiving the...

Using Your Statistics Degree


Recent alumni are applying their knowledge in a very wide variety of companies and graduate programs, including tech companies such as Google and IBM, investment banks such as Goldman Sachs and Credit Suisse, hedge funds such as D.E. Shaw and AlphaSimplex, medical schools, and Statistics PhD programs.  


The Writing Center • University of North Carolina at Chapel Hill

There are lies, damned lies, and statistics. —Mark Twain

What this handout is about

The purpose of this handout is to help you use statistics to make your argument as effectively as possible.

Introduction

Numbers are power. Apparently freed of all the squishiness and ambiguity of words, numbers and statistics are powerful pieces of evidence that can effectively strengthen any argument. But statistics are not a panacea. As simple and straightforward as these little numbers promise to be, statistics, if not used carefully, can create more problems than they solve.

Many writers lack a firm grasp of the statistics they are using. The average reader does not know how to properly evaluate and interpret the statistics they read. The main reason behind the poor use of statistics is a lack of understanding about what statistics can and cannot do. Many people think that statistics can speak for themselves. But numbers are as ambiguous as words and need just as much explanation.

In many ways, this problem is quite similar to that experienced with direct quotes. Too often, quotes are expected to do all the work and are treated as part of the argument, rather than a piece of evidence requiring interpretation (see our handout on how to quote). But if you leave the interpretation up to the reader, who knows what sort of off-the-wall interpretations may result? The only way to avoid this danger is to supply the interpretation yourself.

But before we start writing statistics, let’s actually read a few.

Reading statistics

As stated before, numbers are powerful. This is one of the reasons why statistics can be such persuasive pieces of evidence. However, this same power can also make numbers and statistics intimidating. That is, we too often accept them as gospel, without ever questioning their veracity or appropriateness. While this may seem like a positive trait when you plug them into your paper and pray for your reader to submit to their power, remember that before we are writers of statistics, we are readers. And to be effective readers means asking the hard questions. Below you will find a useful set of hard questions to ask of the numbers you find.

1. Does your evidence come from reliable sources?

This is an important question not only with statistics, but with any evidence you use in your papers. As we will see in this handout, there are many ways statistics can be played with and misrepresented in order to produce a desired outcome. Therefore, you want to take your statistics from reliable sources (for more information on finding reliable sources, please see our handout on evaluating print sources ). This is not to say that reliable sources are infallible, but only that they are probably less likely to use deceptive practices. With a credible source, you may not need to worry as much about the questions that follow. Still, remember that reading statistics is a bit like being in the middle of a war: trust no one; suspect everyone.

2. What is the data’s background?

Data and statistics do not just fall from heaven fully formed. They are always the product of research. Therefore, to understand the statistics, you should also know where they come from. For example, if the statistics come from a survey or poll, some questions to ask include:

  • Who asked the questions in the survey/poll?
  • What, exactly, were the questions?
  • Who interpreted the data?
  • What issue prompted the survey/poll?
  • What (policy/procedure) potentially hinges on the results of the poll?
  • Who stands to gain from particular interpretations of the data?

All these questions help you orient yourself toward possible biases or weaknesses in the data you are reading. The goal of this exercise is not to find “pure, objective” data but to make any biases explicit, in order to more accurately interpret the evidence.

3. Are all data reported?

In most cases, the answer to this question is easy: no, they aren’t. Therefore, a better way to think about this issue is to ask whether all data have been presented in context. But it is much more complicated when you consider the bigger issue, which is whether the text or source presents enough evidence for you to draw your own conclusion. A reliable source should not exclude data that contradicts or weakens the information presented.

An example can be found on the evening news. If you think about ice storms, which make life so difficult in the winter, you will certainly remember the newscasters warning people to stay off the roads because they are so treacherous. To verify this point, they tell you that the Highway Patrol has already reported 25 accidents during the day. Their intention is to scare you into staying home with this number. While this number sounds high, some studies have found that the number of accidents actually goes down on days with severe weather. Why is that? One possible explanation is that with fewer people on the road, even with the dangerous conditions, the number of accidents will be less than on an “average” day. The critical lesson here is that even when the general interpretation is “accurate,” the data may not actually be evidence for the particular interpretation. This means you have no way to verify if the interpretation is in fact correct.

There is generally a comparison implied in the use of statistics. How can you make a valid comparison without having all the facts? Good question. You may have to look to another source or sources to find all the data you need.

4. Have the data been interpreted correctly?

If the author gives you their statistics, it is always wise to interpret them yourself. That is, while it is useful to read and understand the author’s interpretation, it is merely that—an interpretation. It is not the final word on the matter. Furthermore, sometimes authors (including you, so be careful) can use perfectly good statistics and come up with perfectly bad interpretations. Here are two common mistakes to watch out for:

  • Confusing correlation with causation. Just because two things vary together does not mean that one of them is causing the other. It could be nothing more than a coincidence, or both could be caused by a third factor. Such a relationship is called spurious. The classic example is a study that found that the more firefighters sent to put out a fire, the more damage the fire did. Yikes! I thought firefighters were supposed to make things better, not worse! But before we start shutting down fire stations, it might be useful to entertain alternative explanations. This seemingly contradictory finding can be easily explained by pointing to a third factor that causes both: the size of the fire. The lesson here? Correlation does not equal causation. So it is important not only to think about showing that two variables co-vary, but also about the causal mechanism.
  • Ignoring the margin of error. When survey results are reported, they frequently include a margin of error. You might see this written as “a margin of error of plus or minus 5 percentage points.” What does this mean? The simple story is that surveys are normally generated from samples of a larger population, and thus they are never exact. There is always a confidence interval within which the general population is expected to fall. Thus, if I say that the number of UNC students who find it difficult to use statistics in their writing is 60%, plus or minus 4%, that means, assuming the normal confidence interval of 95%, that with 95% certainty we can say that the actual number is between 56% and 64%.

Why does this matter? Because if after introducing this handout to the students of UNC, a new poll finds that only 56%, plus or minus 3%, are having difficulty with statistics, I could go to the Writing Center director and ask for a raise, since I have made a significant contribution to the writing skills of the students on campus. However, she would no doubt point out that a) this may be a spurious relationship (see above) and b) the actual change is not significant because it falls within the margin of error for the original results. The lesson here? Margins of error matter, so you cannot just compare simple percentages.
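
The overlap check in this example is simple arithmetic; here is a sketch of it in code, using the handout's own numbers.

```python
def interval(pct, margin):
    """Return the (low, high) range implied by a percentage and its margin."""
    return pct - margin, pct + margin

old_low, old_high = interval(60, 4)  # 60% +/- 4 -> 56% to 64%
new_low, new_high = interval(56, 3)  # 56% +/- 3 -> 53% to 59%

# The intervals overlap, so the apparent drop could be nothing more
# than sampling error.
print(f"Old: {old_low}-{old_high}%,  New: {new_low}-{new_high}%")
print("Overlap:", new_high >= old_low and old_high >= new_low)
```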

Finally, you should keep in mind that the source you are actually looking at may not be the original source of your data. That is, if you find an essay that quotes a number of statistics in support of its argument, often the author of the essay is using someone else’s data. Thus, you need to consider not only your source, but the author’s sources as well.
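
Before moving on, here is a minimal simulation (with made-up numbers) of the firefighter example from the first pitfall above. A lurking third variable, fire size, drives both the number of firefighters sent and the damage done, manufacturing a strong correlation between two quantities that do not cause each other:

    # Fire size drives BOTH firefighters sent and damage done (all numbers invented),
    # so the two correlate strongly even though neither causes the other.
    import numpy as np

    rng = np.random.default_rng(0)
    fire_size = rng.uniform(1, 10, size=5000)            # the lurking third factor
    firefighters = fire_size + rng.normal(0, 1, 5000)    # bigger fires -> more crews
    damage = 10 * fire_size + rng.normal(0, 10, 5000)    # bigger fires -> more damage

    print(np.corrcoef(firefighters, damage)[0, 1])       # strong positive correlation

    # Hold fire size roughly constant and the "relationship" largely evaporates:
    similar = (fire_size > 4) & (fire_size < 5)
    print(np.corrcoef(firefighters[similar], damage[similar])[0, 1])  # far weaker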

Writing statistics

As you write with statistics, remember your own experience as a reader of statistics. Don’t forget how frustrated you were when you came across unclear statistics and how thankful you were to read well-presented ones. It is a sign of respect to your reader to be as clear and straightforward as you can be with your numbers. Nobody likes to be played for a fool. Thus, even if you think that changing the numbers just a little bit will help your argument, do not give in to the temptation.

As you begin writing, keep the following in mind. First, your reader will want to know the answers to the same questions that we discussed above. Second, you want to present your statistics in a clear, unambiguous manner. Below you will find a list of some common pitfalls in the world of statistics, along with suggestions for avoiding them.

1. The mistake of the “average” writer

Nobody wants to be average. Moreover, nobody wants to just see the word “average” in a piece of writing. Why? Because nobody knows exactly what it means. There are not one, not two, but three different definitions of “average” in statistics, and when you use the word, your reader has only a 33.3% chance of guessing correctly which one you mean.

For the following definitions, please refer to this set of numbers: 5, 5, 5, 8, 12, 14, 21, 33, 38

  • Mean (arithmetic mean) This may be the most average definition of average (whatever that means). It is the familiar arithmetic average: the total of all the numbers divided by how many numbers there are. Thus the mean of the above set is 5+5+5+8+12+14+21+33+38, all divided by 9, which equals 15.666666666667 (Wow! That is a lot of numbers after the decimal. What do we do about that? Precision is a good thing, but too much of it is over the top; it does not necessarily make your argument any stronger. Consider the reasonable amount of precision based on your input and round accordingly. In this case, 15.7 should do the trick.)
  • Median Depending on whether you have an odd or even set of numbers, the median is either a) the number midway through an odd set of numbers or b) a value halfway between the two middle numbers in an even set. For the above set (an odd set of 9 numbers), the median is 12. (5, 5, 5, 8 < 12 < 14, 21, 33, 38)
  • Mode The mode is the value that occurs most frequently in a series. If, by some cruel twist of fate, two or more values occur with the same top frequency, the set is multimodal, and you should report all of the tied values. For our set, the mode is 5, since it occurs 3 times, whereas all other numbers occur only once.

As you can see, the numbers can vary considerably, as can their significance. Therefore, the writer should always inform the reader which average they are using. Otherwise, confusion will inevitably ensue.
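
For readers who compute these values with software, a minimal sketch using Python's standard statistics module reproduces all three for the set above:

    # Mean, median, and mode of the example set, via the standard library.
    import statistics

    data = [5, 5, 5, 8, 12, 14, 21, 33, 38]
    print(statistics.mean(data))    # 15.666... -> report as 15.7
    print(statistics.median(data))  # 12
    print(statistics.mode(data))    # 5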

2. Match your facts with your questions

Be sure that your statistics actually apply to the point or argument you are making. If we return to our discussion of averages, the question you are interested in answering determines which statistic you should use.

Perhaps an example will help illustrate this point. Your professor hands back the midterm. The grades are distributed as follows: [grade distribution chart not reproduced here; the relevant summary values appear below]

The professor felt that the test must have been too easy, because the average (median) grade was a 95.

When a colleague asked her how the midterm grades came out, she answered, knowing that her classes were gaining a reputation for being “too easy,” that the average (mean) grade was an 80.

When your parents ask you how you can justify doing so poorly on the midterm, you answer, “Don’t worry about my 63. It is not as bad as it sounds. The average (mode) grade was a 58.”

I will leave it up to you to decide whether these choices are appropriate. Selecting the appropriate statistic will help your argument immensely: not only will it actually support your point, it will also protect the legitimacy of your position. Think about how your parents will react when they learn from the professor that the average (median) grade was 95! The best way to maintain precision is to specify which of the three forms of “average” you are using.

3. Show the entire picture

Sometimes you may misrepresent your evidence by accident or misunderstanding. Other times, however, the misrepresentation may be slightly less innocent. This can be seen most readily in visual aids, so do not shape and “massage” the representation so that it “best supports” your argument. Such distortion is easy to produce because charts and graphs can be presented in numerous different ways: the range can be shortened (to cut out data points which do not fit, e.g., starting a time series too late or ending it too soon), or the scale can be manipulated so that small changes look big and vice versa. Likewise, do not fiddle with the proportions, either vertically or horizontally. The fact that USA Today seems to get away with these techniques does not make them OK for an academic argument.

Charts A, B, and C (not reproduced here) all use the same data points, but the stories they seem to be telling are quite different. Chart A shows a mild increase, followed by a slow decline. Chart B, on the other hand, reveals a steep jump, with a sharp drop-off immediately following. Chart C, meanwhile, seems to demonstrate that there was virtually no change over time. These variations are purely a product of changing the scale of the chart. One way to alleviate this problem is to supplement the chart with the actual numbers in your text, in the spirit of full disclosure.
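
Since the original charts are not reproduced here, a minimal matplotlib sketch (with invented numbers) shows how changing nothing but the y-axis range makes one series tell three different stories:

    # The same series plotted three times; only the y-axis limits differ.
    import matplotlib.pyplot as plt

    years = list(range(1985, 2001))
    values = [50, 52, 55, 57, 58, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50]

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    limits = [(40, 70), (49, 61), (0, 200)]  # mild / dramatic / flat
    for ax, (lo, hi) in zip(axes, limits):
        ax.plot(years, values)
        ax.set_ylim(lo, hi)                  # the only thing that changes
        ax.set_title(f"y-axis from {lo} to {hi}")
    plt.tight_layout()
    plt.show()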

Another point of concern can be seen in Charts D and E (also not reproduced here). Both use the same data as Charts A, B, and C for the years 1985-2000, but additional time points, drawn from two hypothetical data sets, have been added going back to 1965. Given the different trends leading up to 1985, notice how the significance of the recent period changes. In Chart D, the downward trend from 1990 to 2000 runs against a long-term upward trend, whereas in Chart E it is merely the continuation of a larger downward trend after a brief upward turn.

One of the difficulties with visual aids is that there is no hard and fast rule about how much to include and what to exclude. Judgment is always involved. In general, be sure to present your visual aids so that your readers can draw their own conclusions from the facts and verify your assertions. If what you have cut out could affect the reader’s interpretation of your data, then you might consider keeping it.

4. Give bases of all percentages

A percentage is always derived from a specific base, and it is meaningless until you know what that base is. So even if I tell you that after reading this handout you will be 23% more persuasive as a writer, that is not a very meaningful assertion, because you have no idea what it is based on: 23% more persuasive than what?

Let’s look at crime rates to see how this works. Suppose we have two cities, Springfield and Shelbyville. In Springfield, the murder rate has gone up 75%, while in Shelbyville, the rate has only increased by 10%. Which city is having a bigger murder problem? Well, that’s obvious, right? It has to be Springfield. After all, 75% is bigger than 10%.

Hold on a second, because this is actually much less clear than it looks. In order to really know which city has a worse problem, we have to look at the actual numbers. If I told you that Springfield had 4 murders last year and 7 this year, and Shelbyville had 30 murders last year and 33 murders this year, would you change your answer? Maybe, since 33 murders are significantly more than 7. One would certainly feel safer in Springfield, right?

Not so fast, because we still do not have all the facts. We have to make the comparison between the two based on equivalent standards. To do that, we have to look at the per capita rate (often given in rates per 100,000 people per year). If Springfield has 700 residents while Shelbyville has 3.3 million, then Springfield has a murder rate of 1,000 per 100,000 people, and Shelbyville’s rate is merely 1 per 100,000. Gadzooks! The residents of Springfield are dropping like flies. I think I’ll stick with nice, safe Shelbyville, thank you very much.
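
The arithmetic is simple enough to verify yourself; here is a minimal sketch of the comparison, using the made-up numbers from the example:

    # Raw counts vs. per capita rates (all figures hypothetical).
    def per_100k(events, population):
        return events / population * 100_000

    print(per_100k(7, 700))         # Springfield: 1000.0 murders per 100,000
    print(per_100k(33, 3_300_000))  # Shelbyville: 1.0 murders per 100,000

    # The percentage increases that started the confusion:
    print((7 - 4) / 4 * 100)        # 75.0% increase in Springfield
    print((33 - 30) / 30 * 100)     # 10.0% increase in Shelbyville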

Percentages are really no different from any other form of statistics: they gain their meaning only through their context. Consequently, percentages should be presented in context so that readers can draw their own conclusions as you emphasize facts important to your argument. Remember, if your statistics really do support your point, then you should have no fear of revealing the larger context that frames them.

Important questions to ask (and answer) about statistics

  • Is the question being asked relevant?
  • Do the data come from reliable sources?
  • Margin of error/confidence interval—when is a change really a change?
  • Are all data reported, or just the best/worst?
  • Are the data presented in context?
  • Have the data been interpreted correctly?
  • Does the author confuse correlation with causation?

Now that you have learned the lessons of statistics, you have two options. Use this knowledge to manipulate your numbers to your advantage, or use this knowledge to better understand and use statistics to make accurate and fair arguments. The choice is yours. Nine out of ten writers, however, prefer the latter, and the other one later regrets their decision.

You may reproduce it for non-commercial use if you use the entire handout and attribute the source: The Writing Center, University of North Carolina at Chapel Hill

Duke University Libraries: Statistical Science

Submit your thesis to DukeSpace

If you are an undergraduate honors student interested in submitting your thesis to DukeSpace, Duke University's online repository for publications and other archival materials in digital format, please contact Joan Durso to get this process started.

DukeSpace Electronic Theses and Dissertations (ETD) Submission Tutorial

  • DukeSpace Electronic Theses and Dissertation Self-Submission Guide

Need help submitting your thesis? Contact [email protected].

Source: https://guides.library.duke.edu/stats (last updated May 22, 2024)

Master's Thesis

As an integral component of the Master of Science in Statistical Science program, you can submit and defend a Master's Thesis. Your Master's Committee administers this oral examination. If you choose to defend a thesis, it is advisable to commence your research early, ideally during your second semester or the summer following your first year in the program. It's essential to allocate sufficient time for the thesis writing process. Your thesis advisor, who also serves as the committee chair, must approve both your thesis title and proposal. The final thesis work necessitates approval from all committee members and must adhere to the Master's thesis requirements set forth by the Duke University Graduate School.

Master’s BEST Award 

Each second-year Duke Master of Statistical Science (MSS) student defending an MSS thesis may be eligible for the Master's BEST Award. The Statistical Science faculty BEST Award Committee selects the awardee from among the submitted theses, and the award is presented at the departmental graduation ceremony.

Thesis Proposal

All second-year students choosing to do a thesis must submit a proposal (not more than two pages), approved by their thesis advisor, to the Master's Director via Qualtrics by November 10th. The thesis proposal should include a title, the thesis advisor, the committee members, and a description of the work. The description must introduce the research topic, outline its main objectives, and emphasize the significance of the research and its implications while identifying gaps in the existing statistical literature. It may also include preliminary results.

Committee members

MSS students will have a thesis committee of three faculty members: two must be departmental primary faculty, and the third may come from an external department in an applied area of the student's interest, provided that member is Term Graduate Faculty through the Graduate School or holds a secondary appointment with the Department of Statistical Science. All committee members must be familiar with the student's work. The department coordinates committee approval, and the thesis defense committee must be approved at least 30 days before the defense date.

Thesis Timeline and Departmental Process

Before the Defense:

Intent to Graduate: Students must file an Intent to Graduate in ACES, specifying "Thesis Defense" during the application. For graduation deadlines, please refer to https://gradschool.duke.edu/academics/preparing-graduate .

Scheduling Thesis Defense: The student collaborates with the committee to set the date and time for the defense and communicates this information to the department, along with the thesis title. The defense must be scheduled during regular class sessions. Be sure to review the thesis defense and submission deadlines at https://gradschool.duke.edu/academics/theses-and-dissertations/

Room Reservations: The department arranges room reservations and sends confirmation details to the student, who informs committee members of the location.

Defense Announcement: The department prepares a defense announcement, providing a copy to the student and chair. After approval, it is signed by the Master's Director and submitted to the Graduate School. Copies are also posted on department bulletin boards.

Initial Thesis Submission: Two weeks before the defense, the student submits the initial thesis to the committee and the Graduate School. Detailed thesis formatting guidelines can be found at https://gradschool.duke.edu/academics/theses-and-dissertations.

Advisor Notification: The student requests that the advisor email [email protected] , confirming the candidate's readiness for defense. This step should be completed before the exam card appointment.

Format Check Appointment: One week before the defense, the Graduate School contacts the student to schedule a format check appointment. Upon approval, the Graduate School provides the Student Master’s Exam Card, which enables the student to send a revised thesis copy to committee members.

MSS Annual Report Form: The department provides the student with the MSS Annual Report Form to be presented at the defense.

After the Defense:

Communication of Defense Outcome: The committee chair conveys the defense results to the student, including any necessary follow-up actions in case of an unsuccessful defense.

In Case of Failure: If a student does not pass the thesis defense, the committee's decision to fail the student must be accompanied by explicit and clear comments from the chair, specifying deficiencies and areas that require attention for improvement.

Documentation: The student should ensure that the committee signs the Title Page, Abstract Page, and Exam Card.

Annual Report Form: The committee chair completes the Annual Report Form.

Master's Director Approval: The Master's director must provide their approval by signing the Exam Card.

Form Submission: Lastly, the committee chair is responsible for returning all completed and signed forms to the Department.

Final Thesis Submission: The student must meet the Graduate School requirement by submitting the final version of their Thesis to the Graduate School via ProQuest before the specified deadline. For detailed information, visit https://gradschool.duke.edu/academics/preparinggraduate .

  • The Stochastic Proximal Distance Algorithm
  • Logistic-tree Normal Mixture for Clustering Microbiome Compositions
  • Inference for Dynamic Treatment Regimes using Overlapping Sampling Splitting
  • Bayesian Modeling for Identifying Selection in B Cell Maturation
  • Differentially Private Verification with Survey Weights
  • Stable Variable Selection for Sparse Linear Regression in a Non-uniqueness Regime  
  • A Cost-Sensitive, Semi-Supervised, and Active Learning Approach for Priority Outlier Investigation
  • Bayesian Decoupling: A Decision Theory-Based Approach to Bayesian Variable Selection
  • A Differentially Private Bayesian Approach to Replication Analysis
  • Numerical Approximation of Gaussian-Smoothed Optimal Transport
  • Computational Challenges to Bayesian Density Discontinuity Regression
  • Hierarchical Signal Propagation for Household Level Sales in Bayesian Dynamic Models
  • Logistic Tree Gaussian Processes (LoTgGaP) for Microbiome Dynamics and Treatment Effects
  • Bayesian Inference on Ratios Subject to Differentially Private Noise
  • Multiple Imputation Inferences for Count Data
  • An Euler Characteristic Curve Based Representation of 3D Shapes in Statistical Analysis
  • An Investigation Into the Bias & Variance of Almost Matching Exactly Methods
  • Comparison of Bayesian Inference Methods for Probit Network Models
  • Differentially Private Counts with Additive Constraints
  • Multi-Scale Graph Principal Component Analysis for Connectomics
  • MCMC Sampling Geospatial Partitions for Linear Models
  • Bayesian Dynamic Network Modeling with Censored Flow Data  
  • An Application of Graph Diffusion for Gesture Classification
  • Easy and Efficient Bayesian Infinite Factor Analysis
  • Analyzing Amazon CD Reviews with Bayesian Monitoring and Machine Learning Methods
  • Missing Data Imputation for Voter Turnout Using Auxiliary Margins
  • Generalized and Scalable Optimal Sparse Decision Trees
  • Construction of Objective Bayesian Prior from Bertrand’s Paradox and the Principle of Indifference
  • Rethinking Non-Linear Instrumental Variables
  • Clustering-Enhanced Stochastic Gradient MCMC for Hidden Markov Models
  • Optimal Sparse Decision Trees
  • Bayesian Density Regression with a Jump Discontinuity at a Given Threshold
  • Forecasting the Term Structure of Interest Rates: A Bayesian Dynamic Graphical Modeling Approach
  • Testing Between Different Types of Poisson Mixtures with Applications to Neuroscience
  • Multiple Imputation of Missing Covariates in Randomized Controlled Trials
  • A Bayesian Strategy to the 20 Question Game with Applications to Recommender Systems
  • Applied Factor Dynamic Analysis for Macroeconomic Forecasting
  • A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results
  • Bayesian Inference Via Partitioning Under Differential Privacy
  • A Bayesian Forward Simulation Approach to Establishing a Realistic Prior Model for Complex Geometrical Objects
  • Two Applications of Summary Statistics: Integrating Information Across Genes and Confidence Intervals with Missing Data

Welcome to the Purdue Online Writing Lab (Purdue OWL®)

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

The Online Writing Lab at Purdue University houses writing resources and instructional material, and we provide these as a free service of the Writing Lab at Purdue. Students, members of the community, and users worldwide will find information to assist with many writing projects. Teachers and trainers may use this material for in-class and out-of-class instruction.

The Purdue On-Campus Writing Lab and Purdue Online Writing Lab assist clients in their development as writers—no matter what their skill level—with on-campus consultations, online participation, and community engagement. The Purdue Writing Lab serves the Purdue, West Lafayette, campus and coordinates with local literacy initiatives. The Purdue OWL offers global support through online reference materials and services.


DigitalCommons@University of Nebraska - Lincoln

Department of Statistics: Dissertations, Theses, and Student Work

  • Examining the Effect of Word Embeddings and Preprocessing Methods on Fake News Detection, Jessica Hauschild
  • Exploring Experimental Design and Multivariate Analysis Techniques for Evaluating Community Structure of Bacteria in Microbiome Data, Kelsey Karnik
  • Human Perception of Exponentially Increasing Data Displayed on a Log Scale Evaluated Through Experimental Graphics Tasks, Emily Robinson
  • Factors Influencing Student Outcomes in a Large, Online Simulation-Based Introductory Statistics Course, Ella M. Burnham
  • Comparing Machine Learning Techniques with State-of-the-Art Parametric Prediction Models for Predicting Soybean Traits, Susweta Ray
  • Using Stability to Select a Shrinkage Method, Dean Dustin
  • Statistical Methodology to Establish a Benchmark for Evaluating Antimicrobial Resistance Genes through Real Time PCR Assay, Enakshy Dutta
  • Group Testing Identification: Objective Functions, Implementation, and Multiplex Assays, Brianna D. Hitt
  • Community Impact on the Home Advantage within NCAA Men's Basketball, Erin O'Donnell
  • Optimal Design for a Causal Structure, Zaher Kmail
  • Role of Misclassification Estimates in Estimating Disease Prevalence and a Non-Linear Approach to Study Synchrony Using Heart Rate Variability in Chickens, Dola Pathak
  • A Characterization of a Value Added Model and a New Multi-Stage Model for Estimating Teacher Effects Within Small School Systems, Julie M. Garai
  • Methods to Account for Breed Composition in a Bayesian GWAS Method which Utilizes Haplotype Clusters, Danielle F. Wilson-Wells
  • Beta-Binomial Kriging: A New Approach to Modeling Spatially Correlated Proportions, Aimee Schwab
  • Simulations of a New Response-Adaptive Biased Coin Design, Aleksandra Stein
  • Modeling the Dynamic Processes of Challenge and Recovery (Stress and Strain) Over Time, Fan Yang
  • A New Approach to Modeling Multivariate Time Series on Multiple Temporal Scales, Tucker Zeleny
  • A Reduced Bias Method of Estimating Variance Components in Generalized Linear Mixed Models, Elizabeth A. Claassen
  • New Statistical Methods for Analysis of Historical Data from Wildlife Populations, Trevor Hefley
  • Informative Retesting for Hierarchical Group Testing, Michael S. Black
  • A Test for Detecting Changes in Closed Networks Based on the Number of Communications Between Nodes, Christopher S. Wichman
  • Group Testing Regression Models, Boan Zhang
  • A Comparison of Spatial Prediction Techniques Using Both Hard and Soft Data, Megan L. Liedtke Tesar
  • Studying the Handling of Heat Stressed Cattle Using the Additive Bi-Logistic Model to Fit Body Temperature, Fan Yang
  • Estimating Teacher Effects Using Value-Added Models, Jennifer L. Green
  • Sequence Comparison and Stochastic Model Based on Multi-Order Markov Models, Xiang Fang
  • Detecting Differentially Expressed Genes While Controlling the False Discovery Rate for Microarray Data, Shuo Jiao
  • Spatial Clustering Using the Likelihood Function, April Kerby
  • Fully Exponential Laplace Approximation EM Algorithm for Nonlinear Mixed Effects Models, Meijian Zhou


MIT Theses (DSpace@MIT)

This collection of MIT Theses in DSpace contains selected theses and dissertations from all MIT departments. Please note that this is NOT a complete collection of MIT theses. To search all MIT theses, use MIT Libraries' catalog .

MIT's DSpace contains more than 58,000 theses completed at MIT, dating as far back as the mid-1800s. Theses in this collection have been scanned by the MIT Libraries or submitted in electronic format by thesis authors. Since 2004, all new Master's and Ph.D. theses have been scanned and added to this collection after degrees are awarded.

MIT Theses are openly available to all readers.

If you have questions about MIT theses in DSpace, contact [email protected]. See also Access & Availability Questions or About MIT Theses in DSpace.

If you are a recent MIT graduate, your thesis will be added to DSpace within 3-6 months after your graduation date. Please email [email protected] with any questions.

Permissions

MIT Theses may be protected by copyright. Please refer to the MIT Libraries Permissions Policy for permission information. Note that the copyright holder for most MIT theses is identified on the title page of the thesis.


2024 Doctoral Theses

Statistically Efficient Methods for Computation-Aware Uncertainty Quantification and Rare-Event Optimization

He, Shengyi

The thesis covers two fundamental topics that are important across the disciplines of operations research, statistics, and even more broadly: stochastic optimization and uncertainty quantification, with the common theme of addressing both statistical accuracy and computational constraints. Here, statistical accuracy encompasses the precision of estimated solutions in stochastic optimization as well as the tightness or reliability of confidence intervals. Computational concerns arise from rare events or expensive models, necessitating efficient sampling methods or computation procedures.

In the first half of this thesis, we study stochastic optimization that involves rare events, which arises in various contexts including risk-averse decision-making and training of machine learning models. Because of the presence of rare events, crude Monte Carlo methods can be prohibitively inefficient, as it takes a sample size reciprocal to the rare-event probability to obtain valid statistical information about the rare event. To address this issue, we investigate the use of importance sampling (IS) to reduce the required sample size. IS is commonly used to handle rare events, and the idea is to sample from an alternative distribution that hits the rare event more frequently and adjust the estimator with a likelihood ratio to retain unbiasedness. While IS has long been studied, most of its literature focuses on estimation problems and methodologies to obtain good IS in those contexts. In contrast, the first half of this thesis provides a systematic study of the efficient use of IS in stochastic optimization. In Chapter 2, we propose an adaptive procedure that converts an efficient IS for gradient estimation into an efficient IS procedure for stochastic optimization. Then, in Chapter 3, we provide an efficient IS for gradient estimation, which serves as the input for the procedure in Chapter 2.

In the second half of this thesis, we study uncertainty quantification in the sense of constructing a confidence interval (CI) for target model quantities or predictions. We are interested in the setting of expensive black-box models, which means that we are confined to a low number of model runs and lack the ability to obtain auxiliary model information such as gradients. In this case, a classical method is batching, which divides data into a few batches and then constructs a CI based on the batched estimates. Another method is the recently proposed cheap bootstrap, constructed from a few resamples in a similar manner to batching. These methods save computation because they do not need an accurate variability estimator, which would require sufficient model evaluations to obtain. Instead, they cancel out the variability when constructing pivotal statistics, and thus obtain asymptotically valid t-distribution-based CIs with only a few batches or resamples. The second half of this thesis studies several theoretical aspects of these computation-aware CI construction methods. In Chapter 4, we study statistical optimality, in terms of CI tightness, among various computation-aware CIs. Then, in Chapter 5, we study the higher-order coverage errors of batching methods. Finally, Chapter 6 is a related investigation of the higher-order coverage and correction of distributionally robust optimization (DRO) as another CI construction tool, which assumes some analytical information on the model but bears similarity to Chapter 5 in terms of analysis techniques.

  • Operations research
  • Stochastic processes--Mathematical models
  • Mathematical optimization
  • Bootstrap (Statistics)
  • Sampling (Statistics)

