Stat 3701 Lecture Notes: Bootstrap

Charles J. Geyer, November 23, 2022.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License ( http://creativecommons.org/licenses/by-sa/4.0/ ).

The version of R used to make this document is 4.2.1.

The version of the rmarkdown package used to make this document is 2.17.

The version of the knitr package used to make this document is 1.40.

The version of the bootstrap package used to make this document is 2019.6.

3 Relevant and Irrelevant Simulation

3.1 Irrelevant

Most statisticians think a statistics paper isn’t really a statistics paper or a statistics talk isn’t really a statistics talk if it doesn’t have simulations demonstrating that the methods proposed work great (at least in some toy problems).

IMHO, this is nonsense. Simulations of the kind most statisticians do prove nothing. The toy problems used are often very special and do not stress the methods at all. In fact, they may be (consciously or unconsciously) chosen to make the methods look good.

In scientific experiments, we know how to use randomization, blinding, and other techniques to avoid biasing the results. Analogous things are never, AFAIK, done with simulations.

When all of the toy problems simulated are very different from the statistical model you intend to use for your data, what could the simulation study possibly tell you that is relevant? Nothing.

Hence, for short, your humble author calls all of these millions of simulation studies statisticians have done irrelevant simulation .

3.2 Relevant

But there is a well-known methodology of relevant simulation , except that it isn’t called that. It is called the bootstrap .

The idea is that, for each statistical model and each data set to which it is applied, one should do a simulation study of that model on data of that form.

But there is a problem: the fundamental problem of statistics, that \(\hat{\theta}\) is not \(\theta\) . To be truly relevant we should simulate from the true unknown distribution of the data, but we don’t know what that is. (If we did, we wouldn’t need statistics.)

So as a second best choice we have to simulate from our best estimate of the true unknown distribution, the one corresponding to the parameter value \(\hat{\theta}\) if that is the best estimator we know.

But we know that is the Wrong Thing . So we have to be sophisticated about this. We have to arrange what we do with our simulations to come as close to the Right Thing as possible.

And bootstrap theory and methods are extraordinarily sophisticated with many different methods of coming very close to the Right Thing .

4 R Packages and Textbooks

There are two well known R packages concerned with the bootstrap. They go with two well known textbooks.

R package boot is an R recommended package that is installed by default in every installation of R. As the package description says, it goes with the textbook Davison and Hinkley (1997) .

The CRAN package bootstrap goes with, as its package description says, the textbook Efron and Tibshirani (1993) .

The package description also says that “new projects should preferentially use the recommended package ‘ boot ’”. But I do not agree. The package maintainer is neither Efron nor Tibshirani, and I do not think they would agree. Whatever the politics of the R core team that make the boot package “recommended”, they have nothing to do with the quality of the packages or with the quality of the textbooks they go with. If you like Efron and Tibshirani (1993), you should be using the R package bootstrap that goes with it.

These authors range from moderately famous (for a statistician) to very, very famous (for a statistician). Efron is the inventor of the term bootstrap in its statistical meaning.

5 The Bootstrap Analogy

5.1 The Name of the Game

The term “bootstrap” recalls the English idiom “pull oneself up by one’s bootstraps” .

The literal meaning of “bootstrap” in non-technical language is leather loops at the top of boots used to pull them on. So the literal meaning of “pull oneself up by one’s bootstraps” is to reach down, grab your bootstraps, and lift yourself off the ground — a physical impossibility. But, idiomatically, it doesn’t mean do the physically impossible; it means something like “succeed by one’s own efforts”, especially when this is difficult.

The technical meaning in statistics plays off this idiom. It means to get a good approximation to the sampling distribution of an estimator without using any theory. (At least not using any theory in the computation. A great deal of very technical theory may be used in justifying the bootstrap in certain situations.)

5.2 Introduction

The discussion in this section (all of Section 5) is stolen from Efron and Tibshirani (1993, Figure 8.1 and the surrounding text).

To understand the bootstrap you have to understand a simple analogy. Otherwise it is quite mysterious. I recall being mystified about it when I was a graduate student. I hope the students I teach are much less mystified because of this analogy. This appears to the untutored to be impossible or magical. But it isn’t really. It is sound statistical methodology.

5.3 The Nonparametric Bootstrap

The nonparametric bootstrap (or, to be more precise, Efron’s original nonparametric bootstrap, because others have been proposed in the literature, although no other is widely used AFAIK) is based on a nonparametric estimate of the true unknown distribution of the data.

This nonparametric estimate is just the sample itself, thought of as a finite population to sample from. Let \(P\) denote the true unknown probability distribution that we assume the data are an IID sample from, and let \(\widehat{P}_n\) denote the probability model that samples IID from the original sample thought of as a finite population to sample from.

As we said above, this is the Wrong Thing with a capital W and a capital T. The sample is not the population. But it will be close for large sample sizes. Thus all justification for the nonparametric bootstrap is asymptotic. It only works for large sample sizes. We emphasize this because many naive users have picked up the opposite impression somewhere. The notion that the bootstrap (any kind of bootstrap) is an exact statistical method seems to be floating around in the memeosphere and impossible to stamp out.

The bootstrap makes an analogy between the real world and a mythical bootstrap world. The analogy lines up as follows.

  • real world: true unknown distribution \(P\); bootstrap world: “true” pretend unknown distribution \(\widehat{P}_n\)
  • real world: true unknown parameter \(\theta\); bootstrap world: “true” pretend unknown parameter \(\hat{\theta}_n\)
  • real world: data \(X_1, \ldots, X_n\) IID \(P\); bootstrap world: data \(X_1^*, \ldots, X_n^*\) IID \(\widehat{P}_n\)
  • real world: estimator \(\hat{\theta}_n\); bootstrap world: estimator \(\theta^*_n\)
  • real world: standard error estimate \(\hat{s}_n\); bootstrap world: standard error estimate \(s^*_n\)
  • real world: approximate pivotal quantity \((\hat{\theta}_n - \theta) / \hat{s}_n\); bootstrap world: approximate pivotal quantity \((\theta^*_n - \hat{\theta}_n) / s^*_n\)

The explanation:

In the real world we have the true unknown distribution of the data \(P\) . In the bootstrap world we have the “true” pretend unknown distribution of the data \(\widehat{P}_n\) . Actually the distribution \(\widehat{P}_n\) is known, and that’s a good thing, because it allows us to simulate data from it. But we pretend it is unknown when we are reasoning in the bootstrap world. It is the analog in the bootstrap world of the true unknown distribution \(P\) in the real world.

In the real world we have the true unknown parameter \(\theta\) . It is the aspect of \(P\) that we want to estimate. In the bootstrap world we have the “true” pretend unknown parameter \(\hat{\theta}_n\) . Actually the parameter \(\hat{\theta}_n\) is known, and that’s a good thing, because it allows us to see how close estimators come to it. But we pretend it is unknown when we are reasoning in the bootstrap world. It is the analog in the bootstrap world of the true unknown parameter \(\theta\) in the real world.

\(\hat{\theta}_n\) is the same function of \(\widehat{P}_n\) as \(\theta\) is of \(P\) .

If \(\theta\) is the population mean, then \(\hat{\theta}_n\) is the sample mean.

If \(\theta\) is the population median, then \(\hat{\theta}_n\) is the sample median.

and so forth.

In the real world we have data \(X_1\) , \(\ldots,\) \(X_n\) that are assumed IID from \(P\) , whatever it is. In the bootstrap world we simulate data \(X_1^*\) , \(\ldots,\) \(X_n^*\) that are IID from \(\widehat{P}_n\) .

The way we simulate data that are IID from \(\widehat{P}_n\) is to take samples from the original data considered as a finite population to sample from. These are samples with replacement, because that is what IID requires.
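In R this is one call to the sample function (assuming the data are in a numeric vector x):

```r
x.star <- sample(x, replace = TRUE)   # one nonparametric bootstrap sample
```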

Sometimes the nonparametric bootstrap is called “resampling” because it samples from the sample. But this terminology misdirects the naive. What is important is that we have the correct analogy on the “data” line of the table above.

We have some estimator of \(\theta\) , which must be a statistic , that is some function of the data that does not depend on the unknown parameter. In order to have the correct analogy in the bootstrap world, our estimate there must be the same function of the bootstrap data .

Many procedures require some estimate of standard error of \(\hat{\theta}_n\) . Call that \(\hat{s}_n\) . It too must be a statistic , that is some function of the data that does not depend on the unknown parameter. In order to have the correct analogy in the bootstrap world, our estimate there must be the same function of the bootstrap data .

Many procedures use so-called pivotal quantities , either exact or approximate.

An exact pivotal quantity is a function of the data and the parameter of interest whose distribution does not depend on any parameters . The prototypical example is the \(t\) statistic \[ \frac{\overline{X}_n - \mu}{s_n / \sqrt{n}} \] which has, when the data are assumed to be exactly normal, an exact \(t\) distribution on \(n - 1\) degrees of freedom (which does not depend on the unknown parameters \(\mu\) and \(\sigma\) of the distribution of the data). Note that the pivotal quantity is a function of \(\mu\) but the sampling distribution of the pivotal quantity does not depend on \(\mu\) or \(\sigma\): the \(t\) distribution with \(n - 1\) degrees of freedom does not have any unknown parameters.

An asymptotic pivotal quantity is a function of the data and the parameter of interest whose asymptotic distribution does not depend on any parameters . The prototypical example is the \(z\) statistic \[ \frac{\overline{X}_n - \mu}{s_n / \sqrt{n}} \] (actually the same function of data and parameters as the \(t\) statistic discussed above), which has, when the data are assumed to have any distribution with finite variance, an asymptotic standard normal distribution (which does not depend on the unknown distribution of the data). Note that the pivotal quantity is a function of \(\mu\) but the asymptotic distribution of the pivotal quantity does not depend on the unknown distribution of the data : the standard normal distribution does not have any unknown parameters.

An approximate pivotal quantity is a function of the data and the parameter of interest whose sampling distribution does not depend on the unknown distribution of the data, at least not very much. Often such quantities are made by standardization, as in the examples above. Any time we have some purported standard errors of estimators, we can use them to make approximate pivotal quantities, \[ \frac{\hat{\theta}_n - \theta}{\hat{s}_n} \] as in the real-world side of the last line of the table above.

The importance of pivotal quantities in (frequentist) statistics cannot be overemphasized. They are what allow valid exact or approximate inference. When we invert the pivotal quantity to make confidence intervals, for example, \[ \hat{\theta}_n \pm 1.96 \cdot \hat{s}_n \] this is (exactly or approximately) valid because the sampling distribution does not depend on the true unknown distribution of the data, at least not much . If it did depend strongly on the true distribution of the data, then our coverage could be way off, because our estimated sampling distribution of the pivotal quantity might be far from its correct sampling distribution.

As we shall see, even when we have no \(\hat{s}_n\) available, the bootstrap can find one for us.

5.3.1 Cautions

5.3.1.1 Use the Correct Analogies

On the bootstrap-world side of the last line of the table above there is a strong tendency for naive users to replace \(\hat{\theta}_n\) with \(\theta\) . But this is clearly incorrect. What plays the role of the true unknown parameter value in the bootstrap world is \(\hat{\theta}_n\), not \(\theta\) .

5.3.1.2 Hypothesis Tests are Problematic

Any hypothesis test calculates critical values or \(P\) -values using the distribution under the null hypothesis . But the bootstrap does not sample that distribution unless the null hypothesis happens to be correct. Usually, we want to reject the null hypothesis, meaning we hope it is not correct . And in any case, we would not be doing a hypothesis test if we knew whether the null hypothesis is correct.

Thus the obvious naive way to calculate a bootstrap \(P\) -value, which has been re-invented time and time again by naive users, is completely bogus. It says, if \(w(X_1, \ldots, X_n)\) is the test statistic of the test, then the naive bootstrap \(P\) -value is the fraction of simulations of bootstrap data in which \(w(X_1^*, \ldots, X_n^*) \ge w(X_1, \ldots, X_n)\) . This test typically has no power. It rejects at level \(\alpha\) with probability \(\alpha\) no matter how far the true unknown distribution of the data is from the null hypothesis. This is because the bootstrap samples (approximately, for large \(n\) ) from the true unknown distribution, not from the null hypothesis.

Of course, there are non-bogus ways of doing bootstrap tests, but one has to be a bit less naive. For example, any valid bootstrap confidence interval also gives a valid bootstrap test. The test rejects \(H_0 : \theta = \theta_0\) (two-tailed) at level \(\alpha\) if and only if a valid confidence interval with coverage probability \(1 - \alpha\) does not cover \(\theta_0\) .

We won’t say any more about bootstrap hypothesis tests. The textbooks cited above each have a chapter on the subject.

5.3.1.3 Regression is Problematic

If we consider our data to be IID pairs \((X_i, Y_i)\) , then the naive bootstrap procedure is to resample pairs \((X_i^*, Y_i^*)\) where each \((X_i^*, Y_i^*) = (X_j, Y_j)\) for some \(j\) . But this mimics the joint distribution of \(X\) and \(Y\) and regression is about the conditional distribution of \(Y\) given \(X\) . So again the naive bootstrap samples the wrong distribution.

A solution to this problem is to resample residuals rather than data. Suppose we are assuming a parametric model for the regression function but are being nonparametric about the error distribution, as in Section 3.4.1 of the course notes about models, Part I . Just for concreteness, assume the regression function is simply \(\alpha + \beta x\) . Then the relation between the bootstrap world and the real world changes as follows.

  • real world: errors \(E_i\) IID \(P\); bootstrap world: errors \(E_i^*\) IID resampled residuals
  • real world: data \(Y_i = \alpha + \beta x_i + E_i\); bootstrap world: data \(Y_i^* = \hat{\alpha}_n + \hat{\beta}_n x_i + E_i^*\)
  • real world: true unknown parameters \(\alpha\), \(\beta\); bootstrap world: “true” pretend unknown parameters \(\hat{\alpha}_n\), \(\hat{\beta}_n\)
  • real world: estimators \(\hat{\alpha}_n\), \(\hat{\beta}_n\); bootstrap world: estimators \(\alpha^*_n\), \(\beta^*_n\)

The table is not quite as neat as before because there is no good way to say that \(\hat{\alpha}_n\) and \(\hat{\beta}_n\) are the same function of the regression data, thought of as a finite population to sample, as \(\alpha\) and \(\beta\) are of the population, and similarly that \(\alpha^*_n\) and \(\beta^*_n\) are the same function of the bootstrap data as \(\hat{\alpha}_n\) and \(\hat{\beta}_n\) are of the original data.

The textbooks cited above each have a chapter on this subject.

Bootstrapping residuals is usually not fully nonparametric because the estimate of the residuals depends on some parametric part of the model (either the mean function is parametric or the error distribution, or both).
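A minimal sketch of resampling residuals in the simple linear model (toy data and names, not from these notes; the fitted mean function supplies the "population" and the residuals supply the error distribution):

```r
set.seed(42)
x <- runif(30)
y <- 1 + 2 * x + (rexp(30) - 1)   # skewed, mean-zero errors
fit <- lm(y ~ x)
r <- resid(fit)
beta.star <- replicate(999, {
    y.star <- fitted(fit) + sample(r, replace = TRUE)   # resample residuals
    coef(lm(y.star ~ x))[2]
})
sd(beta.star)   # residual bootstrap standard error of the slope
```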

5.4 The Parametric Bootstrap

The parametric bootstrap was also invented by Efron.

Now we have a parametric model. Let \(P_\theta\) denote the true unknown probability distribution that we assume the data are an IID sample from.

We won’t be so detailed in our explanation as above. The main point is that everything is the same as with the nonparametric bootstrap except that we use parametric estimates of distributions rather than nonparametric ones.

The same caution about being careful about the analogy applies as with the nonparametric bootstrap. But the other cautions do not apply. Neither hypothesis tests nor regression are problematic with the parametric bootstrap. One simply samples from the correct parametric distribution. For hypothesis tests, one estimates the parameters under the null hypothesis and then simulates that distribution. For regression, one estimates the parameters and then simulates new response data from the estimated conditional distribution of the response given the predictors.
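A minimal sketch of the regression case (toy data and names, not from these notes; normal errors assumed):

```r
# parametric bootstrap for regression: simulate new response vectors from
# the estimated conditional distribution of the response given the predictors
set.seed(42)
x <- runif(30)
y <- 1 + 2 * x + rnorm(30)
fit <- lm(y ~ x)
sigma.hat <- summary(fit)$sigma
beta.star <- replicate(999, {
    y.star <- fitted(fit) + rnorm(length(y), sd = sigma.hat)
    coef(lm(y.star ~ x))[2]
})
sd(beta.star)   # parametric bootstrap standard error of the slope
```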

6.1 Nonparametric Bootstrap

We will use the following highly skewed data.
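The data themselves are not reproduced here. As a stand-in so the code below has something to run on, assume a small sample from a heavily right-skewed distribution (hypothetical, not the original data):

```r
set.seed(42)                              # hypothetical seed
x <- rgamma(30, shape = 0.5, scale = 2)   # stand-in highly skewed data
n <- length(x)
```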

Suppose we wish to estimate the population mean using the sample mean as its estimator. We have the asymptotically valid confidence interval \[ \bar{x}_n \pm \text{critical value} \cdot \frac{s_n}{\sqrt{n}} \] where \(s_n\) is the sample standard deviation. We also have the rule of thumb widely promulgated by intro statistics books that this interval is valid when \(n \ge 30\) . That is, according to intro statistics books, \(30 = \infty\) . These data show how dumb that rule of thumb is.

6.1.2 Bootstrap

So let us bootstrap these data. There is an R function boot in the R recommended package of the same name that does bootstrap samples, but we find it so complicated as to be not worth using. We will just use a loop.
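```r
# a sketch of the loop (the original code block is not shown; names are
# chosen to match the text)
nboot <- 999
mu.hat <- mean(x)
mu.star <- double(nboot)
sd.star <- double(nboot)
for (iboot in 1:nboot) {
    x.star <- sample(x, replace = TRUE)   # one bootstrap sample
    mu.star[iboot] <- mean(x.star)
    sd.star[iboot] <- sd(x.star)
}
hist(mu.star)
abline(v = mu.hat, lwd = 2)   # vertical line at the estimate
```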

As the histogram shows, the sampling distribution of our estimator is also skewed (the vertical line shows \(\hat{\mu}_n\) ).

We want to use the method of pivotal quantities here using the sample standard deviation as the standardizer.
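```r
# the bootstrap analog of the z statistic, standardized with the
# bootstrap-sample standard deviation
z.star <- (mu.star - mu.hat) / (sd.star / sqrt(n))
hist(z.star, probability = TRUE)
curve(dnorm(x), add = TRUE)   # standard normal density for comparison
```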

We can see that the distribution of z.star, which is supposed to be standard normal (it would be standard normal when \(n = \infty\)), is actually, for these data, far from standard normal.

6.1.3 Bootstrap Confidence Interval

But since we have the bootstrap estimate of the actual sampling distribution we can use that to determine critical values.

I chose nboot to be 999 (a round number minus one) in order for the following trick to work. Observe that \(n\) values divide the number line into \(n + 1\) parts. It can be shown by statistical theory that each part has the same sampling distribution when stated in terms of the fraction of the population distribution covered. Thus sound estimates of the quantiles are the order statistics: the \(k\)-th smallest of the \(n\) values estimates the \(k / (n + 1)\) quantile. So we want to arrange the bootstrap sample size so that (nboot + 1) * alpha is an integer, where alpha is the probability for the critical value we want.
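```r
# reconstructed sketch of the critical value calculation
alpha <- c(0.025, 0.975)
crit <- sort(z.star)[round((nboot + 1) * alpha)]   # order statistics 25 and 975
crit
quantile(z.star, probs = alpha)   # comparison, result not saved
```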

The last command (the result of which we don’t bother to save) shows that we are doing (arguably) the right thing. And we don’t have to decide among the 9 different “types” of quantile estimator that the R function quantile offers. The recipe used here is unarguably correct so long as (nboot + 1) * alpha is an integer.

Note that our critical values are very different from the standard normal critical values, qnorm(c(0.025, 0.975)) \(\approx \pm 1.96\), which asymptotic (large sample) theory would have us use.

Our confidence interval is now \[ c_1 < \frac{\bar{x}_n - \mu}{s_n / \sqrt{n}} < c_2 \] where \(c_1\) and \(c_2\) are the critical values. We “solve” these inequalities for \(\mu\) as follows. \[\begin{gather*} c_1 \cdot \frac{s_n}{\sqrt{n}} < \bar{x}_n - \mu < c_2 \cdot \frac{s_n}{\sqrt{n}} \\ c_1 \cdot \frac{s_n}{\sqrt{n}} - \bar{x}_n < - \mu < c_2 \cdot \frac{s_n}{\sqrt{n}} - \bar{x}_n \\ \bar{x}_n - c_2 \cdot \frac{s_n}{\sqrt{n}} < \mu < \bar{x}_n - c_1 \cdot \frac{s_n}{\sqrt{n}} \end{gather*}\] (in going from the second line to the third, multiplying an inequality through by \(- 1\) reverses the inequality).

Now we use the last line of the nonparametric bootstrap analogy table. We suppose that the critical values are the same for both distributions on the bottom line (in the real world and in the bootstrap world).

Thus the bootstrap 95% confidence interval is
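```r
# invert the pivotal inequalities above (sketch, using crit from before)
mu.hat - rev(crit) * sd(x) / sqrt(n)
```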

which is very different from
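```r
# the interval based on standard normal critical values, for comparison
mu.hat + qnorm(c(0.025, 0.975)) * sd(x) / sqrt(n)
```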

6.1.4 Using boott

There is an R function boott in the CRAN package bootstrap that does this whole calculation for us.
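A sketch of the call (argument values here, such as nboott = 999, are choices matching the calculation above, not requirements):

```r
library(bootstrap)
sdfun <- function(x, nbootsd, theta, ...) sd(x) / sqrt(length(x))
boott(x, theta = mean, sdfun = sdfun, nboott = 999,
    perc = c(0.025, 0.975))$confpoints
```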

where the weird signature of the sdfun
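```r
function(x, nbootsd, theta, ...)
```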

is required by boott as help(boott) explains. Even though we have no use for the arguments nbootsd and theta , we have to have them in the function arguments list because the function is going to be passed them by boott whether we need them or not.

And what if you cannot think up a useful standardizing function? Then boott can find one for you, using the bootstrap to estimate the standard deviation of the sampling distribution of the estimator. So there is another bootstrap inside the main bootstrap. We call this a double bootstrap.
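A sketch (the value of nbootsd, which controls the size of the inner bootstrap, is a guess, not from these notes):

```r
# no sdfun supplied: boott bootstraps a standard error internally
boott(x, theta = mean, nbootsd = 100, nboott = 999,
    perc = c(0.025, 0.975))$confpoints
```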

Pretty cool.

6.1.5 Bootstrapping the Bootstrap

So how much better is the bootstrap confidence interval than the asymptotic confidence interval? We should do a simulation study to find out. But we don’t have any idea what the population distribution is, and anyway, as argued in Section 3 above , simulations are irrelevant unless they are instances of the bootstrap. So we should check using the bootstrap. In order to not have our code too messy, we will use boott .
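A sketch of the idea (details such as the number of outer samples and the seed are assumed, not from the notes). In the bootstrap world \(\hat{\mu}_n\) plays the role of the true parameter, so we check whether each interval covers it:

```r
set.seed(17)          # hypothetical seed
nsim <- 200           # assumed number of outer bootstrap samples
hit.asymp <- logical(nsim)
hit.boott <- logical(nsim)
for (isim in 1:nsim) {
    x.star <- sample(x, replace = TRUE)
    # asymptotic interval computed from the bootstrap-world data
    half <- qnorm(0.975) * sd(x.star) / sqrt(n)
    hit.asymp[isim] <- abs(mean(x.star) - mu.hat) <= half
    # bootstrap t interval computed from the bootstrap-world data
    ci <- boott(x.star, theta = mean, sdfun = sdfun, nboott = 999,
        perc = c(0.025, 0.975))$confpoints
    hit.boott[isim] <- ci[1] <= mu.hat & mu.hat <= ci[2]
}
mean(hit.asymp)   # estimated coverage of the asymptotic interval
mean(hit.boott)   # estimated coverage of the bootstrap t interval
```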

This is the first thing that actually takes more than a second of computing time, but it is still not very long.

The bootstrap is apparently quite a bit better, but we can’t really say that until we look at MCSE. For this kind of problem where we are looking at a dichotomous result (hit or miss), we know from intro stats how to calculate standard errors. This is the same as the problem of estimating a population proportion. The standard error is \[ \sqrt{ \frac{\hat{p}_n (1 - \hat{p}_n) }{n} } \]
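Applied to the coverage estimates above (here the \(n\) in the formula is the number of outer bootstrap samples, nsim):

```r
p.asymp <- mean(hit.asymp)
p.boott <- mean(hit.boott)
sqrt(p.asymp * (1 - p.asymp) / nsim)   # standard error, asymptotic interval
sqrt(p.boott * (1 - p.boott) / nsim)   # standard error, bootstrap t interval
```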

We say “bootstrap estimates” and “bootstrap standard errors” here rather than “Monte Carlo estimates” and “Monte Carlo standard errors” or “simulation estimates” and “simulation standard errors” because, of course, we are not doing the Right Thing (with a capital R and a capital T) which is simulating from the true unknown population distribution because, of course, we don’t know what that is.

6.1.6 A Plethora of Bootstrap Confidence Intervals

The recipe for bootstrap confidence intervals illustrated here is a good one but far from the only good one. There are, in fact, a plethora of bootstrap confidence intervals covered in the textbooks cited above and even more in statistical journals.

Some of these are covered in the course materials for your humble author’s version of STAT 5601 . So a lot more could be said about the bootstrap. But we won’t.

6.1.7 The Moral of the Story

The bootstrap can do even better than theory. Theory needs \(n\) to be large enough for theory to work. The bootstrap needs \(n\) to be large enough for the bootstrap to work. The \(n\) for the latter can be smaller than the \(n\) for the former.

This is well understood theoretically. Good bootstrap confidence intervals like the so-called bootstrap \(t\) intervals illustrated above, have the property called higher-order accuracy or second-order correctness . Asymptotic theory says that the coverage error of the asymptotic interval will be of order \(n^{- 1 / 2}\) . Like everything else in asymptotics it too obeys the square root law. The actual coverage probability of the interval will differ from the nominal coverage probability by an error term that has approximate size \(c / \sqrt{n}\) for some constant \(c\) (which we usually do not know, as it depends on the true unknown population distribution). For a second-order correct bootstrap interval the error will have approximate size \(c / n\) for some different (and unknown) constant \(c\) . The point is that \(1 / n\) is a lot smaller than \(1 / \sqrt{n}\) .

We expect second-order correct bootstrap intervals to do better than asymptotics.

And we don’t need to do any theory ourselves! The computer does it for us!

6.2 Parametric Bootstrap

We are going to use the same data to illustrate the parametric bootstrap.

But we now need a parametric model for these data. It was simulated from a gamma distribution, so we will use that.

6.2.1 The Gamma Distribution

The gamma distribution is a continuous distribution of a strictly positive random variable having PDF \[ f_{\alpha, \beta}(x) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} e^{- x / \beta}, \qquad 0 < x < \infty, \] where \(\alpha\) and \(\beta\) are unknown parameters that are strictly positive.

It has a lot of appearances in theoretical statistics. The chi-square distribution is a special case. So is the exponential distribution, which is a model for failure times of random thingummies that do not get worse as they age. Also random variables having the \(F\) distribution can be written as a function of independent gamma random variables. In Bayesian statistics, it is the conjugate prior for several well-known families of distributions. But here we are just using it as a statistical model for data.

The function \(\Gamma\) is called the gamma function . It gives the probability distribution its name. If you haven’t heard of it, don’t worry about it. Just think of \(\Gamma(\alpha)\) as a term that has to be what it is to make the PDF integrate to one.

The parameter \(\alpha\) is called the shape parameter because different \(\alpha\) correspond to distributions of different shape. In fact, radically different.

For \(\alpha < 1\) the PDF goes to infinity as \(x \to 0\) .

For \(\alpha > 1\) the PDF goes to zero as \(x \to 0\) .

For \(\alpha = 1\) the PDF goes to \(1 / \beta\) as \(x \to 0\) .
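A quick look at the three regimes (a sketch; the particular shape values are arbitrary):

```r
curve(dgamma(x, shape = 0.5), from = 0, to = 4, ylim = c(0, 1.5),
    xlab = "x", ylab = "density")
curve(dgamma(x, shape = 1), add = TRUE, lty = 2)
curve(dgamma(x, shape = 1.5), add = TRUE, lty = 3)
legend("topright", legend = c("shape = 0.5", "shape = 1", "shape = 1.5"),
    lty = 1:3)
```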

The parameter \(\beta\) is called the scale parameter because it is one. If \(X\) has the gamma distribution with shape parameter \(\alpha\) and scale parameter one, then \(\beta X\) has the gamma distribution with shape parameter \(\alpha\) and scale parameter \(\beta\) . So changing \(\beta\) does not change the shape of the distribution. We could use the same plot for all \(\beta\) if we don’t put numbers on the axes.

The mean and variance are \[\begin{align*} E(X) & = \alpha \beta \\ \mathop{\rm var}(X) & = \alpha \beta^2 \end{align*}\]

6.2.2 Method of Moments Estimators

Solving the last two equations for the parameters gives \[\begin{align*} \alpha & = \frac{E(X)^2}{\mathop{\rm var}(X)} \\ \beta & = \frac{\mathop{\rm var}(X)}{E(X)} \end{align*}\]

This suggests the corresponding sample quantities as reasonable parameter estimates.
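In R (using the stand-in data from before):

```r
alpha.hat <- mean(x)^2 / var(x)   # method of moments estimate of shape
beta.hat <- var(x) / mean(x)      # method of moments estimate of scale
c(alpha.hat, beta.hat)
```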

These are called method of moments estimators because expectations of polynomial functions of data are called moments (mean and variance are special cases).

6.2.3 Maximum Likelihood
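A sketch of the fit (hypothetical code: the minus log likelihood minimized by optim, starting from the method of moments estimates):

```r
mlogl <- function(theta, x) {
    alpha <- theta[1]
    beta <- theta[2]
    # stopifnot(alpha > 0, beta > 0)   # commented out: optim may step
    #                                  # outside the parameter space
    - sum(dgamma(x, shape = alpha, scale = beta, log = TRUE))
}
oout <- optim(c(alpha.hat, beta.hat), mlogl, x = x)
```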

Since we got warnings, redo.
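```r
# restart from the previous solution
oout <- optim(oout$par, mlogl, x = x)
oout$convergence   # zero means the optimizer reports convergence
oout$par
```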

We commented out the checks that the parameter values are strictly positive because optim often goes outside the parameter space but then gets back on track, as it does here. We seem to have gotten the correct answer despite the warnings.

So now we are ready to bootstrap. Let us suppose we want a confidence interval for \(\alpha\) .

6.2.4 Bootstrap

The parametric bootstrap simulates from the MLE distribution.
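A sketch of the loop (names and details assumed; the Hessian of the minus log likelihood supplies a standard error for each bootstrap sample):

```r
alpha.mle <- oout$par[1]
beta.mle <- oout$par[2]
nboot <- 999
alpha.star <- double(nboot)
se.star <- double(nboot)
for (iboot in 1:nboot) {
    x.star <- rgamma(n, shape = alpha.mle, scale = beta.mle)
    oout.star <- suppressWarnings(optim(c(alpha.mle, beta.mle), mlogl,
        x = x.star, hessian = TRUE))
    alpha.star[iboot] <- oout.star$par[1]
    se.star[iboot] <- sqrt(solve(oout.star$hessian)[1, 1])
}
```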

We now follow the same “bootstrap \(t\) ” idea with the parametric bootstrap that we did for the nonparametric.
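```r
z.star <- (alpha.star - alpha.mle) / se.star
qqnorm(z.star)
abline(0, 1)
crit <- sort(z.star)[round((nboot + 1) * c(0.025, 0.975))]
crit
qnorm(c(0.025, 0.975))   # standard normal critical values, for comparison
```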

This tells us the asymptotics are working pretty well at \(n = 30\) . Perhaps the bootstrap is unnecessary. (But we didn’t know that without using the bootstrap to show it.)

Not a lot of difference in the critical values from the standard normal ones.

Since we forgot about the Hessian when estimating the parameters for the real data, we have to get it now.
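```r
# refit with hessian = TRUE to get the observed Fisher information
oout <- optim(oout$par, mlogl, x = x, hessian = TRUE)
se.alpha <- sqrt(solve(oout$hessian)[1, 1])
alpha.mle - rev(crit) * se.alpha                # bootstrap t interval
alpha.mle + qnorm(c(0.025, 0.975)) * se.alpha   # asymptotic interval
```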

6.2.5 The Moral of the Story

The moral of the story here is different from the nonparametric story above. Here we didn’t need the bootstrap, and the confidence interval it made wasn’t any better than the interval derived from the usual asymptotics of maximum likelihood.

But we didn’t know that would happen until we did it. If anyone ever asks you “How do you know the sample size is large enough to use asymptotic theory?”, this is the answer.

If the asymptotics agrees with the bootstrap, then both are correct. If the asymptotics does not agree with the bootstrap, use the bootstrap.


11.2.1 - Bootstrapping Methods

Point estimates are helpful to estimate unknown parameters but in order to make inference about an unknown parameter, we need interval estimates. Confidence intervals are based on information from the sampling distribution, including the standard error.

What if the underlying distribution is unknown? What if we are interested in a population parameter that is not the mean, such as the median? How can we construct a confidence interval for the population median?

If we have sample data, then we can use bootstrapping methods to construct a bootstrap sampling distribution, which in turn can be used to construct a confidence interval.

Bootstrapping is a topic that has been studied extensively for many different population parameters and many different situations. There are parametric bootstraps, nonparametric bootstraps, weighted bootstraps, etc. We merely introduce the very basics of the bootstrap method. To cover all of the topics would be an entire class in itself.

Let’s show how to create a bootstrap sample for the median. Let the sample median be denoted as \(M\).

  • Replace the population with the sample
  • Sample with replacement \(B\) times. \(B\) should be large, say 1000.
  • Compute sample medians each time, \(M_i\)
  • Obtain the approximate distribution of the sample median.

If we have the approximate distribution, we can find an estimate of the standard error of the sample median by finding the standard deviation of \(M_1,...,M_B\).

Sampling with replacement is important. If we did not sample with replacement, we would always get the same sample median as the observed value. The sample we get from sampling from the data with replacement is called the bootstrap sample .

Once we have the bootstrap distribution of sample medians, we can create a confidence interval. For a 90% confidence interval, for example, we would find the 5th percentile and the 95th percentile of the bootstrap medians \(M_1, \ldots, M_B\).
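A minimal sketch of the whole recipe in R (hypothetical data; any sample vector would do):

```r
set.seed(1)
x <- rexp(25, rate = 1 / 10)      # stand-in sample
B <- 1000
M.star <- replicate(B, median(sample(x, replace = TRUE)))
sd(M.star)                        # bootstrap standard error of the median
quantile(M.star, c(0.05, 0.95))   # 90% bootstrap percentile interval
```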

You can create a bootstrap sample to find the approximate sampling distribution of any statistic, not just the median. The steps would be the same except you would calculate the appropriate statistic instead of the median.



2.9: Confidence intervals and bootstrapping

Mark Greenwood, Montana State University


Up to this point the focus has been on hypotheses, p-values, and estimates of the size of differences. But so far this has not explored inference techniques for the size of the difference. Confidence intervals provide an interval where we are __% confident that the true parameter lies. The idea of “confidence” is that if we repeated randomly sampling from the same population and made a similar confidence interval, the collection of all these confidence intervals would contain the true parameter at the specified confidence level (usually 95%). We only get to make one interval and so it either has the true parameter in it or not, and we don’t know the truth in real situations.

Confidence intervals can be constructed with parametric and nonparametric approaches. The nonparametric approach will use what is called bootstrapping and draws its name from “pull yourself up by your bootstraps”, where you improve your situation based on your own efforts. In statistics, we make our situation or inferences better by re-using the observations we have by assuming that the sample represents the population. Since each observation represents other similar observations in the population that we didn’t get to measure, if we sample with replacement to generate a new data set of size \(n\) from our data set (also of size \(n\)), it mimics the process of taking repeated random samples of size \(n\) from our population of interest. This process also ends up giving us useful sampling distributions of statistics even when our standard normality assumption is violated, similar to what we encountered in the permutation tests. Bootstrapping is especially useful in situations where we are interested in statistics other than the mean (say we want a confidence interval for a median or a standard deviation) or when we consider functions of more than one parameter and don’t want to derive the distribution of the statistic (say the difference in two medians). Here, bootstrapping is used to provide more trustworthy inferences when some of our assumptions (especially normality) might be violated for our parametric confidence interval procedure.

To perform bootstrapping, the resample function from the mosaic package will be used. We can apply this function to a data set and get a new version of the data set by sampling new observations with replacement from the original one. The new, bootstrapped version of the data set (called dsample_BTS below) contains a new variable called orig.id which is the number of the subject from the original data set. By summarizing how often each of these ids occurred in a bootstrapped data set, we can see how the re-sampling works. The table function will count up how many times each observation was used in the bootstrap sample, providing a row with the id followed by a row with the count. In the first bootstrap sample shown, the 1st, 14th, and 26th observations were sampled twice, the 9th and 28th observations were sampled four times, and the 4th, 5th, 6th, and many others were not sampled at all. Bootstrap sampling thus picks some observations multiple times and to do that it has to ignore some observations.
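A sketch of that exploration (assuming the mosaic package and the dsample data frame from earlier in the chapter; the seed is hypothetical):

```r
library(mosaic)
set.seed(406)
dsample_BTS <- resample(dsample)           # one bootstrap version of the data
table(as.numeric(dsample_BTS$orig.id))     # how often each subject was used
dsample_BTS2 <- resample(dsample)          # a second bootstrap sample
table(as.numeric(dsample_BTS2$orig.id))
```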

Like in permutations, one randomization isn’t enough. A second bootstrap sample is also provided to help you get a sense of what bootstrap data sets contain. It did not select observations two through five but did select eight others more than once. You can see other variations in the resulting re-sampling of subjects, with the most sampled observation used four times. With \(n = 30\), the chance of selecting any observation for any slot in the new data set is \(1/30\), and the expected or mean number of appearances we expect to see for an observation is the number of random draws times the probability of selection on each, so \(30 \cdot 1/30 = 1\). So we expect to see each observation in the bootstrap sample on average once, but random variability in the samples then creates the possibility of seeing it more than once or not at all.

We can use the two results to get an idea of the distribution of results in terms of the number of times observations might be re-sampled when sampling with replacement and the variation in those results, as shown in Figure 2.22. We could also derive the expected counts for each number of times of re-sampling when we start with all observations having an equal chance and sampling with replacement, but this isn’t important for using bootstrapping methods.

Figure 2.22: Counts of the number of times each observation appeared (or 0 for observations not re-sampled) in two bootstrap samples.

The main point of this exploration was to see that each run of the resample function provides a new version of the data set. Repeating this \(B\) times using another for loop, we will track our quantity of interest, say \(T\) , in all these new “data sets” and call those results \(T^*\) . The distribution of the bootstrapped \(T^*\) statistics tells us about the range of results to expect for the statistic. The middle __% of the \(T^*\) ’s provides a __% bootstrap confidence interval for the true parameter – here the difference in the two population means .

To make this concrete, we can revisit our previous examples, starting with the dsample data created before and our interest in comparing the mean passing distances for the commuter and casual outfit groups in the \(n = 30\) stratified random sample that was extracted. The bootstrapping code is very similar to the permutation code except that we apply the resample function to the entire data set used in lm as opposed to the shuffle function that was applied only to the explanatory variable.
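A sketch of that loop (assuming the Distance and Condition variables in dsample used earlier in the chapter):

```r
B <- 1000
Tstar <- numeric(B)
for (b in 1:B) {
    lmboot <- lm(Distance ~ Condition, data = resample(dsample))
    Tstar[b] <- coef(lmboot)[2]   # bootstrapped difference in sample means
}
hist(Tstar)
```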

Figure 2.23: Histogram and density curve of the bootstrap distribution of the difference in sample mean Distances, with a vertical line for the observed difference in the means of -25.933.

In this situation, the observed difference in the mean passing distances is -25.933 cm ( commute - casual ), which is the bold vertical line in Figure 2.23. The bootstrap distribution shows the results for the difference in the sample means when fake data sets are re-constructed by sampling from the original data set with replacement. The bootstrap distribution is approximately centered at the observed value (difference in the sample means) and is relatively symmetric.

The permutation distribution in the same situation (Figure 2.10) had a similar shape but was centered at 0. Permutations create sampling distributions based on assuming the null hypothesis is true, which is useful for hypothesis testing. Bootstrapping creates distributions centered at the observed result, which is the sampling distribution “under the alternative” or when no null hypothesis is assumed; bootstrap distributions are useful for generating confidence intervals for the true parameter values.

To create a 95% bootstrap confidence interval for the difference in the true mean distances ( \(\mu_\text{commute}-\mu_\text{casual}\) ), select the middle 95% of results from the bootstrap distribution. Specifically, find the 2.5 th percentile and the 97.5 th percentile (values that put 2.5 and 97.5% of the results to the left) in the bootstrap distribution, which leaves 95% in the middle for the confidence interval. To find percentiles in a distribution in R, functions are of the form q[Name of distribution] , with the function qt extracting percentiles from a \(t\) -distribution (examples below). From the bootstrap results, use the qdata function on the Tstar results that contain the bootstrap distribution of the statistic of interest.
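```r
qdata(Tstar, 0.025)   # 2.5th percentile of the bootstrap distribution
qdata(Tstar, 0.975)   # 97.5th percentile
# if your mosaic version complains, quantile(Tstar, ...) is equivalent
```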

These results tell us that the 2.5th percentile of the bootstrap distribution is at -50.006 cm and the 97.5th percentile is at -2.249 cm. We can combine these results to provide a 95% confidence interval for \(\mu_\text{commute}-\mu_\text{casual}\) that is between -50.01 and -2.25 cm. This interval is interpreted as with any confidence interval: we are 95% confident that the difference in the true mean distances ( commute minus casual groups) is between -50.01 and -2.25 cm. Or we can switch the direction of the comparison and say that we are 95% confident that the difference in the true means is between 2.25 and 50.01 cm ( casual minus commute ). This result would be incorporated into step 5 of the hypothesis testing protocol to accompany discussing the size of the estimated difference in the groups or used as a result of interest in itself. Both percentiles can be obtained in one line of code using:
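```r
qdata(Tstar, c(0.025, 0.975))
```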

Figure 2.24 displays those same percentiles on the bootstrap distribution residing in Tstar .

Figure 2.24: Histogram and density curve of the bootstrap distribution with the 95% bootstrap confidence interval displayed (bold, dashed vertical lines).

Although confidence intervals can exist without referencing hypotheses, we can revisit our previous hypotheses and see what this confidence interval tells us about the test of \(H_0: \mu_\text{commute} = \mu_\text{casual}\) . This null hypothesis is equivalent to testing \(H_0: \mu_\text{commute} - \mu_\text{casual} = 0\) , that the difference in the true means is equal to 0 cm. And the difference in the means was the scale for our confidence interval, which did not contain 0 cm. The value 0 cm is an interesting reference value for the confidence interval, because it is the value where the true means are equal to each other (have a difference of 0 cm). In general, if our confidence interval does not contain 0, then it is saying that 0 is not one of the likely values for the difference in the true means at the selected confidence level. This implies that we should reject a claim that they are equal. This provides the same inferences for the hypotheses that we considered previously using both parametric and permutation approaches using a fixed \(\alpha\) approach where \(\alpha\) = 1 - confidence level.

The general summary is that we can use confidence intervals to test hypotheses by assessing whether the reference value under the null hypothesis is in the confidence interval (suggests insufficient evidence against \(H_0\) to reject it, at least at the \(\alpha\) level and equivalent to having a p-value larger than \(\alpha\) ) or outside the confidence interval (sufficient evidence against \(H_0\) to reject it and equivalent to having a p-value that is less than \(\alpha\) ). P-values are more informative about hypotheses (measure of evidence against the null hypothesis) but confidence intervals are more informative about the size of differences, so both offer useful information and, as shown here, can provide consistent conclusions about hypotheses. But it is best practice to use p-values to assess evidence against null hypotheses and confidence intervals to do inferences for the size of differences.

As in the previous situation, we also want to consider the parametric approach for comparison purposes and to have that method available, especially to help us understand some methods where we will only consider parametric inferences in later chapters. The parametric confidence interval is called the equal variance, two-sample t confidence interval and additionally assumes that the populations being sampled from are normally distributed instead of just that they have similar shapes in the bootstrap approach. The parametric method leads to using a \(t\) -distribution to form the interval with the degrees of freedom for the \(t\) -distribution of \(n-2\) although we can obtain it without direct reference to this distribution using the confint function applied to the lm model. This function generates two confidence intervals and the one in the second row is the one we are interested in, as it pertains to the difference in the true means of the two groups. The parametric 95% confidence interval here is from -51.6 to -0.26 cm, which is a bit different in width from the nonparametric bootstrap interval that was from -50.01 to -2.25 cm.
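A sketch of that call (assuming the model fit from earlier in the chapter):

```r
m1 <- lm(Distance ~ Condition, data = dsample)
confint(m1)   # second row: interval for the difference in true means
```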

The bootstrap interval was narrower by almost 4 cm and its upper limit was much further from 0. The bootstrap CI can vary depending on the random number seed used, and additional runs of the code produced intervals of (-49.6, -2.8), (-48.3, -2.5), and (-50.9, -1.1), so the difference between the parametric and nonparametric approaches was not just due to an unusual bootstrap distribution. It is not entirely clear why the two intervals differ, but there are slightly more results in the left tail of Figure 2.24 than in the right tail and this shifts the 95% confidence interval slightly away from 0 as compared to the parametric approach. All intervals have the same interpretation; only the methods for calculating the intervals and the assumptions differ. Specifically, the bootstrap interval can tolerate distribution shapes other than normal and still provide intervals that work well. The other assumptions are all the same as for the hypothesis test, where we continue to assume that we have independent observations with equal variances for the two groups and maintain concerns about inferences here due to the violation of independence in these responses.

The formula that lm is using to calculate the parametric equal variance, two-sample \(t\) -based confidence interval is:

\[\bar{x}_1 - \bar{x}_2 \mp t^*_{df}s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\]

In this situation, the df is again \(n_1+n_2-2\) (the total sample size - 2) and \(s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\) . The \(t^*_{df}\) is a multiplier that comes from finding the percentile from the \(t\) -distribution that puts \(C\) % in the middle of the distribution with \(C\) being the confidence level. It is important to note that this \(t^*\) has nothing to do with the previous test statistic \(t\) . It is confusing and students first engaging these two options often happily take the result from a test statistic calculation and use it for a multiplier in a \(t\) -based confidence interval – try to focus on which \(t\) you are interested in before you use either. Figure 2.25 shows the \(t\) -distribution with 28 degrees of freedom and the cut-offs that put 95% of the area in the middle.

Figure 2.25: Plot of \(t(28)\) with cut-offs for putting 95% of the distribution in the middle that delineate the \(t^*\) multiplier to make a 95% confidence interval.

For 95% confidence intervals, the multiplier is going to be close to 2 and anything else is a likely indication of a mistake. We can use R to get the multipliers for confidence intervals using the qt function in a similar fashion to how qdata was used in the bootstrap results, except that this new value must be used in the previous confidence interval formula. This function produces values for requested percentiles, so if we want to put 95% in the middle, we place 2.5% in each tail of the distribution and need to request the 97.5 th percentile. Because the \(t\) -distribution is always symmetric around 0, we merely need to look up the value for the 97.5 th percentile and know that the multiplier for the 2.5 th percentile is just \(-t^*\) . The \(t^*\) multiplier to form the confidence interval is 2.0484 for a 95% confidence interval when the \(df = 28\) based on the results from qt :
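```r
qt(0.975, df = 28)   # t* multiplier; approximately 2.0484
```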

Note that the 2.5 th percentile is just the negative of this value due to symmetry and the real source of the minus in the minus/plus in the formula for the confidence interval.

We can also re-write the confidence interval formula into slightly more general forms as

\[\bar{x}_1 - \bar{x}_2 \mp t^*_{df}SE_{\bar{x}_1 - \bar{x}_2}\ \text{ OR }\ \bar{x}_1 - \bar{x}_2 \mp ME\]

where \(SE_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}\) and \(ME = t^*_{df}SE_{\bar{x}_1 - \bar{x}_2}\) . The SE is available in the lm model summary for the line related to the difference in groups in the “Std. Error” column. In some situations, researchers will report the standard error (SE) or margin of error (ME) as a method of quantifying the uncertainty in a statistic. The SE is an estimate of the standard deviation of the statistic (here \(\bar{x}_1 - \bar{x}_2\) ) and the ME is an estimate of the precision of a statistic that can be used to directly form a confidence interval. The ME depends on the choice of confidence level although 95% is almost always selected.

To finish this example, R can be used to help you do calculations much like a calculator except with much more power “under the hood”. You have to make sure you are careful with using ( ) to group items and remember that the asterisk (*) is used for multiplication. We need the pertinent information, which is available from the favstats output, to calculate the confidence interval “by hand” using R.

Start with typing the following command to calculate \(s_p\) and store it in a variable named sp :
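```r
# n1, n2, s1, s2 are the group sizes and standard deviations read off
# the favstats output (not reproduced here)
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
sp
```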

Then calculate the confidence interval that confint provided using:
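```r
# group means 109.8667 and 135.8 from favstats; n1 = n2 = 15 assumed
109.8667 - 135.8 + c(-1, 1) * qt(0.975, df = 28) * sp * sqrt(1 / 15 + 1 / 15)
```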

Or using the information from the model summary:
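```r
# SE is the "Std. Error" for the group difference in summary(m1);
# the margin of error reported below is qt(0.975, df = 28) * SE
109.8667 - 135.8 + c(-1, 1) * qt(0.975, df = 28) * SE
```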

The previous results all use c(-1, 1) times the margin of error to subtract and add the ME to the difference in the sample means ( \(109.8667 - 135.8\) ), which generates the lower and then upper bounds of the confidence interval. If desired, we can also use just the last portion of the calculation to find the margin of error, which is 25.675 here.

For the entire \(n = 1,636\) data set for these two groups, the results are obtained using the following code. The estimated difference in the means is -3 cm ( commute minus casual ). The \(t\) -based 95% confidence interval is from -5.89 to -0.11.
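A sketch (the full data set is assumed to be in a data frame, here called dd, with the same Distance and Condition variables):

```r
m2 <- lm(Distance ~ Condition, data = dd)
confint(m2)
```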

The bootstrap 95% confidence interval is from -5.816 to -0.076. With this large data set, the differences between the parametric and nonparametric approaches decrease and they are essentially equivalent here. The bootstrap distribution (not displayed) for the differences in the sample means is relatively symmetric and centered around the estimated difference of -3 cm. So using all the observations we would be 95% confident that the true mean difference in overtake distances ( commute - casual ) is between -5.82 and -0.08 cm, providing additional information about the estimated difference in the sample means of -3 cm.


Introduction to Computational Finance and Financial Econometrics with R

9.8 Using the Bootstrap for Hypothesis Testing

To be completed

  • The duality between hypothesis tests and confidence intervals can be exploited by the bootstrap: use the bootstrap to compute a confidence interval and see if the hypothesized value lies in the interval. Illustrate with testing equality of two Sharpe ratios.
How to Perform a Breusch-Pagan Test in R

The Breusch-Pagan test is a statistical test used to detect heteroscedasticity in a regression model. Heteroscedasticity occurs when the variance of the errors is not constant across all levels of the independent variables, which can lead to inefficient estimates and affect the reliability of hypothesis tests.

Understanding the Breusch-Pagan Test

The Breusch-Pagan test checks if the variance of errors in a regression model changes based on the predictors. It does this by squaring the errors and then checking if there’s a relationship between these squared errors and the original predictors. If there’s no relationship, it means the error variance is constant (no heteroscedasticity). If there is a relationship, it suggests heteroscedasticity is present.

Syntax: bptest(formula, data, studentize = TRUE)

  • formula: the formula specifying the linear model
  • data: the data frame containing the variables in the model
  • studentize: logical; if TRUE, the test uses studentized residuals. The default is TRUE.

Implement Breusch-Pagan Test in R

Now we will discuss how to Implement Breusch-Pagan Test in R Programming Language step by step.

Step 1: Install and Load Required Packages

First we install and load the required package.
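```r
# install once if needed, then load
# install.packages("lmtest")
library(lmtest)
```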

Step 2: Fit a Linear Regression Model

Next we fit a linear regression model on which to perform the Breusch-Pagan test.
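The article's own data set is not shown; the built-in mtcars data are used here as a stand-in:

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)
```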

Step 3: Perform the Breusch-Pagan Test

Now we perform the Breusch-Pagan test with the help of the bptest function.
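```r
bptest(model)
# the article reports a p-value of 0.6438 for its data; the value for the
# mtcars stand-in will differ
```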

Since the p-value (0.6438) is much greater than 0.05, we fail to reject the null hypothesis. This indicates that there is no significant evidence of heteroscedasticity in the residuals of the regression model. Therefore, we can conclude that the assumption of homoscedasticity holds for this model, meaning the variance of the error terms is constant across observations.

The Breusch-Pagan test is an essential tool for diagnosing heteroscedasticity in regression models. By following the outlined steps, it is easy to perform this test in R and take necessary actions if heteroscedasticity is detected. Addressing heteroscedasticity ensures the reliability of regression analysis, leading to more accurate and valid inferences.

Breusch-Pagan Test in R - FAQs

What is heteroscedasticity?

Heteroscedasticity occurs when the variance of the error terms in a regression model is not constant across observations.

Why is heteroscedasticity a problem?

Heteroscedasticity can lead to inefficient estimates and affect the validity of statistical tests, making it crucial to detect and correct.

How can I address heteroscedasticity in my regression model?

You can transform the dependent variable, use weighted least squares, or apply robust standard errors.

What is the null hypothesis of the Breusch-Pagan test?

The null hypothesis is that the variance of the errors is constant (homoscedasticity).

What does a significant p-value in the Breusch-Pagan test indicate?

A significant p-value indicates the presence of heteroscedasticity in the regression model.



    The bootstrap method for hypothesis testing involves several key steps. Initially, the researcher formulates the null (0 H 0 ) and alternative (1 H 1 ) hypotheses, focusing on a parameter of ...

  12. 11.2.1

    Bootstrapping is a resampling procedure that uses data from one sample to generate a sampling distribution by repeatedly taking random samples from the known sample, with replacement. Let's show how to create a bootstrap sample for the median. Let the sample median be denoted as M. Steps to create a bootstrap sample: Replace the population ...

  13. hypothesis testing

    I would just do a regular bootstrap test: compute the t-statistic in your data and store it; change the data such that the null-hypothesis is true. In this case, subtract the mean in group 1 for group 1 and add the overall mean, and do the same for group 2, that way the means in both group will be the overall mean.

  14. Bootstrap Hypothesis Testing in Statistics with Example ...

    Bootstrap Hypothesis Testing in Statistics with Example: How to test a hypothesis using a bootstrapping approach in Statistics? đŸ‘‰đŸŒRelated Video: Hypothes...

  15. Bootstrap Hypothesis Testing

    Bootstrap and Monte Carlo tests. Finite-sample properties of bootstrap tests. Double bootstrap and fast double bootstrap tests. Bootstrap data generating processes. Multiple test statistics. Finite-sample properties of bootstrap supF tests. Conclusion. Acknowledgments. References

  16. 2.9: Confidence intervals and bootstrapping

    Permutations create sampling distributions based on assuming the null hypothesis is true, which is useful for hypothesis testing. Bootstrapping creates distributions centered at the observed result, which is the sampling distribution "under the alternative" or when no null hypothesis is assumed; bootstrap distributions are useful for ...

  17. Hypothesis testing and bootstrapping in R

    Hypothesis testing and bootstrapping. This tutorial demonstrates some of the many statistical tests that R can perform. It is impossible to give an exhaustive list of such testing functionality, but we hope not only to provide several examples but also to elucidate some of the logic of statistical hypothesis tests with these examples.

  18. The One-Sample Hypothesis Tests Using the Bootstrap

    Using the same hypothesis testing framework. We first establish the null and the alternative hypothesis. is the test statistic computed from the bootstrap replicate and is the basis value that we are testing. For example, a standard deviation of 16.5 is and standard deviation computed from one bootstrap sample is .

  19. 9.8 Using the Bootstrap for Hypothesis Testing

    9.8 Using the Bootstrap for Hypothesis Testing. To be completed. duality between HT and CIs can be exploited by the bootstrap. Use bootstrap to compute CI and see if hypothesized values lies in the interval. Illustrate with testing equality of 2 Sharpe ratios

  20. Bootstrap for hypothesis testing

    3. Scenario: I have two measurement tools A and B and only about n=5-10 measurements of the same object for each tool. I want to test if there is a difference in the mean of the measurements between the two tools. I want to use the bootstrap hypothesis approach creating 1000 bootstrapped samples with replacement for each tool (or do I need to ...

  21. Understanding bootstrap hypothesis testing

    Formally, the null hypothesis of the test is that the two means are equal, because the distribution of the test statistic is simulated by bootstrap assuming that the means are equal. This is however not different from the standard (non-bootstrap) two-sample t-test. The reason why the H0 can there be stated as "mean of X is not greater as mean ...

  22. What is the difference between bootstrap hypothesis testing/permutation

    Since null hypothesis tests and confidence intervals are complementary, this can be used to construct a bootstrap-based null hypothesis test. The bootstrap is very versatile because it does not need (strong) distributional assumptions, but because it relies on an approximating distribution it is an approximate procedure.

  23. How to Perform a Breusch-Pagan Test in R

    In this article, we will be looking at the approach to perform a one-proportion Z-test in the Python programming language. Z-test is a statistical test to determine whether two population means are different when the variances are known and the sample size is large. One-proportion Z-test formula: [Tex]z = \frac{(P-Po)}{\sqrt{Po(1-Po)/n}}[/Tex] Wher

  24. r

    I have a small question about the concept behind hypothesis testing using bootstrap. Assume that I need to evaluate two independent population mean differences: population a and population b. My doubt is the following: Should I apply bootstrap on a single population, and check the difference of the mean after that? Mean[BOOT(a)-BOOT(b)]