
8.8 Hypothesis Tests for a Population Proportion

Learning objectives.

  • Conduct and interpret hypothesis tests for a population proportion.

Some notes about conducting a hypothesis test:

  • The null hypothesis [latex]H_0[/latex] is always an “equal to.”  The null hypothesis is the original claim about the population parameter.
  • The alternative hypothesis [latex]H_a[/latex] is a “less than,” “greater than,” or “not equal to.”  The form of the alternative hypothesis depends on the context of the question.
  • If the alternative hypothesis is a “less than”, then the test is left-tail.  The p -value is the area in the left-tail of the distribution.
  • If the alternative hypothesis is a “greater than”, then the test is right-tail.  The p -value is the area in the right-tail of the distribution.
  • If the alternative hypothesis is a “not equal to”, then the test is two-tail.  The p -value is the sum of the area in the two-tails of the distribution.  Each tail represents exactly half of the p -value.
  • Think about the meaning of the p -value.  A data analyst (and anyone else) should have more confidence that they made the correct decision to reject the null hypothesis with a smaller p -value (for example, 0.001 as opposed to 0.04) even if using a significance level of 0.05. Similarly, for a large p -value such as 0.4, as opposed to a p -value of 0.056 (a significance level of 0.05 is less than either number), a data analyst should have more confidence that they made the correct decision in not rejecting the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.
  • The significance level must be identified before collecting the sample data and conducting the test.  Generally, the significance level will be included in the question.  If no significance level is given, a common standard is to use a significance level of 5%.

Suppose the hypotheses for a hypothesis test are:

[latex]\begin{eqnarray*} H_0: & & p=20 \% \\ H_a: & & p \gt 20\% \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\gt[/latex], this is a right-tail test.  The p -value is the area in the right-tail of the distribution.

Normal distribution curve of a single population proportion with the value of 0.2 on the x-axis. The p-value points to the area on the right tail of the curve.

[latex]\begin{eqnarray*} H_0: & & p=50 \% \\ H_a: & & p \neq  50\% \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\neq[/latex], this is a two-tail test.  The p -value is the sum of the areas in the two tails of the distribution.  Each tail contains exactly half of the p -value.

Normal distribution curve of a single population mean with a value of 50 on the x-axis. The p-value formulas, 1/2(p-value), for a two-tailed test is shown for the areas on the left and right tails of the curve.

[latex]\begin{eqnarray*} H_0: & & p=10\% \\ H_a: & & p \lt  10\% \end{eqnarray*}[/latex]

Because the alternative hypothesis is a [latex]\lt[/latex], this is a left-tail test.  The p -value is the area in the left-tail of the distribution.

Steps to Conduct a Hypothesis Test for a Population Proportion

  • Write down the null and alternative hypotheses in terms of the population proportion [latex]p[/latex].  Include appropriate units with the values of the proportion.
  • Use the form of the alternative hypothesis to determine if the test is left-tailed, right-tailed, or two-tailed.
  • Collect the sample information for the test and identify the significance level.
  • If [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], use the normal distribution with [latex]\displaystyle{z=\frac{\hat{p}-p}{\sqrt{\frac{p \times (1-p)}{n}}}}[/latex].
  • If one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], use a binomial distribution.
  • If the p -value is less than or equal to the significance level, reject the null hypothesis.  The results of the sample data are significant.  There is sufficient evidence to conclude that the null hypothesis [latex]H_0[/latex] is an incorrect belief and that the alternative hypothesis [latex]H_a[/latex] is most likely correct.
  • If the p -value is greater than the significance level, do not reject the null hypothesis.  The results of the sample data are not significant.  There is not sufficient evidence to conclude that the alternative hypothesis [latex]H_a[/latex] may be correct.
  • Write down a concluding sentence specific to the context of the question.
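The distribution check, the test statistic, and the final comparison in the steps above can be sketched in Python.  This is a minimal illustration, not part of the original procedure: the function names choose_distribution, z_score, and decide are hypothetical, and the decision rule assumes the null hypothesis is rejected when the p -value is at most the significance level.

```python
from math import sqrt

def choose_distribution(n, p0):
    """Pick the distribution for the test, using p0 from the null hypothesis."""
    if n * p0 >= 5 and n * (1 - p0) >= 5:
        return "normal"
    return "binomial"

def z_score(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), computed assuming H0 is true."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def decide(p_value, alpha):
    """Assumed decision rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

# Illustrative numbers only: n = 200, claimed p = 0.92, sample proportion 0.87.
print(choose_distribution(200, 0.92))
print(round(z_score(0.87, 0.92, 200), 2))
print(decide(0.0046, 0.01))
```

Keeping the three pieces separate mirrors the order of the steps: the distribution is chosen from the null hypothesis alone, before the sample proportion is used.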

USING EXCEL TO CALCULATE THE P -VALUE FOR A HYPOTHESIS TEST ON A POPULATION PROPORTION

The p -value for a hypothesis test on a population proportion is the area in the tail(s) of the distribution of the sample proportion.  If both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex], use the normal distribution to find the p -value.  If at least one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex], use the binomial distribution to find the p -value.

If both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p) \geq 5[/latex]:

  • For x , enter the value for [latex]\hat{p}[/latex].
  • For [latex]\mu[/latex] , enter the mean of the sample proportions [latex]p[/latex].  Note:  Because the test is run assuming the null hypothesis is true, the value for [latex]p[/latex] is the claim from the null hypothesis.
  • For [latex]\sigma[/latex] , enter the standard error of the proportions [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].
  • For the logic operator , enter true .  Note:  Because we are calculating the area under the curve, we always enter true for the logic operator.
  • Use the appropriate technique with the norm.dist function to find the area in the left-tail or the area in the right-tail.
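For readers working outside Excel, the same left-tail and right-tail areas can be sketched in Python.  This is an illustration under stated assumptions, not the textbook's method: it uses the standard library's statistics.NormalDist in place of norm.dist, and the helper name normal_tail_areas is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def normal_tail_areas(p_hat, p0, n):
    """Return (left_tail, right_tail) areas for a sample proportion p_hat.

    p0 is the proportion claimed in the null hypothesis; n is the sample size.
    """
    se = sqrt(p0 * (1 - p0) / n)                     # standard error of the proportions
    left = NormalDist(mu=p0, sigma=se).cdf(p_hat)    # plays the role of norm.dist(p_hat, p0, se, true)
    right = 1 - left                                 # plays the role of 1-norm.dist(...)
    return left, right

# Illustrative numbers only: sample proportion 0.36, claimed p = 0.3, n = 150.
left, right = normal_tail_areas(0.36, 0.3, 150)
print(round(left, 4), round(right, 4))
```

The cumulative function always returns the area to the left, which is why the right-tail area is obtained by subtracting from 1.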

If at least one of [latex]n \times p \lt 5[/latex] or [latex]n \times (1-p) \lt 5[/latex]:

  • The p -value is found using the binomial distribution.
  • For a left-tail test, the p -value is the probability of at most x successes, found with binom.dist :
  • For x , enter the number of successes.
  • For n , enter the sample size.
  • For p , enter the value of the population proportion [latex]p[/latex] from the null hypothesis.
  • For the logic operator , enter true .  Note:  Because we are calculating an at most probability, the logic operator is always true.
  • For a right-tail test, the p -value is the probability of at least x successes, found with 1-binom.dist :
  • For x , enter the number of successes minus one.
  • For n , enter the sample size.
  • For p , enter the value of the population proportion [latex]p[/latex] in the null hypothesis.
  • For the logic operator , enter true .  Note:  Because we are calculating an at least probability, the logic operator is always true.
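The binomial case can be sketched in Python as well.  This is a hedged illustration: binom_cdf and binomial_p_value are hypothetical helper names, the cumulative probability is summed directly with math.comb rather than calling Excel's binom.dist, and only the left-tail and right-tail cases are covered.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); plays the role of binom.dist(x, n, p, true)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def binomial_p_value(x, n, p0, tail):
    """p-value for x successes in n trials, with p0 taken from the null hypothesis."""
    if tail == "left":       # alternative is "less than": at most x successes
        return binom_cdf(x, n, p0)
    if tail == "right":      # alternative is "greater than": at least x successes
        return 1 - binom_cdf(x - 1, n, p0)
    raise ValueError("tail must be 'left' or 'right'")

# Illustrative numbers only: 45 successes in 50 trials, claimed p = 0.93.
print(round(binomial_p_value(45, 50, 0.93, "left"), 4))
```

The right-tail branch uses x - 1 because "at least x" is the complement of "at most x - 1".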

Marketers believe that 92% of adults own a cell phone.  A cell phone manufacturer believes that number is actually lower.  In a sample of 200 adults, 87% own a cell phone.  At the 1% significance level, determine if the proportion of adults that own a cell phone is lower than the marketers’ claim.

Hypotheses:

[latex]\begin{eqnarray*} H_0: & & p=92\% \mbox{ of adults own a cell phone} \\ H_a: & & p \lt 92\% \mbox{ of adults own a cell phone} \end{eqnarray*}[/latex]

From the question, we have [latex]n=200[/latex], [latex]\hat{p}=0.87[/latex], and [latex]\alpha=0.01[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.92[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 200 \times 0.92=184 \geq 5 \\ n \times (1-p) & = & 200 \times (1-0.92)=16 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq 5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the area in the left tail of the distribution.

This is a normal distribution curve. On the left side of the center a vertical line extends to the curve with the area to the left of this vertical line shaded. The p-value equals the area of this shaded region.

Function: norm.dist
Field 1: 0.87
Field 2: 0.92
Field 3: sqrt(0.92*(1-0.92)/200)
Field 4: true
Answer: 0.0046

So the p -value[latex]=0.0046[/latex].
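As a cross-check, the same left-tail area can be reached by first standardizing the sample proportion with the z formula from the steps above.  The following Python sketch is an illustration only; it uses the standard library's statistics.NormalDist instead of Excel, and the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

n, p0, p_hat, alpha = 200, 0.92, 0.87, 0.01

se = sqrt(p0 * (1 - p0) / n)       # standard error under the null hypothesis
z = (p_hat - p0) / se              # standardized sample proportion
p_value = NormalDist().cdf(z)      # left-tail area under the standard normal curve

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Standardizing first and using the standard normal curve gives the same area as entering the sample proportion, mean, and standard error directly.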

Conclusion:

Because p -value[latex]=0.0046 \lt 0.01=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 1% significance level there is enough evidence to suggest that the proportion of adults who own a cell phone is lower than 92%.

  • The null hypothesis [latex]p=92\%[/latex] is the claim that 92% of adults own a cell phone.
  • The alternative hypothesis [latex]p \lt 92\%[/latex] is the claim that less than 92% of adults own a cell phone.
  • The function is norm.dist because we are finding the area in the left tail of a normal distribution.
  • Field 1 is the value of [latex]\hat{p}[/latex].
  • Field 2 is the value of [latex]p[/latex] from the null hypothesis.  Remember, we run the test assuming the null hypothesis is true, so that means we assume [latex]p=0.92[/latex].
  • Field 3 is the standard deviation for the sample proportions [latex]\displaystyle{\sqrt{\frac{p \times (1-p)}{n}}}[/latex].
  • The p -value of 0.0046 tells us that under the assumption that 92% of adults own a cell phone (the null hypothesis), there is only a 0.46% chance that the proportion of adults who own a cell phone in a sample of 200 is 87% or less.  This is a small probability, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the proportion of adults who own a cell phone is most likely less than 92%.

A consumer group claims that the proportion of households that have at least three cell phones is 30%.  A cell phone company has reason to believe that the proportion of households with at least three cell phones is much higher.  Before they start a big advertising campaign based on the proportion of households that have at least three cell phones, they want to test their claim.  Their marketing people survey 150 households with the result that 54 of the households have at least three cell phones.  At the 1% significance level, determine if the proportion of households that have at least three cell phones is more than 30%.

[latex]\begin{eqnarray*} H_0: & & p=30\% \mbox{ of households have at least 3 cell phones} \\ H_a: & & p \gt 30\% \mbox{ of households have at least 3 cell phones} \end{eqnarray*}[/latex]

From the question, we have [latex]n=150[/latex], [latex]\displaystyle{\hat{p}=\frac{54}{150}=0.36}[/latex], and [latex]\alpha=0.01[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.3[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 150 \times 0.3=45 \geq 5 \\ n \times (1-p) & = & 150 \times (1-0.3)=105 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq  5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right tail of the distribution.

This is a normal distribution curve. On the right side of the center a vertical line extends to the curve with the area to the right of this vertical line shaded. The p-value equals the area of this shaded region.

Function: 1-norm.dist
Field 1: 0.36
Field 2: 0.3
Field 3: sqrt(0.3*(1-0.3)/150)
Field 4: true
Answer: 0.0544

So the p -value[latex]=0.0544[/latex].
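The right-tail calculation can be reproduced in Python as a hedged sketch.  It assumes the standard library's statistics.NormalDist as a stand-in for norm.dist, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0, alpha = 150, 54, 0.30, 0.01

p_hat = x / n                                            # sample proportion, 54/150
se = sqrt(p0 * (1 - p0) / n)                             # standard error under H0
p_value = 1 - NormalDist(mu=p0, sigma=se).cdf(p_hat)     # area to the right of p_hat

print(f"p_hat = {p_hat:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```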

Because p -value[latex]=0.0544 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that the proportion of households with at least three cell phones is more than 30%.

  • The null hypothesis [latex]p=30\%[/latex] is the claim that 30% of households have at least three cell phones.
  • The alternative hypothesis [latex]p \gt 30\%[/latex] is the claim that more than 30% of households have at least three cell phones.
  • The function is 1-norm.dist because we are finding the area in the right tail of a normal distribution.
  • Field 2 is the value of [latex]p[/latex] from the null hypothesis.  Remember, we run the test assuming the null hypothesis is true, so that means we assume [latex]p=0.3[/latex].
  • The p -value of 0.0544 tells us that under the assumption that 30% of households have at least three cell phones (the null hypothesis), there is a 5.44% chance that the proportion of households with at least three cell phones in a sample of 150 is 36% or more.  Compared to the 1% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the claim that 30% of households have at least three cell phones is most likely correct.

A teacher believes that 70% of students in the class will want to go on a field trip to the local zoo.  The students in the class believe the proportion is much higher and ask the teacher to verify her claim.  The teacher samples 50 students and 39 reply that they would want to go to the zoo.  At the 5% significance level, determine if the proportion of students who want to go on the field trip is higher than 70%.

[latex]\begin{eqnarray*} H_0: & & p = 70\% \mbox{ of students want to go on the field trip}  \\ H_a: & & p \gt 70\% \mbox{ of students want to go on the field trip}   \end{eqnarray*}[/latex]

From the question, we have [latex]n=50[/latex], [latex]\displaystyle{\hat{p}=\frac{39}{50}=0.78}[/latex], and [latex]\alpha=0.05[/latex].

[latex]\begin{eqnarray*} n \times p & = & 50 \times 0.7=35 \geq 5 \\ n \times (1-p) & = & 50 \times (1-0.7)=15 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq 5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the area in the right tail of the distribution.

Function: 1-norm.dist
Field 1: 0.78
Field 2: 0.7
Field 3: sqrt(0.7*(1-0.7)/50)
Field 4: true
Answer: 0.1085

So the p -value[latex]=0.1085[/latex].

Because p -value[latex]=0.1085 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that the proportion of students who want to go on the field trip is higher than 70%.

  • The null hypothesis [latex]p=70\%[/latex] is the claim that 70% of the students want to go on the field trip.
  • The alternative hypothesis [latex]p \gt 70\%[/latex] is the claim that more than 70% of students want to go on the field trip.
  • The p -value of 0.1085 tells us that under the assumption that 70% of students want to go on the field trip (the null hypothesis), there is a 10.85% chance that the proportion of students who want to go on the field trip in a sample of 50 students is 78% or more.  Compared to the 5% significance level, this is a large probability, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the teacher’s claim that 70% of students want to go on the field trip is most likely correct.

Joan believes that 50% of first-time brides in the United States are younger than their grooms.  She performs a hypothesis test to determine if the percentage is the same or different from 50%.  Joan samples 100 first-time brides and 56 reply that they are younger than their grooms.  Use a 5% significance level.

[latex]\begin{eqnarray*} H_0: & & p=50\% \mbox{ of first-time brides are younger than the groom} \\ H_a: & & p \neq 50\% \mbox{ of first-time brides are younger than the groom} \end{eqnarray*}[/latex]

From the question, we have [latex]n=100[/latex], [latex]\displaystyle{\hat{p}=\frac{56}{100}=0.56}[/latex], and [latex]\alpha=0.05[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.5[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 100 \times 0.5=50 \geq 5 \\ n \times (1-p) & = & 100 \times (1-0.5)=50 \geq 5\end{eqnarray*}[/latex]

Because both [latex]n \times p \geq 5[/latex] and [latex]n \times (1-p)  \geq 5[/latex] we use a normal distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\neq[/latex], the p -value is the sum of area in the tails of the distribution.

This is a normal distribution curve. On the left side of the center a vertical line extends to the curve with the area to the left of this vertical line shaded and labeled as one half of the p-value. On the right side of the center a vertical line extends to the curve with the area to the right of this vertical line shaded and labeled as one half of the p-value. The p-value equals the sum of area of these two shaded regions.

Because there is only one sample, we only have information relating to one of the two tails, either the left or the right.  We need to know if the sample relates to the left or right tail because that will determine how we calculate the area of that tail using the normal distribution.  In this case, the sample proportion [latex]\hat{p}=0.56[/latex] is greater than the value of the population proportion in the null hypothesis [latex]p=0.5[/latex] ([latex]\hat{p}=0.56>0.5=p[/latex]), so the sample information relates to the right-tail of the normal distribution.  This means that we will calculate the area in the right tail using 1-norm.dist .  However, this is a two-tailed test where the p -value is the sum of the area in the two tails and the area in the right-tail is only one half of the p -value.  The area in the left tail equals the area in the right tail and the p -value is the sum of these two areas.

Function: 1-norm.dist
Field 1: 0.56
Field 2: 0.5
Field 3: sqrt(0.5*(1-0.5)/100)
Field 4: true
Answer: 0.1151

So the area in the right tail is 0.1151 and  [latex]\frac{1}{2}[/latex]( p -value)[latex]=0.1151[/latex].  This is also the area in the left tail, so

p -value[latex]=0.1151+0.1151=0.2302[/latex]
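The two-tail calculation can be sketched in Python.  This is an illustration under assumptions: it uses the standard library's statistics.NormalDist rather than Excel, and it picks the tail on the side where the sample falls by taking the smaller of the two areas.  Doubling the unrounded tail area gives about 0.2301, which differs from 0.2302 only because the value above doubles the rounded 0.1151.

```python
from math import sqrt
from statistics import NormalDist

n, x, p0, alpha = 100, 56, 0.50, 0.05

p_hat = x / n
se = sqrt(p0 * (1 - p0) / n)                   # standard error under H0
dist = NormalDist(mu=p0, sigma=se)

left = dist.cdf(p_hat)                         # area to the left of p_hat
right = 1 - left                               # area to the right of p_hat
one_tail = min(left, right)                    # the tail the sample relates to
p_value = 2 * one_tail                         # both tails have the same area

print(f"one tail = {one_tail:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```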

Because p -value[latex]=0.2302 \gt 0.05=\alpha[/latex], we do not reject the null hypothesis.  At the 5% significance level there is not enough evidence to suggest that the proportion of first-time brides that are younger than the groom is different from 50%.

  • The null hypothesis [latex]p=50\%[/latex] is the claim that the proportion of first-time brides that are younger than the groom is 50%.
  • The alternative hypothesis [latex]p \neq 50\%[/latex] is the claim that the proportion of first-time brides that are younger than the groom is different from 50%.
  • If the sample proportion is less than the value of [latex]p[/latex] in the null hypothesis, we use norm.dist([latex]\hat{p}[/latex],[latex]p[/latex],[latex]\mbox{sqrt}(p*(1-p)/n)[/latex],true) to find the area in the left tail.  The area in the right tail equals the area in the left tail, so we can find the p -value by adding the output from this function to itself.
  • If the sample proportion is greater than the value of [latex]p[/latex] in the null hypothesis, as it is here, we use 1-norm.dist([latex]\hat{p}[/latex],[latex]p[/latex],[latex]\mbox{sqrt}(p*(1-p)/n)[/latex],true) to find the area in the right tail.  The area in the left tail equals the area in the right tail, so we can find the p -value by adding the output from this function to itself.
  • The p -value of 0.2302 is a large probability compared to the 5% significance level, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the claim that 50% of first-time brides are younger than the groom is most likely correct.

Watch this video: Hypothesis Testing for Proportions: z -test by ExcelIsFun [7:27] 

An online retailer believes that 93% of the visitors to its website will make a purchase.   A researcher in the marketing department thinks the actual percent is lower than claimed.  The researcher examines a sample of 50 visits to the website and finds that 45 of the visits resulted in a purchase.  At the 1% significance level, determine if the proportion of visits to the website that result in a purchase is lower than claimed.

[latex]\begin{eqnarray*} H_0: & & p=93\% \mbox{ of visitors make a purchase} \\ H_a: & & p \lt 93\% \mbox{ of visitors make a purchase} \end{eqnarray*}[/latex]

From the question, we have [latex]n=50[/latex], [latex]x=45[/latex], and [latex]\alpha=0.01[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.93[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 50 \times 0.93=46.5 \geq 5 \\ n \times (1-p) & = & 50 \times (1-0.93)=3.5 \lt 5\end{eqnarray*}[/latex]

Because [latex]n \times (1-p)  \lt 5[/latex] we use a binomial distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\lt[/latex], the p -value is the probability of getting at most 45 successes in 50 trials.

Function: binom.dist
Field 1: 45
Field 2: 50
Field 3: 0.93
Field 4: true
Answer: 0.2710

So the p -value[latex]=0.2710[/latex].
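The "at most 45 successes" probability can be checked with a short Python sketch.  It is an illustration only: the binomial probabilities are summed directly with math.comb in place of binom.dist, and the variable names are assumptions.

```python
from math import comb

n, x, p0, alpha = 50, 45, 0.93, 0.01

# P(X <= 45): add up the probabilities of 0, 1, ..., 45 successes under H0.
p_value = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x + 1))

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```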

Because p -value[latex]=0.2710 \gt 0.01=\alpha[/latex], we do not reject the null hypothesis.  At the 1% significance level there is not enough evidence to suggest that the proportion of visitors who make a purchase is lower than 93%.

  • The null hypothesis [latex]p=93\%[/latex] is the claim that 93% of visitors to the website make a purchase.
  • The alternative hypothesis [latex]p \lt 93\%[/latex] is the claim that less than 93% of visitors to the website make a purchase.
  • The function is binom.dist because we are finding the probability of at most 45 successes.
  • Field 1 is the number of successes [latex]x[/latex].
  • Field 2 is the sample size [latex]n[/latex].
  • Field 3 is the probability of success [latex]p[/latex].  This is the claim about the population proportion made in the null hypothesis, so that means we assume [latex]p=0.93[/latex].
  • The p -value of 0.2710 tells us that under the assumption that 93% of visitors make a purchase (the null hypothesis), there is a 27.10% chance that the number of visitors in a sample of 50 who make a purchase is 45 or less.  This is a large probability compared to the significance level, and so is likely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely correct, and so the conclusion of the test is to not reject the null hypothesis.  In other words, the proportion of visitors to the website who make a purchase is most likely 93%.

A drug company claims that only 4% of people who take their new drug experience any side effects from the drug.  A researcher believes that the percent is higher than the drug company’s claim.  The researcher takes a sample of 80 people who take the drug and finds that 10% of the people in the sample experience side effects from the drug.  At the 5% significance level, determine if the proportion of people who experience side effects from taking the drug is higher than claimed.

[latex]\begin{eqnarray*} H_0: & & p=4\% \mbox{ of people experience side effects} \\ H_a: & & p \gt 4\% \mbox{ of people experience side effects} \end{eqnarray*}[/latex]

From the question, we have [latex]n=80[/latex], [latex]\hat{p}=0.1[/latex], and [latex]\alpha=0.05[/latex].

To determine the distribution, we check [latex]n \times p[/latex] and [latex]n \times (1-p)[/latex].  For the value of [latex]p[/latex], we use the claim from the null hypothesis ([latex]p=0.04[/latex]).

[latex]\begin{eqnarray*} n \times p & = & 80 \times 0.04=3.2 \lt 5\end{eqnarray*}[/latex]

Because [latex]n \times p  \lt 5[/latex] we use a binomial distribution to calculate the p -value.  Because the alternative hypothesis is a [latex]\gt[/latex], the p -value is the probability of getting at least 8 successes in 80 trials.  (Note:  In the sample of size 80, 10% have the characteristic of interest, so this means that [latex]80 \times 0.1=8[/latex] people in the sample have the characteristic of interest.)

Function: 1-binom.dist
Field 1: 7
Field 2: 80
Field 3: 0.04
Field 4: true
Answer: 0.0147

So the p -value[latex]=0.0147[/latex].
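The "at least 8 successes" probability can be checked with a short Python sketch.  It is an illustration under assumptions: the sample percentage is converted to a count by rounding, and the cumulative probability is summed directly with math.comb in place of binom.dist.

```python
from math import comb

n, p_hat, p0, alpha = 80, 0.10, 0.04, 0.05

x = round(n * p_hat)     # number of people in the sample with side effects

# P(X <= x - 1): probabilities of 0, 1, ..., x - 1 successes under H0.
at_most = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x))
p_value = 1 - at_most    # P(X >= x) by the complement rule

print(f"x = {x}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```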

Because p -value[latex]=0.0147 \lt 0.05=\alpha[/latex], we reject the null hypothesis in favour of the alternative hypothesis.  At the 5% significance level there is enough evidence to suggest that the proportion of people who experience side effects from taking the drug is higher than 4%.

  • The null hypothesis [latex]p=4\%[/latex] is the claim that 4% of the people experience side effects from taking the drug.
  • The alternative hypothesis [latex]p \gt 4\%[/latex] is the claim that more than 4% of the people experience side effects from taking the drug.
  • The function is 1-binom.dist because we are finding the probability of at least 8 successes.
  • Field 1 is [latex]x-1[/latex] where [latex]x[/latex] is the number of successes.  In this case, we are using the complement rule to change the probability of at least 8 successes into 1 minus the probability of at most 7 successes.
  • Field 3 is the probability of success [latex]p[/latex].  This is the claim about the population proportion made in the null hypothesis, so that means we assume [latex]p=0.04[/latex].
  • The p -value of 0.0147 tells us that under the assumption that 4% of people experience side effects (the null hypothesis), there is a 1.47% chance that the number of people in a sample of 80 who experience side effects is 8 or more.  This is a small probability compared to the significance level, and so is unlikely to happen assuming the null hypothesis is true.  This suggests that the assumption that the null hypothesis is true is most likely incorrect, and so the conclusion of the test is to reject the null hypothesis in favour of the alternative hypothesis.  In other words, the proportion of people who experience side effects is most likely greater than 4%.

Concept Review

The hypothesis test for a population proportion is a well-established process:

  • Find the p -value (the area in the corresponding tail) for the test using the appropriate distribution (normal or binomial).
  • Compare the p -value to the significance level and state the outcome of the test.

Attribution

“9.6 Hypothesis Testing of a Single Mean and Single Proportion” in Introductory Statistics by OpenStax is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to Statistics Copyright © 2022 by Valerie Watts is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


A population proportion is the share of a population that belongs to a particular category .

Hypothesis tests are used to check a claim about the size of that population proportion.

Hypothesis Testing a Proportion

The following steps are used for a hypothesis test:

  • Check the conditions
  • Define the claims
  • Decide the significance level
  • Calculate the test statistic
  • Conclusion

For example:

  • Population : Nobel Prize winners
  • Category : Born in the United States of America

And we want to check the claim:

" More than 20% of Nobel Prize winners were born in the US"

By taking a sample of 40 randomly selected Nobel Prize winners we could find that:

10 out of 40 Nobel Prize winners in the sample were born in the US

The sample proportion is then: \(\displaystyle \frac{10}{40} = 0.25\), or 25%.

From this sample data we check the claim with the steps below.

1. Checking the Conditions

The conditions for conducting a hypothesis test for a proportion are:

  • The sample is randomly selected
  • There are only two options: being in the category or not being in the category
  • The sample needs at least 5 members in the category and at least 5 members not in the category

In our example, the random sample of 40 people included 10 who were born in the US.

The rest were not born in the US, so there are 30 in the other category.

The conditions are fulfilled in this case.

Note: It is possible to do a hypothesis test without having 5 of each category. But special adjustments need to be made.

2. Defining the Claims

We need to define a null hypothesis (\(H_{0}\)) and an alternative hypothesis (\(H_{1}\)) based on the claim we are checking.

The claim was: "More than 20% of Nobel Prize winners were born in the US"

In this case, the parameter is the proportion of Nobel Prize winners born in the US (\(p\)).

The null and alternative hypothesis are then:

Null hypothesis : 20% of Nobel Prize winners were born in the US.

Alternative hypothesis : More than 20% of Nobel Prize winners were born in the US.

Which can be expressed with symbols as:

\(H_{0}\): \(p = 0.20 \)

\(H_{1}\): \(p > 0.20 \)

This is a ' right tailed' test, because the alternative hypothesis claims that the proportion is more than in the null hypothesis.

If the data supports the alternative hypothesis, we reject the null hypothesis and accept the alternative hypothesis.


3. Deciding the Significance Level

The significance level (\(\alpha\)) is the uncertainty we accept when rejecting the null hypothesis in a hypothesis test.

The significance level is a percentage probability of accidentally making the wrong conclusion.

Typical significance levels are:

  • \(\alpha = 0.1\) (10%)
  • \(\alpha = 0.05\) (5%)
  • \(\alpha = 0.01\) (1%)

A lower significance level means that the evidence in the data needs to be stronger to reject the null hypothesis.

There is no "correct" significance level - it only states the uncertainty of the conclusion.

Note: A 5% significance level means that, if the null hypothesis is actually true, we expect to wrongly reject it in about 5 out of 100 tests.

4. Calculating the Test Statistic

The test statistic is used to decide the outcome of the hypothesis test.

The test statistic is a standardized value calculated from the sample.

The formula for the test statistic (TS) of a population proportion is:

\(\displaystyle \frac{\hat{p} - p}{\sqrt{p(1-p)}} \cdot \sqrt{n} \)

\(\hat{p}-p\) is the difference between the sample proportion (\(\hat{p}\)) and the claimed population proportion (\(p\)).

\(n\) is the sample size.

In our example:

The claimed (\(H_{0}\)) population proportion (\(p\)) was \( 0.20 \)

The sample proportion (\(\hat{p}\)) was 10 out of 40, or \(\displaystyle \frac{10}{40} = 0.25\)

The sample size (\(n\)) was \(40\)

So the test statistic (TS) is then:

\(\displaystyle \frac{0.25-0.20}{\sqrt{0.2(1-0.2)}} \cdot \sqrt{40} = \frac{0.05}{\sqrt{0.2(0.8)}} \cdot \sqrt{40} = \frac{0.05}{\sqrt{0.16}} \cdot \sqrt{40} \approx \frac{0.05}{0.4} \cdot 6.325 = \underline{0.791}\)

You can also calculate the test statistic using programming language functions:

With Python use the scipy and math libraries to calculate the test statistic for a proportion.
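A minimal sketch of that calculation is shown here. It assumes only the standard math module is needed for this step, and the variable names are illustrative rather than taken from the original page.

```python
import math

# Sample and claim from the Nobel Prize example
x = 10        # winners in the sample born in the US
n = 40        # sample size
p = 0.20      # proportion claimed in the null hypothesis

p_hat = x / n   # sample proportion

# Test statistic: (p_hat - p) / sqrt(p * (1 - p)) * sqrt(n)
test_stat = (p_hat - p) / math.sqrt(p * (1 - p)) * math.sqrt(n)

print(round(test_stat, 3))
```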

With R use the built-in prop.test() function to calculate the test statistic for a proportion.

5. Concluding

There are two main approaches for making the conclusion of a hypothesis test:

  • The critical value approach compares the test statistic with the critical value of the significance level.
  • The P-value approach compares the P-value of the test statistic with the significance level.

Note: The two approaches are only different in how they present the conclusion.

The Critical Value Approach

For the critical value approach we need to find the critical value (CV) of the significance level (\(\alpha\)).

For a population proportion test, the critical value (CV) is a Z-value from a standard normal distribution .

This critical Z-value (CV) defines the rejection region for the test.

The rejection region is an area of probability in the tails of the standard normal distribution.

Because the claim is that the population proportion is more than 20%, the rejection region is in the right tail:

Choosing a significance level (\(\alpha\)) of 0.05, or 5%, we can find the critical Z-value from a Z-table , or with a programming language function:

Note: The functions find the Z-value for an area from the left side.

To find the Z-value for a right tail we need to use the function on the area to the left of the tail (1-0.05 = 0.95).

With Python use the Scipy Stats library norm.ppf() function to find the Z-value for an \(\alpha\) = 0.05 in the right tail.
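The lookup can be sketched without extra dependencies by using the standard library's `statistics.NormalDist`, whose `inv_cdf()` method plays the same role here as `norm.ppf()`. This swap is an assumption made to keep the example self-contained; with SciPy installed, `norm.ppf(1 - alpha)` should give the same value.

```python
from statistics import NormalDist

alpha = 0.05  # significance level, all of it in the right tail

# inv_cdf() takes the area to the LEFT of the Z-value, so pass 1 - alpha.
critical_value = NormalDist().inv_cdf(1 - alpha)
print(round(critical_value, 4))  # approximately 1.6449
```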

With R use the built-in qnorm() function to find the Z-value for an \(\alpha\) = 0.05 in the right tail.

Using either method we can find that the critical Z-value is \(\approx \underline{1.6449}\)

For a right tailed test we need to check if the test statistic (TS) is bigger than the critical value (CV).

If the test statistic is bigger than the critical value, the test statistic is in the rejection region .

When the test statistic is in the rejection region, we reject the null hypothesis (\(H_{0}\)).

Here, the test statistic (TS) was \(\approx \underline{0.791}\) and the critical value was \(\approx \underline{1.6449}\)

Here is an illustration of this test in a graph:

Since the test statistic was smaller than the critical value we do not reject the null hypothesis.

This means that the sample data does not support the alternative hypothesis.

And we can summarize the conclusion stating:

The sample data does not support the claim that "more than 20% of Nobel Prize winners were born in the US" at a 5% significance level .

The P-Value Approach

For the P-value approach we need to find the P-value of the test statistic (TS).

If the P-value is smaller than the significance level (\(\alpha\)), we reject the null hypothesis (\(H_{0}\)).

The test statistic was found to be \( \approx \underline{0.791} \)

For a population proportion test, the test statistic is a Z-Value from a standard normal distribution .

Because this is a right tailed test, we need to find the P-value of a Z-value bigger than 0.791.

We can find the P-value using a Z-table, or with a programming language function:

Note: The functions find the P-value (area) to the left side of Z-value.

To find the P-value for a right tail we need to subtract the left area from the total area: 1 - the output of the function.

With Python use the Scipy Stats library norm.cdf() function to find the P-value of a Z-value bigger than 0.791:
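A minimal sketch of this step, again using the standard library's `statistics.NormalDist` as a stand-in for `norm.cdf()` (an assumption made so the example runs without SciPy):

```python
from statistics import NormalDist

z = 0.791  # test statistic from the example

# cdf() gives the area to the left of z; the right-tail area is the remainder.
p_value = 1 - NormalDist().cdf(z)
print(round(p_value, 4))  # approximately 0.2145
```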

With R use the built-in pnorm() function to find the P-value of a Z-value bigger than 0.791:

Using either method we can find that the P-value is \(\approx \underline{0.2145}\)

This tells us that the significance level (\(\alpha\)) would need to be bigger than 0.2145, or 21.45%, to reject the null hypothesis.

This P-value is bigger than any of the common significance levels (10%, 5%, 1%).

So the null hypothesis is kept at all of these significance levels.

The sample data does not support the claim that "more than 20% of Nobel Prize winners were born in the US" at a 10%, 5%, or 1% significance level .

Note: It may still be true that the real population proportion is more than 20%.

But there was not strong enough evidence to support it with this sample.

Calculating a P-Value for a Hypothesis Test with Programming

Many programming languages can calculate the P-value to decide the outcome of a hypothesis test.

Using software and programming to calculate statistics is more common for bigger sets of data, as calculating manually becomes difficult.

The P-value calculated here will tell us the lowest possible significance level where the null-hypothesis can be rejected.

With Python use the scipy and math libraries to calculate the P-value for a right tailed hypothesis test for a proportion.

Here, the sample size is 40, the occurrences are 10, and the test is for a proportion bigger than 0.20.
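The whole right-tailed test can be sketched in a few lines. The function name `right_tailed_proportion_test` is hypothetical, and the standard library's `statistics.NormalDist` is used in place of SciPy as an assumption to keep the example dependency-free. Because the test statistic is not rounded before the lookup, the P-value may differ slightly in the fourth decimal from the hand calculation.

```python
import math
from statistics import NormalDist

def right_tailed_proportion_test(successes, n, p0):
    """Return (z, p_value) for H0: p = p0 against Ha: p > p0 (hypothetical helper)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # area in the right tail
    return z, p_value

z, p_value = right_tailed_proportion_test(successes=10, n=40, p0=0.20)
alpha = 0.05
reject_null = p_value < alpha
print(f"z = {z:.3f}, P-value = {p_value:.3f}, reject H0: {reject_null}")
```

With these inputs the P-value is well above 0.05, so the null hypothesis is not rejected.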

With R use the built-in prop.test() function to find the P-value for a right tailed hypothesis test for a proportion.

Note: The conf.level in the R code is the complement of the significance level (1 minus the significance level).

Here, the significance level is 0.05, or 5%, so the conf.level is 1-0.05 = 0.95, or 95%.

Left-Tailed and Two-Tailed Tests

This was an example of a right tailed test, where the alternative hypothesis claimed that the parameter is bigger than the value claimed in the null hypothesis.

You can check out an equivalent step-by-step guide for other types here:

  • Left-Tailed Test
  • Two-Tailed Test


Hypothesis Test for a Proportion

This lesson explains how to conduct a hypothesis test of a proportion, when the following conditions are met:

  • The sampling method is simple random sampling .
  • Each sample point can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
  • The sample includes at least 10 successes and 10 failures.
  • The population size is at least 20 times as big as the sample size.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Every hypothesis test requires the analyst to state a null hypothesis and an alternative hypothesis . The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false; and vice versa.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. It should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use the one-sample z-test to determine whether the hypothesized population proportion differs significantly from the observed sample proportion.

Analyze Sample Data

Using sample data, find the test statistic and its associated P-Value.

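  • Standard deviation. Compute the standard deviation (σ) of the sampling distribution: σ = sqrt[ P * ( 1 - P ) / n ], where P is the hypothesized value of the population proportion in the null hypothesis and n is the sample size.
  • Test statistic. The test statistic is a z-score (z): z = (p - P) / σ, where p is the sample proportion.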

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a z-score, use the Normal Distribution Calculator to assess the probability associated with the z-score. (See sample problems at the end of this lesson for examples of how this is done.)

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.
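The analysis and interpretation steps above can be sketched as a single function. The name `one_sample_proportion_z_test`, the `tail` labels, and the sample figures (58% successes in a sample of 100 against a null value of 0.50) are hypothetical and used only for illustration.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z_test(p_sample, P_null, n, tail="two-sided"):
    """Return (z, p_value) for a one-sample z-test of a proportion.

    tail is "two-sided", "less", or "greater" (hypothetical labels).
    """
    sigma = math.sqrt(P_null * (1 - P_null) / n)  # standard deviation under H0
    z = (p_sample - P_null) / sigma               # z-score test statistic
    std_normal = NormalDist()
    if tail == "less":
        p_value = std_normal.cdf(z)
    elif tail == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("tail must be 'two-sided', 'less', or 'greater'")
    return z, p_value

# Hypothetical data: 58% successes in a sample of 100, null value P = 0.50.
z, p_value = one_sample_proportion_z_test(0.58, 0.50, 100)
significance_level = 0.05
reject_null = p_value < significance_level
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0: {reject_null}")
```

Here the two-sided P-value is about 0.11, which is larger than 0.05, so the null hypothesis is not rejected for this hypothetical sample.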

Test Your Understanding

In this section, two hypothesis testing examples illustrate how to conduct a hypothesis test of a proportion. The first problem involves a two-tailed test; the second problem, a one-tailed test.


Problem 1: Two-Tailed Test

The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are very satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. Among the sampled customers, 73 percent say they are very satisfied. Based on these findings, can we reject the CEO's hypothesis that 80% of the customers are very satisfied? Use a 0.05 level of significance.

Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

  • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

Null hypothesis: P = 0.80

Alternative hypothesis: P ≠ 0.80

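  • Formulate an analysis plan. For this analysis, the significance level is 0.05. The test method, shown in the next section, is a one-sample z-test.

  • Analyze sample data. Using the sample data, we compute the standard deviation (σ) and the z-score test statistic (z).

σ = sqrt[ P * ( 1 - P ) / n ] = sqrt[(0.8 * 0.2) / 100]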

σ = sqrt(0.0016) = 0.04

z = (p - P) / σ = (.73 - .80)/0.04 = -1.75

where P is the hypothesized value of population proportion in the null hypothesis, p is the sample proportion, and n is the sample size.

Since we have a two-tailed test , the P-value is the probability that the z-score is less than -1.75 or greater than 1.75. We use the Normal Distribution Calculator to find P(z < -1.75) = 0.04. Since the standard normal distribution is symmetric with a mean of zero, we know that P(z > 1.75) = 0.04. Thus, the P-value = 0.04 + 0.04 = 0.08.

  • Interpret results . Since the P-value (0.08) is greater than the significance level (0.05), we cannot reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, the sample included at least 10 successes and 10 failures, and the population size was at least 20 times the sample size.
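The arithmetic in Problem 1 can be checked with a short sketch. It uses the standard library's `statistics.NormalDist` in place of the Normal Distribution Calculator, which is an assumption made so the example is self-contained; the variable names follow the symbols in the solution.

```python
import math
from statistics import NormalDist

P, p, n = 0.80, 0.73, 100  # null value, sample proportion, sample size

sigma = math.sqrt(P * (1 - P) / n)       # 0.04
z = (p - P) / sigma                      # about -1.75
p_value = 2 * NormalDist().cdf(-abs(z))  # both tails, about 0.08

print(f"sigma = {sigma:.2f}, z = {z:.2f}, P-value = {p_value:.2f}")
print("Reject H0" if p_value < 0.05 else "Cannot reject H0")
```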

Problem 2: One-Tailed Test

Suppose the previous example is stated a little bit differently. Suppose the CEO claims that at least 80 percent of the company's 1,000,000 customers are very satisfied. Again, 100 customers are surveyed using simple random sampling. The result: 73 percent are very satisfied. Based on these results, should we accept or reject the CEO's hypothesis? Assume a significance level of 0.05.

Null hypothesis: P >= 0.80

Alternative hypothesis: P < 0.80

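σ = sqrt[ P * ( 1 - P ) / n ] = sqrt[(0.8 * 0.2) / 100] = 0.04

z = (p - P) / σ = (.73 - .80)/0.04 = -1.75

Since this is a one-tailed test, the P-value is the probability that the z-score is less than -1.75: P(z < -1.75) = 0.04.

  • Interpret results. Since the P-value (0.04) is less than the significance level (0.05), we reject the null hypothesis.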
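For Problem 2 only the left tail counts, so the P-value is half of the two-tailed value from Problem 1. A minimal sketch, using the standard library's `statistics.NormalDist` as a stand-in for the Normal Distribution Calculator (an assumption for self-containment):

```python
import math
from statistics import NormalDist

P, p, n = 0.80, 0.73, 100  # null value, sample proportion, sample size

sigma = math.sqrt(P * (1 - P) / n)
z = (p - P) / sigma
p_value = NormalDist().cdf(z)  # left tail only: P(z < -1.75)

print(f"z = {z:.2f}, one-tailed P-value = {p_value:.2f}")
print("Reject H0" if p_value < 0.05 else "Cannot reject H0")
```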

10.5 Hypothesis Testing for Two Means and Two Proportions


Student Learning Outcomes

  • The student will select the appropriate distributions to use in each case.
  • The student will conduct hypothesis tests and interpret the results.

Supplies

  • The business section from two consecutive days’ newspapers
  • Three small packages of multicolored chocolates
  • Five small packages of peanut butter candies

Increasing Stocks Survey

Look at yesterday’s newspaper business section. Conduct a hypothesis test to determine if the proportion of New York Stock Exchange (NYSE) stocks that increased is greater than the proportion of NASDAQ stocks that increased. As randomly as possible, choose 40 NYSE stocks and 32 NASDAQ stocks and complete the following statements.

  • H 0 : _________
  • H a : _________
  • In words, define the random variable.
  • The distribution to use for the test is _____________.
  • Calculate the test statistic using your data.
  • Calculate the p value.
  • Do you reject or not reject the null hypothesis? Why?
  • Write a clear conclusion using a complete sentence.
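Once the counts are collected, the test statistic and p-value for the Increasing Stocks Survey could be computed along these lines. This is a sketch of a pooled two-proportion z-test under the normal approximation; the function name and the counts (24 of 40 NYSE stocks and 14 of 32 NASDAQ stocks increasing) are hypothetical placeholders, not real market data.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Right-tailed pooled z-test for H0: p1 = p2 against Ha: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical counts: replace with the numbers from your own newspaper sample.
z, p_value = two_proportion_z_test(x1=24, n1=40, x2=14, n2=32)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```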

Decreasing Stocks Survey

Randomly pick eight stocks from the newspaper. Using two consecutive days’ business sections, test whether the stocks went down, on average, for the second day.

  • H 0 : ________
  • H a : ________
  • Calculate the p value:

Candy Survey

Buy three small packages of multicolored chocolates and five small packages of peanut butter candies (same net weight as the multicolored chocolates). Test whether the mean number of candy pieces per package is the same for the two brands.

  • What distribution should be used for this test?

Shoe Survey

Test whether women have, on average, more pairs of shoes than men. Include all forms of sneakers, shoes, sandals, and boots. Use your class as the sample.

  • The distribution to use for the test is ________________.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/10-5-hypothesis-testing-for-two-means-and-two-proportions

© Apr 16, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Hypothesis Testing for Means & Proportions

Here we presented hypothesis testing techniques for means and proportions in one and two sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion. There are many details to consider in hypothesis testing. The first is to determine the appropriate test. We discussed Z and t tests here for different applications. The appropriate test depends on the distribution of the outcome variable (continuous or dichotomous), the number of comparison groups (one, two) and whether the comparison groups are independent or dependent. The following table summarizes the different tests of hypothesis discussed here.

  • Continuous Outcome, One Sample: H0: μ = μ0
  • Continuous Outcome, Two Independent Samples: H0: μ1 = μ2
  • Continuous Outcome, Two Matched Samples: H0: μd = 0
  • Dichotomous Outcome, One Sample: H0: p = p 0
  • Dichotomous Outcome, Two Independent Samples: H0: p1 = p2, RD=0, RR=1, OR=1

Once the type of test is determined, the details of the test must be specified. Specifically, the null and alternative hypotheses must be clearly stated. The null hypothesis always reflects the "no change" or "no difference" situation. The alternative or research hypothesis reflects the investigator's belief. The investigator might hypothesize that a parameter (e.g., a mean, proportion, difference in means or proportions) will increase, will decrease or will be different under specific conditions (sometimes the conditions are different experimental conditions and other times the conditions are simply different groups of participants). Once the hypotheses are specified, data are collected and summarized. The appropriate test is then conducted according to the five step approach. If the test leads to rejection of the null hypothesis, an approximate p-value is computed to summarize the significance of the findings. When tests of hypothesis are conducted using statistical computing packages, exact p-values are computed. Because the statistical tables in this textbook are limited, we can only approximate p-values. If the test fails to reject the null hypothesis, then a weaker concluding statement is made for the following reason.

In hypothesis testing, there are two types of errors that can be committed. A Type I error occurs when a test incorrectly rejects the null hypothesis. This is referred to as a false positive result, and the probability that this occurs is equal to the level of significance, α. The investigator chooses the level of significance in Step 1, and purposely chooses a small value such as α=0.05 to control the probability of committing a Type I error. A Type II error occurs when a test fails to reject the null hypothesis when in fact it is false. The probability that this occurs is equal to β. Unfortunately, the investigator cannot specify β at the outset because it depends on several factors including the sample size (smaller samples have higher β), the level of significance (β decreases as α increases), and the difference in the parameter under the null and alternative hypothesis.
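The dependence of β on sample size and α can be illustrated with a rough calculation. The sketch below approximates β for a right-tailed one-sample z-test of a proportion using the normal approximation; the function name and the values (null proportion 0.20, true proportion 0.30, sample sizes 40 and 400) are hypothetical and chosen only to show the direction of the effects.

```python
import math
from statistics import NormalDist

def type_ii_error(p0, p1, n, alpha):
    """Approximate beta for a right-tailed one-sample z-test of a proportion."""
    std_normal = NormalDist()
    # Smallest sample proportion that leads to rejecting H0: p = p0.
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Probability of staying at or below that cutoff when the true proportion is p1.
    return std_normal.cdf((cutoff - p1) / math.sqrt(p1 * (1 - p1) / n))

beta_small_n = type_ii_error(0.20, 0.30, n=40, alpha=0.05)
beta_large_n = type_ii_error(0.20, 0.30, n=400, alpha=0.05)
beta_larger_alpha = type_ii_error(0.20, 0.30, n=40, alpha=0.10)

print(f"n=40,  alpha=0.05: beta = {beta_small_n:.3f}")
print(f"n=400, alpha=0.05: beta = {beta_large_n:.3f}")
print(f"n=40,  alpha=0.10: beta = {beta_larger_alpha:.3f}")
```

In this hypothetical setting, β falls when the sample grows from 40 to 400 and also when α is raised from 0.05 to 0.10.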

We noted in several examples in this chapter, the relationship between confidence intervals and tests of hypothesis. The approaches are different, yet related. It is possible to draw a conclusion about statistical significance by examining a confidence interval. For example, if a 95% confidence interval does not contain the null value (e.g., zero when analyzing a mean difference or risk difference, one when analyzing relative risks or odds ratios), then one can conclude that a two-sided test of hypothesis would reject the null at α=0.05. It is important to note that the correspondence between a confidence interval and test of hypothesis relates to a two-sided test and that the confidence level corresponds to a specific level of significance (e.g., 95% to α=0.05, 90% to α=0.10 and so on). The exact significance of the test, the p-value, can only be determined using the hypothesis testing approach and the p-value provides an assessment of the strength of the evidence and not an estimate of the effect.
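As a small illustration of reading a test result off a confidence interval, the sketch below builds a 95% interval for a single proportion and checks whether it contains the null value. The numbers (sample proportion 0.73, n = 100, null value 0.80) are hypothetical. Note that this simple interval uses the sample proportion in its standard error while the z-test uses the null value, so for proportions the agreement between the two approaches is approximate rather than exact.

```python
import math
from statistics import NormalDist

p_hat, n, null_value = 0.73, 100, 0.80  # hypothetical sample and null value
confidence = 0.95

# Two-sided critical value, about 1.96 for a 95% interval.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

null_inside = lower <= null_value <= upper
print(f"95% CI: ({lower:.3f}, {upper:.3f}); contains null value: {null_inside}")
```

Because the interval contains the null value, a two-sided test at α=0.05 would be expected not to reject the null hypothesis for this hypothetical sample.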


Content ©2017. All Rights Reserved. Date last modified: November 6, 2017. Wayne W. LaMorte, MD, PhD, MPH


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved August 13, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/


Lesson 6a: Hypothesis Testing for One-Sample Proportion

Overview

As mentioned before, methods of making inferences about parameters are either estimating the parameter or testing a hypothesis about the value of the parameter. In this lesson, we will introduce the concepts of hypothesis testing. Then we will discuss hypothesis testing for a population proportion. In the next Lesson, we discuss inference for the population mean.

  • Explain the concepts of hypothesis testing.
  • Set up hypotheses.
  • Perform hypothesis testing for a population proportion using the p-value approach and the rejection region approach.
  • Use a confidence interval to draw a conclusion about a two-sided test.

Blown out of Proportion? Housing Prices, Home Prices and New Tenant Rents

Housing services — which include tenants' rents and imputed homeowners' rents — continue to be a major contributor to inflation relative to the share of housing in personal consumption expenditure (PCE). As shown in Figure 1 below, rent and homeowners' imputed rents made up 33 percent of year-over-year PCE inflation in June, despite only accounting for 15.4 percent of the PCE spending basket. When considering the outlook for future housing services prices, economists have found it useful to examine the ratio of housing services prices to two other prices: home prices and rents for new tenants. In this week's post, we look at what these data might be indicating for the near-term outlook of housing services prices.

Figure 1: Contributions to Year-Over-Year PCE Inflation

Combined line and column chart showing how much different variables contributed to year-over-year PCE inflation since January 2017. The different variables being represented by columns are services, food, goods, energy, and housing.

Source: Author's calculations using Bureau of Economic Analysis data via Haver Analytics

Figure 2 below plots the ratio of housing services prices to home prices as measured by the national S&P CoreLogic Case-Shiller Home Price Index (CSHPI). The ratio's May reading of 0.42 is low relative to its long-run 1975-2024 historical distribution: 1.9 standard deviations below the long-run average of 0.59. Assuming this ratio returns to its pre-pandemic level, the latest low readings potentially indicate further upside risk for housing services prices. If homebuying prices remain unchanged, tenant and homeowners' imputed rents would have to increase for this ratio to return to pre-pandemic levels.

Figure 2: Ratio of Housing Services Prices to Case-Shiller Home Price Index

Line graph showing the ratio of housing services prices to the Case-Shiller Home Price Index since January 1975. Each recession is highlighted for awareness.

Source: Bureau of Economic Analysis, Standard & Poor's via Haver Analytics

In contrast, examining the ratio of housing services prices to new tenant rents hints at a more optimistic outlook for housing services price growth. (Note: An earlier Macro Minute post discusses the relationship between new rents and the rent measures captured in official inflation indexes .) In the latest data from the second quarter of 2024, the ratio of housing services prices to the new tenant rent index (NTRI) was 0.68 compared to a pre-pandemic ratio of 0.67 and a long-run (since 2005) average of 0.66. Assuming mean-reversion in this ratio and adjustment on the part of the housing services price index, this suggests housing services prices could potentially fall in the near term — although the difference between the latest reading and the pre-pandemic benchmark is small.

Figure 3: Ratio of Housing Services Prices to New Tenant Rent Index

Line graph showing the ratio of housing services prices to the new tenant rent index since the first quarter of 2005. Each recession is highlighted for awareness.

Source: Bureau of Economic Analysis, Bureau of Labor Statistics via Haver Analytics

When considering the overall balance of risks to the outlook, how does one decide between the upside risk to housing services prices noted in Figure 2 and the downside risk indicated in Figure 3? A key assumption we made in looking at both ratios was that both series tend to mean-revert. (In other words, they tend to return to their long-run average over time.) We can judge which of the two ratios is more informative by testing whether they have demonstrated this property in the past.

One way to check a series for mean-reversion is to test whether the series is stationary (exhibiting a constant mean and variance over time), or non-stationary (not converging toward any particular level over time). Table 1 below reports the results of three statistical tests that evaluate whether a series exhibits non-stationarity. The Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests evaluate a null hypothesis that a series exhibits a kind of non-stationary behavior called a unit root. The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test evaluates the null hypothesis that a series is stationary.
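To give a feel for what a unit root test measures, the sketch below runs a simplified Dickey-Fuller regression on two simulated series. It is not the author's calculation: it omits the lag terms of the augmented test, uses simulated data rather than the housing ratios, and all function names are hypothetical. The threshold of -2.86 is the commonly cited approximate 5% critical value for the version of the regression with a constant.

```python
import math
import random

def dickey_fuller_t(series):
    """t-statistic on rho in the regression: change_t = a + rho * level_(t-1) + error."""
    x = series[:-1]                                         # lagged levels
    d = [b - a for a, b in zip(series[:-1], series[1:])]    # first differences
    n = len(x)
    x_mean, d_mean = sum(x) / n, sum(d) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d))
    rho = sxy / sxx
    intercept = d_mean - rho * x_mean
    ssr = sum((di - intercept - rho * xi) ** 2 for xi, di in zip(x, d))
    std_error = math.sqrt(ssr / (n - 2) / sxx)
    return rho / std_error

def simulate_ar1(phi, n, rng):
    """Simulate y_t = phi * y_(t-1) + noise; phi = 1 gives a random walk."""
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, 1))
    return y

rng = random.Random(0)
t_stationary = dickey_fuller_t(simulate_ar1(0.5, 500, rng))   # mean-reverting series
t_random_walk = dickey_fuller_t(simulate_ar1(1.0, 500, rng))  # unit root series

critical_value = -2.86  # approximate 5% critical value (assumption stated above)
print(f"mean-reverting series: t = {t_stationary:.2f}")
print(f"random walk:           t = {t_random_walk:.2f}")
```

A t-statistic well below the critical value, as expected for the mean-reverting series, is evidence against a unit root; a random walk typically produces a value much closer to zero.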

| Test | Ratio of housing services prices to CSHPI | Ratio of housing services prices to NTRI |
| --- | --- | --- |
| ADF test | Unable to reject unit root hypothesis (p = 0.23). | Reject unit root hypothesis (p = 0.04). |
| PP test | Unable to reject unit root hypothesis (p = 0.88). | Unable to reject unit root hypothesis (p = 0.50). |
| KPSS test | Reject stationarity hypothesis (p < 0.01). | Unable to reject stationarity hypothesis (p > 0.10). |

Source: Author's calculations

Table 1 shows that all three tests suggest that the ratio of housing services prices to the CSHPI is non-stationary and therefore does not exhibit mean-reverting behavior. In contrast, two of the three tests suggest the ratio of housing services prices to the NTRI may be stationary. Based on these results, economic forecasters might want to put more emphasis on the latter ratio when forming an outlook for housing services prices.

The challenge in dealing with ratios is that mean-reversion could be driven by either the numerator or the denominator. To identify which of the two price indexes is likely to be doing the adjustment, I estimate an error correction model for both series and use it to construct a five-year forecast. (Note: The model is described in greater detail in an earlier post that looks at the relationship between the CPI and the PPI.) The forecast is plotted below in Figure 4.

Figure 4: Forecast of Housing Services Prices and Ratio of Housing Services Prices to NTRI

Line graph comparing the forecast of housing services prices to the ratio of housing services prices to the new tenant rent index since the first quarter of 2005.  The forecast extends to the first quarter of 2029.

Source: Author's calculations using Bureau of Economic Analysis and Bureau of Labor Statistics data via Haver Analytics

The model forecasts that the process of mean-reversion in the ratio could be delayed, with the ratio of housing services prices to NTRI rising through the middle of 2025. The model also projects that the housing services price index will stall in the near term before resuming an upward trajectory similar to its pre-pandemic pace. This indicates that the projected near-term increase in the ratio is driven by declines in the NTRI through mid-2025. Thereafter, the ratio of housing services prices to NTRI is projected to decline toward its long-run average as NTRI growth outpaces growth in housing services prices.

These results show the trickiness of forecasting based on a narrative of mean-reversion in ratios: The process of mean-reversion doesn't always kick in immediately, and adjustments in ratios depend on the interplay of the numerator and the denominator.

Views expressed in this article are those of the author and not necessarily those of the Federal Reserve Bank of Richmond or the Federal Reserve System.


ORIGINAL RESEARCH article

Examining technical efficiency, prospects, and policies of farmers: data from a developing nation’s pineapple production.

Tumpa Datta

  • 1 Department of Agricultural Finance and Banking, Sylhet Agricultural University, Sylhet, Bangladesh
  • 2 Department of Agricultural Finance and Banking, Bangladesh Agricultural University, Mymensingh, Bangladesh
  • 3 Department of Agricultural Statistics, Sylhet Agricultural University, Sylhet, Bangladesh
  • 4 Upazila Land Office (Golapganj), Sylhet, Bangladesh

Introduction: The unique characteristics of pineapple as a perennial plant, which allow it to spread quickly and be adopted across both the tropics and subtropics, underpin its economic significance. Although pineapple is a popular tropical fruit in Bangladesh, the country continues to produce less pineapple than other international producers and has limited export offerings. Hence, the study aimed to estimate the technical efficiency of pineapple growers in a northeastern district of Bangladesh and to examine their prospects and the relevant policies.

Methods: One hundred respondent growers were surveyed directly to gather cross-sectional data using a multistage sampling technique. The technical efficiency scores of individual farms were calculated using the stochastic frontier model with the technical inefficiency model for identifying factors responsible for inefficiency.

Results: The technical efficiency scores range from about two-thirds to full efficiency, with a mean technical efficiency above ninety percent. The technical inefficiency effects model indicated that farmers’ age and education had a significant positive impact on inefficiency, whereas credit, training, and family size had a significant negative impact.

Discussion: Findings indicated that sampled farmers could use inputs more efficiently and raise their yield by nearly one-twentieth. Therefore, the study suggests that the government should concentrate on strategies to attract young growers, as they are more capable of managing resources effectively and more willing to accept technological breakthroughs. The study’s conclusions have significant policy ramifications, specifically in the areas of finance, education and skills, and rural development, that the government should consider to increase farmers’ productivity and overcome various challenges while upholding national interests and ensuring the farming sector’s continued prosperity. To commercialize pineapple production and establish Bangladesh as a prominent production zone, more research and development are needed.

1 Introduction

Bangladesh’s agriculture industry is a vital economic pillar. Its GDP contribution immediately following liberation in 1971 was approximately 60 percent. It is widely regarded as the most significant industry in Bangladesh in terms of GDP contribution, employment opportunities, and support for people’s livelihoods. Notwithstanding its considerable potential for employment in the rural labor force, its percentage of the GDP has declined over the past 10 years, from 17 percent in 2010 to approximately 11.66 percent in 2020 (BBS, 2022; BER, 2023). Improved agricultural productivity and food and nutrition security have been made possible by the adoption of agriculture-friendly policies and strategies, despite the effects of decreasing arable land, rising population demands for food and nutrition, climate change, the Russia-Ukraine crisis, and the coronavirus epidemic. This sector overwhelmingly impacts primary macroeconomic objectives like employment generation, poverty alleviation, human resources development, and food security. To meet the future needs of the expanding population, the government is working to adopt short-term, medium-term, and long-term action plans. One such plan is to increase the export of high-value crops like pineapple (Figure 1) to build sustainable, safe, and profitable agricultural systems that ensure food security (Shakil, 2023).


Figure 1 . Safe and sustainable impacts of pineapple farming. This chart shows the favorable correlation between an economy’s overall development and pineapple production. Similar to many other fruits, pineapple production is important for the socioeconomic advancement of the populace, particularly for Bangladesh’s marginalized growers in the northeast.

Because of the success of commercial farming, fruit production in the nation has quietly undergone a revolution during the last 20 years. Bangladesh has maintained an average annual rise in fruit production of 11.5 percent over the previous 18 years, placing it among the top 10 tropical fruit-producing nations in the world (FAO, 2018; BBS, 2019). Pineapple is one of Bangladesh’s most important commercial fruit crops. Among all the fruits produced in the country, pineapple ranks third in total garden area under fruit cultivation, after banana and mango (BBS, 2016). Because of its chemical composition, it is the primary raw material used by the confectionery industries to produce food additives and domestic fruit juices (Akhilomen et al., 2015). As a significant fruit crop, pineapple is widely farmed in several districts, including Tangail, Rangamati, Mymensingh, Gazipur, Chattogram, Khagrachari, Bandarban, Moulvibazar, Sylhet, and Dhaka (Hasan et al., 2011). These districts employ a large number of women and farmers. Bangladesh’s most important pineapple-growing areas are the Madhupur upazila of Tangail and the Sreemangal upazila of Moulvibazar. On an estimated 14,164 hectares of land, the nation produced 208,000 tons of fruit in 2020–21. According to Department of Agricultural Extension (DAE) data, pineapple production in 2021–22 increased from 0.469 million tons to 0.538 million tons. Owing to the longer winter season in 2021–22, the Honey Queen variety of pineapple, also known locally as Joldubi, which is grown throughout Bangladesh, particularly in the central part and the north-eastern division, produced a higher yield than in the previous year. While the yield of pineapples in 2022 was higher than in previous years, the cost of production was also higher because more labor, water, and fertilizers were needed. Policymakers consider the pineapple industry among the top priorities for growing exports (Shakil, 2023). The agriculture department is addressing the issue of using chemicals to ripen and increase the size of pineapples.

In addition to utilizing chemical fertilizers, some avaricious growers and dealers treat immature pineapples with excessive growth hormones to get them onto the market sooner. Producers have not yet been able to lower their production risk because of the cultural practices used so far in pineapple cultivation. Moreover, despite the country’s potential for growing this fruit, pineapple output in Bangladesh is still far too low to meet export demands in the European Union (EU) and the subregion. Several factors explain this low production, including the scarcity of good suckers, the producers’ limited grasp of basic principles such as traceability and the quality standards required for export fruit, and the high cost of specific inputs, which has a detrimental effect on fruit quality and preservation. The fact that animals and birds consume naturally ripened crops while still in the fields presents another challenge for growers. However, animals and birds will not consume the fruits if ripening hormones are sprayed on them (Datta et al., 2023).

Furthermore, Moulvibazar lacks standards for the amounts and mixes of mineral fertilizers appropriate for pineapple crops. Such standards would better protect the environment, particularly the soils and water resources vulnerable to pollution or degradation, and would also be economically beneficial. In their absence, Sreemangal pineapple growers use different doses and types of mineral fertilizers according to their needs, and certain growers overuse chemical fertilizers, which affects both production costs and the quality of pineapples (Shakil, 2023). Given that producers nowadays do not always have access to precise inputs and quality releases, knowing the efficiency level of producers allows for the definition of strategies to drive interventions (Fassinou et al., 2012). Investigating production potentials is necessary to match demand and supply with worldwide standards for pineapple, which requires improving productivity and quality. To prevent resource waste and, more importantly, to focus guidance on increasing the output of pineapple growers, the technical efficiency of pineapple producers must be evaluated.

Efficiency is a comparative indicator of a firm’s ability to use inputs in a production process relative to other firms in the same industry. It can encompass all pertinent production parameters and represents actual farm performance. Understanding how well resources are being used and what opportunities exist to increase productivity with the resources and technology already in place is crucial (Ahluwalia, 1996). Even though inputs are necessary for production, farmers in poor nations like Bangladesh confront significant obstacles when obtaining inputs. To better understand the new demands of Bangladesh’s agricultural sector, this study addresses two specific research questions: first, are the farmers producing in a technically efficient manner? Second, which factors explain the technical inefficiency of pineapple growers?

Several studies worldwide and in Bangladesh (Bakh and Islam, 2005; Begum et al., 2010; Islam and Sumelius, 2011; Alam et al., 2012; Polas, 2013; Sarker and Alam, 2016; Razzaq et al., 2019; Phrommarat and Oonkasem, 2021) have investigated overall efficiency along with technical efficiency in different fields. Akter et al. (2020) explored the technical efficiency of pineapple production in the Tangail district. Nevertheless, the currently available literature ignores the technical efficiency level of pineapple growers in Bangladesh’s northeastern region. Studying the effectiveness of livelihood-focused interventions in that area is extremely important because of its unique ecological characteristics.

Furthermore, rice used to be the main focus of Bangladesh’s previous food policies, with the goal being rice self-sufficiency. While rice continues to play a significant role in society, it is now necessary to create regulations and other tools that encourage crop diversification to supply a variety of food products and encourage the consumption of balanced, nutrient-rich diets.

As one of the fastest-growing agricultural subsectors, pineapples ( Figure 1 ) are high-value commodities that need to be prioritized to ensure proper use of the nutrients ingested ( FAO, 2018 ).

Therefore, multi-sectoral interlinked interventions focused on enhancing nutritional outcomes can be designed with the assistance of a new policy that is holistic and encourages the use of a “nutrition lens” to evaluate and prioritize various choices. Furthermore, a recently implemented policy that crosses the purview of more than a dozen ministries can offer an institutional framework under a single roof for ensuring sustainable production-consumption systems (SDG 12 of United Nations global goals) in the agricultural sector. The National Food and Nutrition Security Policy (NFNSP) of Bangladesh will benefit from the study’s results, which will also likely serve as a reference for creating and carrying out the country’s eighth and ninth five-year plans ( MoF, 2022 ).

1.1 Background for identifying clear research gap

Despite extensive research on the technical efficiency of pineapple production, a gap remains in both the existing literature and the specific context. The contextual gap emerged because we could not find any existing studies directly related to the topic, namely the technical efficiency of pineapple production among rural farm populations within the Sylhet Division. Hence, there was an opportunity for us to investigate a completely new geographic area with its high-value fruit crop.

The bulk of the studies were completed worldwide on the efficiency of different products including pineapple which helped us generate ideas regarding study design and analysis. For instance, Oladapo et al. (2007) examined the market margin and spatial pricing efficiency of pineapple in Nigeria. In addition, the technical efficiency and its determinants in Garden egg ( Solanum Spp.) production were determined by Okon et al. (2010) in Uyo Metropolis, Akwa Ibom State, while Trujillo and Iglesias (2013) and Ghimire et al. (2023) employed the stochastic frontier approach for measuring the small pineapple farmers and lentil producers’ technical efficiency in Colombia and Nepal, respectively.

However, no study has been documented on the technical efficiency of pineapple production in the northeastern part of Bangladesh to the best of the author’s knowledge. Although numerous published works were identified on the various contexts of pineapple cultivation in different areas of Bangladesh ( Table 1 ), our attempt could help mitigate this type of contextual gap in the advanced research arena.


Table 1 . Previous studies on the different aspects of pineapple.

To improve the economic, social, and consumer welfare of rural farmers, policies must be developed to increase their output through fruit farming. Hence, the aims of this study are twofold to mitigate the current research gaps: (1) To estimate the technical efficiency of pineapple production using the Stochastic Frontier Model (SFM) and (2) To identify the factors influencing the technical efficiency level of pineapple producers. The findings will add to the body of knowledge in light of Bangladesh’s shifting agricultural landscape, make it easier for the relevant authorities to pursue the right policies, and remove obstacles to the successful implementation of sustainable food systems through the productive production of pineapples in rural areas.

2 Materials and methods

2.1 Study area: north-eastern district of Bangladesh

The northeastern part of Bangladesh (Sylhet basin; Figure 2) contains the most commercially and ecologically significant hillocks with evergreen pineapple orchards, which are referred to as pineapple villages (Jahid, 2023). Sreemangal is situated southwest of the Moulvibazar district at 24.3083°N 91.7333°E (Banglapedia, 2023). Agriculture is the communities’ primary industry, accounting for 30.90 percent of income sources in the study area. Sreemangal lies 17 m above sea level, and its climate is warm and temperate. In winter there is much less rainfall than in summer. The average annual temperature in Sreemangal is 24.7°C (76.5°F), and the annual rainfall is 2,420 mm. Sreemangal is notable for its extensive tea and pineapple gardens as well as its continuous rain. The presence of lush trees has enriched its dynamic vegetation. The terraced tea gardens, plantations, pineapple gardens, and evergreen hills of Sreemangal are popular tourist destinations (Seema et al., 2023).


Figure 2 . Location of the study area. The research area, also known as “Pineapple Village,” is depicted on the map along with its physical borders and pertinent information. Because of the favorable biological circumstances in these areas, the fruit can be profitably grown on underutilized land, hillocks, and yards.

Since the Pakistani era, this region has been well-known for growing a variety of fruits, including pineapple, in the villages of Bishamoni, Bhunabir, Ashidron, Radhanagar, Ramnagar, Balishira, Noorjahan, Doluchhara, Satgaon, and Mohajerabad ( Deshwara, 2015 ). The district’s 1,210 hectares of land are used for pineapple cultivation. On the other hand, this season in Sreemangal, about 500 hectares of land have been planted with three different varieties of pineapples ( The Daily Tribunal, 2023 ). Nonetheless, this high-value cash crop can be grown in the northeastern region of Bangladesh on laterite soils on hillslopes and sandy, loamy soils high in humus. According to NHB (2015) , it favors soils with a pH range of 5.0–6.0. Since Sreemangal’s soil, environment, and warm, humid weather are ideal for growing pineapple, many pineapples are grown here in the upper hills. This region can help Bangladesh’s millions build a sustainable socioeconomic existence by providing jobs, wholesome food, fuel, and fodder. The study is focused on the Sreemangal Upazila ( Figure 2 ) due to its natural richness and biodiversity.

2.2 Variables of data collection, sampling technique, and sample size

Data on socioeconomic characteristics, pineapple production activities (such as input and output costs), and other farm-specific variables were collected from the farmers. The stochastic frontier model, which was developed by Aigner et al. (1977) and adopted by Taylor and Shonkwiler (1986) and Tadesse and Krishnamoorthy (1997), was used to estimate the technical efficiency of pineapple farmers (Balogun et al., 2018). In analyzing the efficiency or inefficiency of farmers, what is relevant is the greatest output that can be produced from a particular set of inputs, rather than the average of the actual relationship between farmers’ inputs and output. This implies that, given the state of technology, not all producers can use the minimal inputs necessary to generate the desired output. According to theory, producers do not always optimize their production functions. Technically efficient producers are those who operate on the frontier production curve, and technically inefficient producers are those who operate below it.

Various works on technical efficiency in developing countries were reviewed to choose appropriate variables that best suit our research objectives before preparing the questionnaire. Among others, age (in years), Household head gender (male = 0, female = 1), Educational status (years of schooling), Occupation (1 = Farming, 2 = Business, 3 = Service, 4 = Farming + Business, 5 = Farming + Service, 6 = Others), Farming experience (in years), Family size (in Numbers), Farm category (Marginal = 1, Small =2, Medium = 3 and Large = 4), Access to credit (if taken credit = 1, otherwise = 0), Training (if received any training = 1, otherwise = 0) and Advisory services (if received any advisory services = 1, otherwise = 0) were included for technical efficiency analysis.

2.2.1 Research design

A multistage sampling procedure was used to sample 100 farmers. The first stage was the purposive selection of the Sreemangal Upazila over the other six pineapple-producing upazilas of Moulvibazar District because of its production potential (DS, 2011). The second stage was the random selection, using the lottery method, of 4 unions (Ashidron, Satgaon, Sindurkhan, and Sreemangal) from the upazila’s 9 union parishads, based on their bumper contribution to pineapple production. The last stage was the random selection of a total of 100 farmers from the respective unions through the random number table method.
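The three stages can be sketched in a few lines. This is a hedged illustration, not the authors' procedure: a seeded pseudo-random generator stands in for both the lottery and the random number table, the union labels are placeholders, and the per-union sampling frame of 60 growers and the equal split of 25 farmers per union are hypothetical (the article does not report the allocation).

```python
import random

rng = random.Random(2020)  # fixed seed so the sketch is reproducible

# Stage 1 (purposive): the upazila is chosen deliberately, not drawn at random.
upazila = "Sreemangal"

# Stage 2: draw 4 of the 9 unions. Labels are PLACEHOLDERS for the 9 union parishads.
unions = [f"Union {k}" for k in range(1, 10)]
selected_unions = rng.sample(unions, 4)

# Stage 3: draw farmers within each selected union.
# HYPOTHETICAL: a frame of 60 growers per union and an equal split of the
# 100 respondents across the 4 unions.
FRAME_SIZE = 60
farmers_per_union = 100 // len(selected_unions)

sample = {}
for union in selected_unions:
    frame = [f"{union} / farmer {i}" for i in range(1, FRAME_SIZE + 1)]
    sample[union] = rng.sample(frame, farmers_per_union)

total = sum(len(farmers) for farmers in sample.values())
print(upazila, selected_unions, total)
```

`random.sample` draws without replacement, so no union or farmer can be selected twice.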

2.2.2 Method of data collection and sample size determination

This study collected primary data from 100 respondents using a semi-structured questionnaire during on-farm production. The data were collected from sampled pineapple farmers about their production technology and output for 2019–20. The final questionnaire covered several aspects, including general information, land-holding information, socio-economic aspects, problems, and probable solutions. Pre-testing was done before the final survey, which was conducted from February to April 2020. A simple random sampling procedure was adopted, and the desired sample size was calculated using the method suggested by Cochran (1977) for a representative sample for proportions:

$$n_0 = \frac{Z_{\alpha/2}^2 \, p \, q}{d^2}, \qquad n = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$

Here, $n$ is the required sample size.

$n_0$ is Cochran’s initial sample size, computed before adjusting for the population size.

$N$ is the size of the population.

$Z_{\alpha/2}^2$ is the squared critical value for the desired confidence level (95 percent): $1.96^2 = 3.8416$.

$p$ is the estimated proportion of an attribute that is present in the population ($p = 0.5$), and $q = 1 - p = 0.5$.

$d$ is the margin of error, 5 percent at the 95 percent confidence level ($d = 0.05$).

Therefore, $n_0 = 384.16$ and $n = 200$.

Thus, based on the Cochran formula, the required sample size was 200. Owing to certain limitations, however, the total sample size for the study was kept at 100.
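The calculation can be checked with a short sketch of Cochran's formula for proportions and its finite-population correction. The function names are hypothetical, and because the article does not report the population size N, the value used below is purely illustrative and is not expected to reproduce the article's n.

```python
import math


def cochran_n0(z, p, d):
    """Cochran's initial sample size for a proportion: z^2 * p * q / d^2."""
    q = 1.0 - p
    return (z ** 2) * p * q / (d ** 2)


def finite_population_correction(n0, population):
    """Adjust n0 for a finite population: n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1.0 + (n0 - 1.0) / population)


# Values stated in the text: 95 percent confidence, p = 0.5, d = 0.05.
n0 = cochran_n0(z=1.96, p=0.5, d=0.05)

# HYPOTHETICAL population size; the article does not report N.
N_HYPOTHETICAL = 1000
n = math.ceil(finite_population_correction(n0, N_HYPOTHETICAL))

print(f"n0 = {n0:.2f}, n = {n} for N = {N_HYPOTHETICAL}")
```

Rounding up with `math.ceil` is a common convention so that the realized sample never falls below the computed requirement.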

2.3 Analytical framework

The previous literature shows that technical efficiency is evaluated using two methods: a parametric approach and a non-parametric approach. The parametric approach relies on econometric techniques, whereas the non-parametric approach is based entirely on the mathematical programming techniques of Data Envelopment Analysis (DEA), used by Pakravan-Charvadeh and Flora (2022) and Pakravan-Charvadeh et al. (2022) to identify nutrition efficiency. Previous studies have discussed the advantages and shortcomings of both approaches (Battese and Coelli, 1992; Bravo-ureta and Evenson, 1994; Battese and Coelli, 1995). The econometric technique is stochastic and separates the impact of random error from the inefficiency effect, whereas the non-parametric technique combines the two and attributes the whole deviation to inefficiency. Because the econometric technique is parametric, however, misspecification of the functional form can be confounded with inefficiency; the non-parametric technique is less prone to this specification error. Nevertheless, the literature reveals that the econometric technique is commonly used to assess the technical efficiency of firms (Tchale and Sauer, 2015; Ali et al., 2019; Ndubueze-Ogarak et al., 2021). Accordingly, the econometric technique of Stochastic Frontier Analysis (SFA) was used in our study.

2.3.1 Theoretical foundations

Formalizing the response of yields to various inputs is the primary driving force behind measuring the technical efficiency of production processes. The primary causes of the observed variation in this response are differences in the technology employed by the enterprises, in the efficiency levels of the production processes, and in the production setting. Technical efficiency is, in turn, a determinant of economic efficiency (Adegbite and Adeoye, 2015).

Farrell’s (1957) methodology was a precursor to technical efficiency modeling; it presumes the existence of an efficient production-possibility frontier (PPF). According to this concept, the PPF indicates the highest production that can be obtained from a specific combination of productive elements. The gap between each firm’s production level and the maximum level on the PPF is used to calculate technical inefficiency. Consequently, technical efficiency can be computed as a percentage of the output of the most productive unit in the sample.
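The idea of scoring each unit against the best performer in the sample can be illustrated with a deliberately simplified sketch. It assumes a single input, a single output, and constant returns to scale, so the frontier is just the highest output-per-input ratio observed; Farrell's framework is more general. The farm data and the function name are hypothetical.

```python
# HYPOTHETICAL farms: (name, input used, output produced). Not data from the study.
FARMS = [("A", 10.0, 50.0), ("B", 8.0, 32.0), ("C", 12.0, 54.0)]


def relative_efficiency(farms):
    """Score each farm's output per unit of input against the best farm in the sample."""
    productivity = {name: output / inp for name, inp, output in farms}
    best = max(productivity.values())  # the sample's frontier under the stated assumptions
    return {name: value / best for name, value in productivity.items()}


scores = relative_efficiency(FARMS)
for name, score in scores.items():
    print(f"farm {name}: efficiency = {score:.2f}")
```

A score of 1 marks a farm on the frontier; a score below 1 is the share of frontier output the farm actually achieves with its inputs. Unlike the stochastic frontier used in the study, this deterministic version attributes every shortfall to inefficiency.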

Two different methods could be used to estimate this production function. First, there are the non-parametric approaches, which stand out for their adaptability. They are less constrictive when it comes to the parameters that are applied to the reference technology and when modeling manufacturing processes that involve many products ( Trujillo and Iglesias, 2013 ).

Second, one can econometrically estimate a production function using parametric methods, allowing one to draw statistical conclusions from the estimation’s findings.

To apply these techniques, a functional form must be specified for the production function. Using the latter approach, Aigner et al. (1977) and Meeusen and Broeck (1977) developed a stochastic production function to separate errors resulting from model misspecification from those that productive inefficiencies can explain. This separation requires specifying both the distribution of the error components and the functional form of the production function.

The primary empirical source on the factors influencing technical efficiency in agriculture is Battese and Coelli (1995). The main proposal put forward by these authors is the joint estimation of a model that incorporates both the efficient frontier of agricultural production and the factors influencing agricultural production inefficiency.

Coelli and Battese (1996) demonstrated that Indian farmers tend to be more productive when they are older, have larger farms, and have more education. Similarly, Tian and Wan (2000) discovered that the efficiency of rice production in China is positively impacted by education, farm size, and the implementation of different cropping techniques. Likewise, Villano and Fleming (2006) found that factors including age, education level, the percentage of adults in the household, and the amount of money generated by non-farm activities influence technical efficiency in a sample of farmers in Central Luzon, the Philippines.

On the other hand, Amaza and Olayemi (2002) suggested that greater levels of technical efficiency are attained in Nigeria if education, technical support, and crop diversification all rise.

Within the same continent, Essilfie et al. (2011) measured the farm-level technical efficiency of maize growers in Ghana; their findings indicated that variations in age, sex, years of education, family size, and farmers’ off-farm income affect technical efficiency. In Latin America, limited research has estimated the level of technical efficiency in agriculture using the stochastic frontier approach. A noteworthy exception is the study of Benoit-Cattin and Mendez (1996), which examined how efficiently a group of small coffee farmers in Guatemala used the existing technology; these authors concluded that both technical assistance and credit support sustain Guatemalan coffee plantations. In Brazil, Conceicao and Araujo (2000) estimated the technical efficiency of a sample of commercial farmers and discovered that experience plays a significant role in explaining it.

Furthermore, Richetti and Reis (2003) examined the economic efficiency of the productive resources used in soybean farming in the State of Mato Grosso do Sul. They demonstrated how the methods of transferring technology to growers of this grain are connected to technical inefficiency. According to Santos et al. (2006), there is a positive correlation between technical efficiency and the following factors in Chile: the size of the property, the distance from the main road, the age of the head of the family, and membership in a technology transfer group. In a related investigation carried out in the same country, Moreira et al. (2006) worked on a highly unbalanced panel dataset for a sample of small dairy farms in Southern Chile.

Both Taylor and Shonkwiler (1986) and Tadesse and Krishnamoorthy (1997) used the stochastic frontier model created by Aigner et al. (1977) . When analyzing a farmer’s efficiency or inefficiency, what matters is not the mean of the actual relationship between the farmer’s inputs and output, but rather the highest output that can be produced with a specific set of inputs. The statement suggests that, given the state of technology, not all producers can use the minimal number of inputs needed to generate the desired quantity of output. Producers do not always optimize their production functions, according to theory. Technically efficient producers are those who operate on the production frontier; technically inefficient producers are those who operate below the frontier production curve.

Nevertheless, studies on the technical efficiency of production that employ the parametric method are scarce regarding pineapple production. Chen et al. (2001) conducted a remarkable study wherein the technical efficiency of 83 pineapple farms in China was measured. The study concluded that manpower is the most crucial aspect of production. In a similar vein, Adinya et al. (2010) established the inefficiency of Nigerian pineapple production. Their findings suggested that achieving higher production efficiency is significantly impacted by the farmers’ educational attainment.

Two functional forms, Cobb–Douglas and translog, dominate the technical efficiency literature. With a small sample, the translog specification might not be representative. The stochastic frontier production model is employed to determine respondents’ technical efficiency.

The Cobb–Douglas (CD) is the appropriate form of the frontier production function. The production technology is assumed to be characterized by the Cobb–Douglas production function since it has the advantage over other forms of production functions like the Linear and Semi-log production functions in that a logarithmic transformation provides a model that is linear in the log of input and hence, easily used for econometric studies ( Coelli, 1995 ). This production function gave the best fit to data compared to the linear, exponential, and semi-log functional forms ( Akhilomen et al., 2015 ).

In addition to the above, the CD functional form is mainly preferred because its coefficients directly represent the elasticity of production. It provides an adequate representation of the production process as we are interested in an efficiency measurement and not an analysis of the production structure ( Taylor and Shonkwiler, 1986 ). Further, the CD functional form has been widely used in farm efficiency analyses. It is an adequate representation of the data ( Abedullah and Bakhsh, 2006 ).
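Both properties mentioned above (linearity in logs and coefficients that equal output elasticities) can be checked numerically. The sketch below uses a two-input Cobb–Douglas function with hypothetical parameters that are not estimates from the study; the function names are illustrative.

```python
import math

# HYPOTHETICAL parameters for a two-input Cobb-Douglas function; not estimates from the study.
A = 2.0
BETAS = (0.6, 0.3)


def cobb_douglas(inputs, a=A, betas=BETAS):
    """Y = a * x1**b1 * x2**b2 * ..."""
    return a * math.prod(x ** b for x, b in zip(inputs, betas))


def log_form(inputs, a=A, betas=BETAS):
    """ln Y = ln a + sum_j b_j * ln x_j, which is linear in the logged inputs."""
    return math.log(a) + sum(b * math.log(x) for x, b in zip(inputs, betas))


def elasticity(inputs, j, step=1e-6):
    """Numerical elasticity of output with respect to input j: d ln Y / d ln x_j."""
    bumped = list(inputs)
    bumped[j] *= math.exp(step)  # raise ln x_j by `step`
    return (math.log(cobb_douglas(bumped)) - math.log(cobb_douglas(inputs))) / step


x = (4.0, 9.0)
print(math.log(cobb_douglas(x)), log_form(x))  # the two forms agree
print(elasticity(x, 0), elasticity(x, 1))      # close to the exponents in BETAS
```

Because the elasticities do not depend on the input levels, a 1 percent increase in an input changes output by roughly the corresponding coefficient in percent at every point of the function.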

2.3.2 Stochastic frontier analysis

SFA, which is also known as a composed error model, was developed initially by Aigner et al. (1977) and Meeusen and Broeck (1977) . Supposing an appropriate production equation, we described the stochastic production frontier Eq. 1 below:

Where $Y_i$ is the yield produced by the $i$-th pineapple grower; $X_i$, the inputs used by the $i$-th grower; $\beta$, the parameters to be estimated; $\varepsilon_i$, the composed error term, with $\varepsilon_i = \nu_i - u_i$ (Eq. 2). Here $\nu_i$ is symmetric ($-\infty < \nu_i < \infty$) and captures random errors, such as climate change or other natural disasters, that are outside the farmer’s control.

It is assumed that $\nu_i$ is independently and identically distributed as $N(0, \sigma_v^2)$ (Gujarati, 2003). Farm-specific technical inefficiency is denoted by $u_i$. It measures the gap between observed output ($Y_i$) and the maximum possible output given by the stochastic frontier $[f(X_i, \beta) + \nu_i]$ (Aigner et al., 1977). $u_i$ arises from $N(0, \sigma_u^2)$ truncated at zero, that is, it is half-normally distributed (Kumbhakar and Lovell, 2000). The terms $\nu_i$ and $u_i$ are assumed to be independent of the input factors $X_i$.

2.3.3 Stochastic frontier model specification

The SFA model was used to estimate the technical efficiency of pineapple production. This technique specifies the effect of technical inefficiency that cannot be controlled by pineapple growers. The Cobb–Douglas production function is suitable for estimating technical efficiency in our study because it is easy to estimate and interpret. In addition, the elastic functional form solves the difficulty of multi-collinearity. The SFA model used in the analysis is expressed in Eq. 3 below:
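Written out, the log-linear Cobb–Douglas frontier takes the form below. This is a reconstruction from the variable definitions that follow (eight inputs, a composed error term, natural logarithms); the index notation $X_{ji}$ is an assumption made here for readability.

```latex
\ln Y_i = \beta_0 + \sum_{j=1}^{8} \beta_j \ln X_{ji} + \varepsilon_i \qquad (3)
```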

Where $Y_i$ is the yield of pineapple (kg/ha); $i$ indexes the farmers ($i = 1, \dots, 100$); $j$ indexes the input variables ($j = 1, \dots, 8$); $X_1$, area under pineapple cultivation (ha); $X_2$, quantity of seedlings (pieces/ha); $X_3$, lime (kg/ha); $X_4$, urea (kg/ha); $X_5$, triple superphosphate (TSP; kg/ha); $X_6$, muriate of potash (MoP; kg/ha); $X_7$, hormone (ml/ha); $X_8$, human labor (man-days/ha); $\varepsilon_i$, composed error term; ln, natural logarithm; $\beta_0$, intercept of the model; $\beta_j$, equation parameters.

2.3.4 Estimation of the stochastic frontier model

The Maximum Likelihood Estimation (MLE) technique was employed to estimate the SFA (Greene, 2000). The basic idea of the maximum likelihood principle is to choose the parameter estimates ($\beta$, $\sigma_\varepsilon^2$) to maximize the probability of obtaining the data:

Where $\sigma_v^2$ and $\sigma_u^2$ are the variances of $v$ and $u$, respectively; further, $\sigma_\varepsilon^2 = \sigma_v^2 + \sigma_u^2$, and $\gamma = \sigma_u/\sigma_v$.

The MLEs of $\beta$, $\gamma$, and $\sigma_\varepsilon^2$ are the values at which the likelihood function (Eqs. 4, 5) attains its maximum. They were obtained by setting the first-order partial derivatives with respect to $\beta$, $\gamma$, and $\sigma_\varepsilon^2$ equal to zero and solving the resulting non-linear equations simultaneously, using a non-linear optimization algorithm to find the optimal values of the parameters.
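To make the maximization step concrete, the sketch below evaluates the standard normal/half-normal log-likelihood of the composed error, $\ln L = \sum_i [\ln 2 - \ln\sigma_\varepsilon + \ln\phi(\varepsilon_i/\sigma_\varepsilon) + \ln\Phi(-\varepsilon_i\,(\sigma_u/\sigma_v)/\sigma_\varepsilon)]$, and picks the best variance pair from a coarse grid. It is a minimal illustration, not the estimation routine used in the study: the function names, the residual values, and the grid are all hypothetical, and a real application would optimize $\beta$ jointly with a numerical optimizer.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def frontier_loglik(residuals, sigma_v, sigma_u):
    """Log-likelihood of composed errors eps_i = v_i - u_i.

    Assumes v_i ~ N(0, sigma_v^2) and u_i half-normal from N(0, sigma_u^2).
    """
    sigma = math.sqrt(sigma_v ** 2 + sigma_u ** 2)  # sigma_eps in the text
    ratio = sigma_u / sigma_v                       # the ratio sigma_u / sigma_v
    total = 0.0
    for eps in residuals:
        z = eps / sigma
        total += (
            math.log(2.0)
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
            - 0.5 * z * z
            + math.log(norm_cdf(-z * ratio))
        )
    return total


# Hypothetical frontier residuals (log yield minus fitted log frontier).
residuals = [-0.30, -0.12, 0.05, -0.45, 0.02, -0.20]

# Coarse grid search standing in for a proper non-linear optimizer.
best_loglik, best_sigma_v, best_sigma_u = max(
    (frontier_loglik(residuals, sv, su), sv, su)
    for sv in (0.05, 0.1, 0.2)
    for su in (0.1, 0.2, 0.4)
)
print(best_sigma_v, best_sigma_u, round(best_loglik, 3))
```

The grid search is used only to keep the example dependency-free; the logic of "evaluate the log-likelihood, keep the parameters that maximize it" is the same as in a gradient-based routine.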

2.3.5 Equation of technical inefficiency estimation

In the model specification of technical efficiency estimation, it is assumed that the random error $v_i$ is normally distributed as $N(0, \sigma_v^2)$, whereas $u_i$ is half-normally distributed as $N(0, \sigma_u^2)$.
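The technical inefficiency effects model (Eq. 6) can be written as below. This is a reconstruction from the variable definitions that follow and from the intercept $\delta_0$ mentioned in the hypothesis-testing subsection; the index notation $Z_{ji}$ is an assumption.

```latex
U_i = \delta_0 + \sum_{j=1}^{5} \delta_j Z_{ji} + W_i \qquad (6)
```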

Where $U_i$ denotes the specific technical inefficiency of pineapple yield;

$Z_1$ = Age of the pineapple farmer (in years).

$Z_2$ = Education of the pineapple farmer (in years of schooling).

$Z_3$ = Dummy variable for credit taken from any source, e.g., banks or NGOs (non-governmental organizations), only for cultivating pineapple (1 for yes and 0 otherwise).

$Z_4$ = Dummy variable for the pineapple farmer’s participation in training on pineapple farming (1 for yes and 0 otherwise).

$Z_5$ = Family size (number of members).

$\delta_j$ = Parameters of the respective technical inefficiency variables to be estimated ($j = 1, 2, \dots, 5$).

$W_i$ = Random error term defined by the truncation of the normal distribution (with zero mean and variance $\sigma_w^2$).

2.3.6 Estimation of technical efficiency and technical inefficiency of individual pineapple growers

The following formula (Eq. 7) is applied to estimate the technical efficiency (TE) of pineapple growers:

$$TE_i = \frac{Y_i}{Y_i^*} \qquad (7)$$

Where $Y_i$ is the observed yield of the $i$-th pineapple grower; $Y_i^*$, the frontier yield of the $i$-th pineapple grower; $TE_i$, the technical efficiency of the $i$-th pineapple grower, in the range of 0 to 1.

The technical inefficiency (TI) of individual pineapple growers was obtained from Eq. 8 below:

$$TI_i = 1 - \frac{Y_i}{Y_i^*} \qquad (8)$$

Where $TI_i$ is the technical inefficiency of the $i$-th pineapple grower, in the range of 0 to 1.
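As a small worked illustration of the efficiency and inefficiency ratios defined above, the sketch below computes both for a single grower. The function names and the yield figures are hypothetical and are not taken from the study's data.

```python
def technical_efficiency(observed_yield: float, frontier_yield: float) -> float:
    """TE_i = Y_i / Y_i*, the ratio of observed yield to frontier yield."""
    if frontier_yield <= 0:
        raise ValueError("frontier yield must be positive")
    return observed_yield / frontier_yield


def technical_inefficiency(observed_yield: float, frontier_yield: float) -> float:
    """TI_i = 1 - Y_i / Y_i*, the shortfall relative to the frontier."""
    return 1.0 - technical_efficiency(observed_yield, frontier_yield)


# Hypothetical grower: 18,000 kg/ha observed against a 20,000 kg/ha frontier.
te = technical_efficiency(18_000, 20_000)
ti = technical_inefficiency(18_000, 20_000)
print(f"TE = {te:.2f}, TI = {ti:.2f}")
```

Because TI is defined as one minus TE, the two always sum to one for any grower.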

2.3.7 Hypothesis testing

It would be reasonable to fit an inefficiency effect model like ( Eq. 6 ), if the inefficiency effects are significant, stochastic, and have a particular distribution specification.

Hence, according to Battese and Coelli (1995) , we ought to test the following null hypotheses:
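Based on the parameter restrictions described in the following paragraph, the three null hypotheses can be sketched as below. This is a reconstruction under that reading (7, 3, and 5 restricted parameters, respectively), not a verbatim copy of the original equations.

```latex
\begin{aligned}
H_0 &: \gamma = \delta_0 = \delta_1 = \cdots = \delta_5 = 0 && (9) \\
H_0 &: \gamma = \sigma_u^2 = \delta_0 = 0 && (10) \\
H_0 &: \delta_1 = \delta_2 = \cdots = \delta_5 = 0 && (11)
\end{aligned}
```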

The generalized likelihood ratio test was used to examine the null hypotheses. To obtain the log-likelihood values under the three null hypotheses (Eqs. 9–11), three different models were fitted. The model under the first null hypothesis (Eq. 9) is the traditional mean response function, in which output or cost is regressed on the respective predictor variables of the stochastic production or cost frontier, assuming zero inefficiency. That is, only statistical noise makes up the random component, and the inefficiency effect model is discarded. The degrees of freedom under this null hypothesis were 7, since $\gamma = 0$ (implying $\sigma_u^2 = 0$) and the $\delta_j$ ($j = 0, 1, 2, \dots, 5$) are equal to zero. Similarly, the model under the second null hypothesis (Eq. 10) is also a conventional mean response function, but the farm-specific variables (age, education, credit, training on farming, and family size) are included as independent variables alongside the input variables. In this case, only the inefficiency component (i.e., $u_i$) is dropped from the original model, which means that the parameters $\gamma$, $\sigma_u^2$, and $\delta_0$ are equal to zero. Hence, the degrees of freedom under this null hypothesis were 3. Finally, the model under the third null hypothesis (Eq. 11) is the original stochastic frontier model with the inefficiency effect model dropped. That is, the $\delta_j$ parameters ($j = 1, 2, \dots, 5$) are dropped, while $\gamma$, $\sigma_u^2$, and $\delta_0$ are still estimated. Hence, the degrees of freedom under this null hypothesis equal the number of farm-specific variables dropped (i.e., 5).

For the null hypothesis to be accepted, the value of the log-likelihood function calculated for the model under $H_0$ needs to be close to the value calculated for the original model. A measure of this closeness is the likelihood ratio statistic (Eq. 12):
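In its standard form, and using the log-likelihood notation defined in the next paragraph, the statistic can be written as follows (a reconstruction, since the original equation is not reproduced here):

```latex
\lambda = -2\,\bigl[\,L(H_0) - L(H_1)\,\bigr] \qquad (12)
```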

Where $\lambda$ follows a chi-square distribution with degrees of freedom equal to the number of parameters assumed to be zero in the null hypothesis ($H_0$), provided that $H_0$ is correct; $L(H_0)$ is the log-likelihood value under $H_0$ and $L(H_1)$ is the log-likelihood value under $H_1$ in Eq. 12. Additionally, Student’s t-test was used to perform individual significance tests on the parameters in both the stochastic frontier model and the inefficiency effect model.
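The decision rule described above can be sketched in a few lines. The example computes the likelihood ratio statistic from two log-likelihood values and compares it with the 5 percent chi-square critical value for the degrees of freedom mentioned in the text (3, 5, and 7). The function names and the log-likelihood values are hypothetical; the critical values are standard table figures, and the comparison uses the plain chi-square rule stated in the text.

```python
# 5 percent chi-square critical values for the degrees of freedom
# discussed in the text (standard statistical table values).
CHI2_CRIT_05 = {3: 7.815, 5: 11.070, 7: 14.067}


def lr_statistic(loglik_h0: float, loglik_h1: float) -> float:
    """Likelihood ratio statistic: lambda = -2 * [L(H0) - L(H1)]."""
    return -2.0 * (loglik_h0 - loglik_h1)


def reject_h0(loglik_h0: float, loglik_h1: float, df: int) -> bool:
    """Reject H0 at the 5 percent level if lambda exceeds the critical value."""
    return lr_statistic(loglik_h0, loglik_h1) > CHI2_CRIT_05[df]


# Hypothetical log-likelihood values for a restricted and an unrestricted model.
restricted, unrestricted = -60.0, -50.0
print(lr_statistic(restricted, unrestricted))       # statistic for this toy pair
print(reject_h0(restricted, unrestricted, df=5))    # decision with 5 restrictions
```

A restricted model whose log-likelihood is close to that of the full model gives a small statistic, so the null hypothesis is not rejected.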

3 Results and discussion

3.1 Stochastic production frontier estimates

Table 2 displays the maximum likelihood estimates for the stochastic frontier production function and the pineapple technical inefficiency effect model. Area, lime, and labor input all had statistically significant negative coefficients (−0.011, −0.019, and −0.088, respectively), meaning a negative impact on technical efficiency. The motivated family labor of small farms and the tardiness of large farms in completing farm operations at crucial times may be responsible for the negative relationship between area and production. This conclusion coincides with that of Samarpitha et al. (2016), where the area under rice crops negatively affected technical efficiency (−0.0114), suggesting that small and marginal farms were technically more efficient than their bigger counterparts. The integration of rural industries is therefore urgently needed to improve agricultural productivity (Ye et al., 2023). The negative effect of labor input, on the other hand, may be linked to ineffective labor management in the study area and a declining rate of return on labor. The negative coefficient of labor input indicates that every unit increase in the amount of labor per hectare raises farmers’ inefficiency level by 0.088 units. This might be due to workers’ lack of specialization in pineapple farming and low labor productivity in the study area. To ensure that labor input has a positive impact, more focus should be placed on labor productivity than on labor supply.

Table 2 . Empirical results of the stochastic frontier production model.

Ghimire et al. (2023) also found that the coefficient of labor (0.066) was negative and significant, denoting a possible 6.6% decline in the aggregate output of lentils for each additional man-day of labor. These unexpected outcomes could be overcome with the suggestions provided by Hasan et al. (2022) and Alam et al. (2019). The acreage of pineapple production explained the levels of technical efficiency to a significant degree, as reported by Trujillo and Iglesias (2013).

The primary input in pineapple production was seedlings (suckers), for which the elasticity was estimated at 0.998, significant at the 1 percent level. This suggests that a 1 percent increase in the number of seedlings per hectare is associated with a 0.998 percent improvement in farm efficiency. This positive relationship with output conforms to a priori expectations and is supported by Ghimire et al. (2023), who also observed a positive and significant coefficient for seed input, indicating that a unit increase in seed quantity leads to a 37.6 percent increase in lentil production. According to Balogun et al. (2018), poor spacing of pineapple suckers after planting, adopted as a management strategy for free movement and simple weeding, may have contributed to the substantial positive relationship between suckers and output. Leaders in the agriculture sector risk severe repercussions, such as increased produce prices, food insecurity, and a collapse in farm revenues, if they willfully ignore seed quality. Considering the significant risks involved, it makes sense to spend money on premium seeds. Otherwise, most farmers may use local seeds, which can result in poor germination and plant vigor (Ghimire et al., 2023), and financial instability may cascade down the supply chain from the farm to the consumer. The implication is that premium seeds will shorten production times while improving the quality, yield, and consistency of the finished crop.

3.2 Determinants of technical inefficiency among pineapple farmers

The lower part of the table lists the parameters of the technical inefficiency model. A negative parameter value for a technical inefficiency variable indicates a negative impact on technical inefficiency, and vice versa. Thus, the ‘δ’ parameters in the inefficiency effect model were expected to be negative, which would affect technical efficiency positively. The estimated sigma squared (σ²) of 0.025 was statistically significant at the 1 percent level, indicating a good fit of the distributional form assumed for the composite error term. The gamma (γ), which measures the dominance of the inefficiency effect over random error, had a value of 0.984 at the 1 percent level of significance, implying that 98.4 percent of the variation in output was attributed to technical inefficiency. The results show that three variables (credit, training, and family size) had a significant impact on technical inefficiency, each with a negative coefficient. In the case of credit, the coefficient of −0.066, significant at the 10 percent level, indicates that technical inefficiency decreases as credit availability and utilization in pineapple production increase. This finding is consistent with the work of Balogun et al. (2018), Haq (2013), and Amoah et al. (2014), which showed a positive association between credit and input use and farm productivity. Likewise, training reduced technical inefficiency, with a coefficient of −1.034 significant at the 1 percent level: farmers who had received training tended to be less inefficient than their untrained counterparts. Trained farmers could implement improved production technology and had contact with extension agents, which led to more efficient production. The result agrees with Balogun et al. (2018) and Akhilomen et al. (2015).

Similarly, Mussa (2011) used family size in the inefficiency effect model and found a positive impact on technical efficiency, which coincides with our result. The reason may be that more family members ensure the availability of more family labor, so that important agricultural practices can be carried out on time, thus improving efficiency. It also follows that a farmer who interacts with extension agents more often will be able to recognize and implement new farming methods more successfully than a farmer who interacts with them less. Since efficiency and a nation’s economic development are directly correlated, over time economic growth is increased by more efficient businesses, higher input productivity, and increased economic activity. Every gain in efficiency usually results in a decrease in the cost of production (Naseer et al., 2020). Consumer prices for goods and services may fall when production costs are low. Technical efficiency is therefore necessary for long-term economic growth.

3.3 Farm efficiency level

The frequency distribution of the technical efficiency estimates for pineapple production is shown in Table 3 and Figure 3. Technical efficiency varies from 61.57 to 99.33 percent among pineapple growers, with a mean of 94.37 percent in the study area. This indicates that, on average, a pineapple farmer in the study area could still increase technical efficiency by 5.63 percent, using the available resources and technology, to reach the frontier output level. The frequency distribution implies that 85 percent of farmers had a TE above 0.9, and only 4 percent had a TE below 0.71 (Figure 3). More precisely, most farmers in the study area were found to operate at an 85 percent efficiency level. The result is consistent with the long-standing heritage of good agricultural practices in the study area, particularly for pineapple production. Even a slight improvement in technical efficiency will lead in a cost-efficient direction, which will help to enhance farmers’ profit (Ghimire et al., 2023).
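The summary figures reported above (range, mean, share of farmers above a threshold, and the remaining gap to the frontier) can be computed from a list of farm-level TE scores as sketched below. The function name and the scores are hypothetical and do not reproduce the study's data; with the study's mean of 0.9437, the same gap calculation gives the reported 5.63 percent.

```python
from statistics import mean


def summarize_te(scores, threshold=0.9):
    """Summarize farm-level technical efficiency scores (each between 0 and 1)."""
    avg = mean(scores)
    return {
        "min": min(scores),
        "max": max(scores),
        "mean": avg,
        # Share of farms whose TE exceeds the threshold.
        "share_above": sum(s > threshold for s in scores) / len(scores),
        # Average room for improvement before reaching the frontier.
        "gap_to_frontier": 1.0 - avg,
    }


# Hypothetical TE scores for a handful of farms.
scores = [0.62, 0.91, 0.95, 0.97, 0.99, 0.93, 0.88, 0.96]
summary = summarize_te(scores)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```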

Table 3 . Frequency distribution of technical efficiency estimates from C-D stochastic frontier production function.

Figure 3 . Technical efficiency scores of the individual pineapple farmers. The frequency distribution of technical efficiency estimates from the C-D stochastic frontier production function is demonstrated with percentages in this bar chart. Using the results, we may suggest suitable policies based on the nature of each farmer and their farm enterprise.

3.4 Tests of hypotheses on the parameters of the technical inefficiency model

Generalized likelihood-ratio tests of the null hypotheses that the technical inefficiency effects are absent are presented in Table 4. The first null hypothesis, which specified that the inefficiency effects were absent from the stochastic production frontier model, was strongly rejected. The second null hypothesis, which specified that the inefficiency effects were not stochastic, was also strongly rejected. The third null hypothesis specified that the inefficiency effects were not a linear function of age, education, credit, training, and family size. This null hypothesis was also strongly rejected at the 5 percent level of significance, indicating that the joint effect of these five explanatory variables on the inefficiency of production was significant, although the individual effects of one or more variables might not be statistically significant. The inefficiency effect in the stochastic production frontier was therefore stochastic and related to age, education, credit, training, and family size. Thus, in this application, the proposed inefficiency stochastic production frontier model was a significant improvement over the corresponding stochastic frontier that does not involve a model for the technical inefficiency effects.

Table 4 . Tests of hypotheses on the parameters of the technical inefficiency effect model.

3.5 Summary findings of the farm-level prospects

This study explored the future potential of pineapples in Bangladesh, in both raw and processed form (Figure 4). Pineapple is in high demand both locally and internationally because of its many uses. Its consumption is increasing day by day because of its nutritional and medicinal value (Shakil, 2023). Since exporting fresh pineapples is difficult because of their perishable nature, growers suggested a focus on exporting canned pineapples. Besides, fine pineapple leaf fiber is already used to make different types of products (clothes, rope) in some countries because of its organic nature (Pandit et al., 2020). In the study area, most pineapple cultivators are small and marginal farmers. They sell their product through trade agents at the village level, and a major share of pineapple marketing in Bangladesh runs through this system. The farmers reported that, if marketed through proper channels, these juicy and nutritious pineapples could earn large amounts of foreign currency through export.

Figure 4 . Potentials of a pineapple tree from growers’ perspectives. According to the farmers, there are both qualitative and quantitative advantages to a nation’s pineapple production. It raises farm families’ standards of life and produces long-term profit as it is a perennial crop. The figure depicts the direct contribution that pineapple cultivation and consumption have made to the development of a sustainable agricultural industry for the coming generations.

As pineapple production is increasing every year in Bangladesh, it will be easy to establish a potential export site by utilizing the large amount of leaf waste, which is the main raw material for pineapple leaf fiber production. In addition to Bangladesh’s advantage of cheap labor, the government provides various incentives for export promotion, so natural and organic pineapple leaf fiber can be used to make products in place of synthetic fiber.

It is already known that the European market is the main target of many pineapple-exporting countries (Untoro et al., 2021). As Bangladesh has so far been given various trade facilities by the European Union, entering that giant market is relatively easy. Many Bangladeshis live in different countries of the Middle East, where there is demand for local fruits. Many local Bangladeshi companies are strongly competitive in those markets, similar to the findings of Untoro et al. (2021). So, investing in the canned pineapple market will be beneficial for Bangladesh. Based on the available raw materials and potential production at a low cost, there is a high production possibility for canned pineapple and pineapple leaf fiber in the study area.

Moreover, pineapple waste which is rich in fiber can be used as an energy source as well as a good digestive feed for animals such as poultry, broilers, and cows ( Figure 4 ). Buliah et al. (2019) also found that feeding dairy cows with pineapple waste can increase milk production due to an increase in digestion rate.

In addition, the pineapple leaf fiber extraction and weaving industry along with the high demand for canned pineapple ensures the involved people can have income and employment opportunities needed to substantially improve their lives ( Figure 4 ). This will lead to huge potential and economic rewards for indigenous weavers. By improving the lives of their families and their communities, the country can benefit from this. Extraction of fiber, weaving, and embroidery jobs enable them to earn money that allows them to remain at home. By joining this sector, they do not need to join any hazardous and non-prestigious jobs (such as brick fields, rice mills, or domestic workers) to earn money. The qualitative field-level research findings are represented through the following chart:

Since Bangladesh has only a few actual and potential export sectors it is expected that this study will contribute to identifying another potential income and export-earning sector of Bangladesh. We hope academically that the findings of this study will encourage many entrepreneurs and established businessmen to engage in this business.

4 Conclusion and recommendations

Pineapple is one of the few fruits with adequate vitamins and improved productivity in Bangladesh. Efficiency in its production is crucial to food security, poverty abatement, and lessening vitamin-related deficiency among the masses (particularly children) in the state. The study measured the level of technical efficiency and identified the factors influencing the technical inefficiency of pineapple production using the Cobb–Douglas stochastic frontier approach. The study concludes that technical inefficiency was present in pineapple production. The average technical efficiency was estimated as just below 95% across the study area meaning that farmers had been operating their farms below the production frontier. So, the results indicated that there is still scope for a remarkable improvement in technical efficiency in pineapple production with the remaining level of input supply to obtain the optimum level of output efficiency. The efficiency model indicated that seedling is a significant primary input for pineapple farmers’ efficiency. Therefore, if maximum production is to be achieved in the pineapple industry, the quality and optimum quantity of seedlings must be ensured.

The technical inefficiency factors model revealed that the efficiency of farmers was significantly influenced by demographic and institutional factors, such as age, education, training, credit, and family size. The age and education level of farmers had a significant positive influence on technical inefficiency; this positive coefficient indicates that technical efficiency was significantly higher for younger farmers than for older age groups and informally educated farmers. On the other hand, credit, training, and family size had a significant negative impact on technical inefficiency, which implies that technical inefficiency will be reduced significantly by increasing access to formal credit and training, along with the involvement of productive family members.

Regarding the findings, the study makes the following recommendations: (a) to increase productivity and reduce unemployment and poverty in the district and the nation, the government should create an atmosphere that encourages more young people to work in the pineapple industry. This is due to the following reasons: (i) Results showed that more human work led to lower production; (ii) Youth recruitment into agriculture is crucial because they are likely to be able and willing to accept contemporary technical advancements and use resources perfectly. Farmers need to understand that increasing worker productivity is more important than limiting production with hired labor; (b) Since farmers are not receiving inputs at the government rate, public interventions may be necessary to guarantee that high-quality inputs are reasonably priced. Furthermore, farmers stated that contaminated fertilizers caused them harm. Therefore, it is important to increase public awareness of the need to maintain fertilizer quality at both the local and rural levels; (c) The government should also prioritize education and extension services by bolstering the Department of Agricultural Extension (DAE), establishing farmers’ training centers, formal and informal farmers’ education, and technical and vocational schools; (d) In addition, financial assistance on favorable terms must be given to the farmers who grow pineapples. Eventually, policymakers ought to make some necessary amendments to their policies to make production inputs available to respective farmers at the right time, in the right quantity, and at affordable prices.

It is undeniable that Bangladesh has demonstrated remarkable growth and development, particularly during periods of heightened global unpredictability. Bangladesh went from being one of the world’s poorest countries at birth in 1971 to attaining lower-middle-income status in 2015 (The World Bank, 2023). Notwithstanding these successes, the economy continues to face significant obstacles, including growing inflationary pressure, energy scarcity, a deficit in the balance of payments, and a lack of revenue. The trade deficit shrank in fiscal year 2023 (FY23), while the balance of payments (BoP) deficit and foreign exchange reserves decreased as a result of a contraction in the financial account deficit. In a cutthroat commercial climate, technically proficient and prosperous pineapple growers may write a wonderful tale of poverty alleviation and job creation that would guarantee at least a minor contribution to reaching upper-middle-income status by 2031. To maximize productivity given available resources and technological capabilities, it will be necessary to develop human capital, a skilled labor force, effective infrastructure, and a governmental climate that encourages private investment.

The results of this study, notwithstanding its focus on Bangladesh’s northeast, will significantly impact the 12th Sustainable Development Goal (SDG 12 of UNDP) by guaranteeing a more cost-effective and sustainable agri-food production and consumption system.

5 Limitations of the study and further research

Even though this study produced some important findings, it is important to recognize several limitations when evaluating the data. First off, the current study only looked at a small number of parameters to investigate their direct impact on pineapple growers’ technical inefficiency. Future studies in this area should take into account the effects of external variables on pineapple production, such as infrastructure, government regulations, market conditions, and climate change. Second, the paper only considers pineapple production in Bangladesh’s northeastern district, which might not be enough to allow the findings to be applied more broadly. Therefore, to strengthen the relevance and applicability of the study’s findings, future studies should be conducted in this direction including a wider variety of settings and circumstances. The use of cross-sectional data, which makes it difficult to capture the effects over time, and the reliance on farmers’ self-reported measures, which could lead to inaccurate predictions, are two other limitations. Time series data and panel data ( He et al., 2023 ) should be used in this field of study to provide an accurate picture.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent from the participants was not required to participate in this study in accordance with the national legislation and the institutional requirements.

Author contributions

TD: Conceptualization, Resources, Investigation, Writing – original draft, Writing – review & editing. JKS: Supervision, Writing – review & editing. MAR: Supervision, Writing – review & editing. KMMA: Writing – review & editing. KA: Formal analysis, Writing – review & editing. AC: Writing – review & editing. MSA: Visualization, Writing – review & editing, Conceptualization.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research work was funded by the Ministry of Science and Technology (MoST), Government of the People’s Republic of Bangladesh. The funder has no specific role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Acknowledgments

The authors would like to thank the data enumerators for their invaluable support throughout this study. We are also grateful to the study area’s agricultural extension personnel for their technical expertise and administrative assistance. This work would not have been possible without the generous financial support of the MoST. Finally, we would like to thank the study participants for their kind contributions of time and insights.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abedullah, K., and Bakhsh, A. (2006). Technical efficiency and its determinants in potato production: evidence from Punjab, Pakistan. Lahore J. Econ. 11, 1–22. doi: 10.35536/lje.2006.v11.i2.a1

Adegbite, O., and Adeoye, I. B. (2015). Technical efficiency of pineapple production in Osun state, Nigeria. Agris Econ. Info. National Horticultural Res. Institute 7, 3–12. doi: 10.7160/aol.2015.070101

Adinya, I., Afu, S., and Ijoma, J. (2010). Economic meltdown and decline in pineapple production: determinants of production inefficiency of pineapple-based alley cropping practices in Cross River state, Nigeria. J. Animal and Plant Sci. 20, 107–116.

Ahluwalia, M. S. (1996). New economic policy and agriculture: some reflections. Indian J. Agricul. Econ. Indian Society Agricul. Econ. 51, 412–426. doi: 10.22004/ag.econ.297476

Aigner, D., Lovell, C. A. K., and Schmidt, P. (1977). Formulation and estimation of stochastic frontier production function models. J. Econ. 6, 21–37. doi: 10.1016/0304-4076(77)90052-5

Akhilomen, L. O., Bivan, G. M., Rahman, S. A., and Sanni, S. A. (2015). Economic efficiency analysis of pineapple production in Edo state, Nigeria: a stochastic frontier production approach. J. Experimental Agricu. Int. 5, 267–280. doi: 10.9734/AJEA/2015/13488

Akter, K., Majumder, S., Islam, M. A., and Sarker, B. (2020). Technical efficiency analysis of pineapple production at Madhupur Upazila of Tangail District, Bangladesh. Asian Res. J. Arts Soc. Sci. 12, 32–42. doi: 10.9734/arjass/2020/v12i230187

Alam, M. F., Khan, M. A., and Huq, A. S. M. A. (2012). Technical efficiency in tilapia farming of Bangladesh: a stochastic frontier production approach. J. European Aquaculture Society 20, 619–634. doi: 10.1007/s10499-011-9491-3

Alam, M. A., Sarker, M. A., Hoque, M. J., and Khan, M. S. H. (2019). Use of agrochemicals in pineapple farming: a case study from Madhupur Forest areas of Bangladesh. J. South Pacific Agricul. 22, 10–16.

Ali, S. M. Y., Ahiduzzaman, M., Hossain, M. M., Ali, M. A., Biswas, M. A. M., Rahman, M. H., et al. (2015). Physical and chemical characteristics of pineapples grown in Bangladesh. Int. J. Business, Soc. Scientific Res. 3, 234–246.

Ali, I., Xue-xi, H. U. O., Khan, I., Ali, H., Baz, K., and Khan, S. U. (2019). Technical efficiency of hybrid maize growers: a stochastic frontier model approach. J. Integr. Agric. 18, 2408–2421. doi: 10.1016/S2095-3119(19)62743-7

Amaza, P., and Olayemi, J. (2002). Analysis of technical inefficiency in food crop production in Gombe state. Nigeria. App. Econ. Letters 9, 51–54. doi: 10.1080/13504850110048523

Amoah, S. T., Debrah, I. A., and Abubakari, R. (2014). Technical efficiency of vegetable farmers in Peri-urban Ghana influence and effects of resource inequalities. American J. Agricul. Forestry 2, 79–87. doi: 10.11648/j.ajaf.20140203.14

Bakh, M. E., and Islam, M. S. (2005). Technical and allocative efficiency of growing wheat in northwest districts of Bangladesh. Bangladesh J. Agricul. Econ. 28, 73–83. doi: 10.22004/ag.econ.200223

Balogun, O. L., Adewuyi, S. A., Disu, O. R., Afodu, J. O., and Ayo-Bello, T. A. (2018). Profitability and technical efficiency of pineapple production in Ogun state. Nigeria. Int. J. Fruit Sci. 18, 436–444. doi: 10.1080/15538362.2018.1470594

Banglapedia (2023). National Encyclopedia of Bangladesh. Sreemangal Upazila. Accessed on 15 May, 2024. Available at: https://en.banglapedia.org/index.php/Sreemangal_Upazila

Battese, G. E., and Coelli, T. J. (1992). Frontier production functions, technical efficiency, and panel data: with application to Paddy farmers in India. J. Prod. Anal. 3, 153–169. doi: 10.1007/BF00158774

Battese, G. E., and Coelli, T. J. (1995). A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empir. Econ. 20, 325–332. doi: 10.1007/BF01205442

BBS (2016). Report on the productivity survey of pineapple crop . Dhaka, Bangladesh: Bangladesh Bureau of Statistics, statistics and informatics division (SID). Ministry of Planning, government of the People’s republic of Bangladesh.

BBS (2019). Yearbook of agricultural statistics. 31 st series . Dhaka, Bangladesh: Bangladesh Bureau of Statistics, statistics and informatics division (SID). Ministry of Planning, government of the People’s republic of Bangladesh.

BBS (2022). Report on the productivity survey of pineapple crop . Dhaka, Bangladesh: Bangladesh Bureau of Statistics, statistics and informatics division (SID). Ministry of Planning, government of the People’s republic of Bangladesh.

Begum, A. M., Imam, M. F., and Alam, M. A. (2010). Measurement of productivity and efficiency of potato production in two selected areas of Bangladesh: a Translog stochastic frontier analysis. Progress. Agric. 21, 233–245. doi: 10.3329/pa.v21i1-2.16780

Benoit-cattin, M., and Mendez, J. C. (1996). Determinación Paramétrico-Estocástica De La Eficiencia Técnica De La Producción De Café De Los Pequeños Productores en Guatemala. Investigación agraria. Economía 1, 117–138.

BER (2023). Agriculture, chapter seven, Bangladesh economic review, finance division, Ministry of Finance . Dhaka, Bangladesh: Government of the People’s Republic of Bangladesh.

Biswas, P., and Nishat, S. A. (2019). Production and export possibility of canned pineapple and pineapple leaf Fiber in Bangladesh. IOSR J. Business Manag. (IOSR-JBM) 21, 17–23. doi: 10.9790/487X-2109041723

Bravo-ureta, B. E., and Evenson, E. (1994). Efficiency in agricultural production: the case of peasant farmers in eastern Paraguay. Agric. Econ. 10, 27–37. doi: 10.1111/j.1574-0862.1994.tb00286.x

Buliah, N., Jamek, S., Ajit, A., and Abu, R. (2019). Production of dairy cow pellets from pineapple leaf waste. AIP Conference Proceedings 2124, 01–06. doi: 10.1063/1.5117108

Chen, T., Yang, S., and Pan, T. (2001). A study on the technical efficiency of pineapple arms in Taiwan. J. Agricul. Res. China 50, 88–97.

Cochran, W. G. (1977). Sampling Techniques . Third Edn. Canada: John Wiley & Sons.

Coelli, T. J. (1995). Recent developments in frontier estimation and efficiency measurement. Aust. J. Agric. Econ. 39, 219–245. doi: 10.1111/j.1467-8489.1995.tb00552.x

Coelli, T. J., and Battese, G. E. (1996). Identification of factors that influence the technical efficiency of Indian farmers. Aust. J. Agric. Econ. 40, 103–128. doi: 10.1111/j.1467-8489.1996.tb00558.x

Conceicao, J. C. P. R., and Araujo, P. F. C. (2000). Fronteira de Produção Estocástica e Eficiência Técnica na Agricultura. Rev. Econ. Sociol. Rural. 38, 45–64.

Datta, T., Saha, J. K., Rahman, M. A., Chowdhury, A., Akter, M., and Gupta, A. D. (2023). The cost-benefit analysis and constraints of pineapple production in Bangladesh. Archives Agricul. Environ. Sci. 8, 397–402. https://dx.doi.org/10.26832/24566632.2023.0803018

Deshwara, M. (2015). Bangladesh: Bumper pineapple harvest in Srimangal. International Tropical Fruits Network. Available at: https://www.itfnet.org/v1/2015/07/bangladesh-bumper-pineapple-harvest-in-srimangal/

DS (2011). District statistics, Bangladesh Bureau of Statistics, statistics and informatics division, Ministry of Planning . Dhaka, Bangladesh: Government of the People’s Republic of Bangladesh.

Essilfie, F. L., Asiamah, M. T., and Nimoh, F. (2011). Estimation of farm level technical efficiency in small scale maize production in the Mfantseman municipality in the central region of Ghana: a stochastic frontier approach. J. Dev. Agric. Econ. 3, 645–654.

FAO (2018). Food Systems . United Nations, Rome, Italy: Food and Agriculture Organization.

Farrell, M. J. (1957). The measurement of production efficiency. J. R. Stat. Soc. 120, 253–281. doi: 10.2307/2343100

Fassinou, H. V. N., Lommen, W. J. M., Van der Vorst, J., Agbossou, E. K., and Struik, P. C. (2012). Analysis of pineapple production Systems in Benin. Acta Hortic. 928, 47–58. doi: 10.17660/ActaHortic.2012.928.4

Ghimire, B., Dhakal, S. C., Marahatta, S., and Bastakoti, R. C. (2023). Technical efficiency and its determinants on lentil (Lens culinary) production in Nepal. Farming System 1, 100045–100010. doi: 10.1016/j.farsys.2023.100045

Greene, W. H. (2000). Econometric analysis . New York: Prentice-Hall, Inc.

Gujarati, D. N. (2003). Basic econometrics . 4th Edn. New York: McGraw-Hill.

Haq, A. Z. M. (2013). The impact of agricultural extension contact on crop income in Bangladesh. Bangladesh J. Agric. Res. 38, 321–334. doi: 10.3329/bjar.v38i2.15893

Hasan, S., Ali, M., and Khalil, M. (2011). Impact of pineapple cultivation on the increased income of pineapple growers. Agriculturists 8, 50–56. doi: 10.3329/agric.v8i2.7577

Hasan, S., Hasan, S. S., Saha, S., and Islam, M. R. (2022). Identify problems and suggest possible solutions for safe pineapple production in Madhupur tract. European J. Agricul. Food Sci. 4, 68–74. doi: 10.24018/ejfood.2022.4.5.564

He, S., Yang, S., Razzaq, A., Erfanian, S., and Abbas, A. (2023). Mechanism and impact of digital economy on urban economic resilience under the carbon emission scenarios: evidence from China’s urban development. Int. J. Environ. Res. Public Health 20, 2–20. doi: 10.3390/ijerph20054454

PubMed Abstract | Crossref Full Text | Google Scholar

Hossian, M. M., and Abdulla, F. (2015). A time series analysis for the pineapple production in Bangladesh. Jahangirnagar University J. Sci. 38, 49–59.

Islam, K. M. Z., and Sumelius, J. (2011). Technical, economic and allocative efficiency of microfinance borrowers and non-borrowers: evidence from peasant farming in Bangladesh. Eur. J. Soc. Sci. 18, 361–377.

Jahid, A. M. (2023). Early pineapple harvest benefits farmers. The Daily Star, April 30. Available at: https://www.thedailystar.net/business/news/early-pineapple-harvest-benefits-farmers-2086133

Kumbhakar, S. C., and Lovell, C. A. K. (2000). Stochastic frontier analysis . Cambridge, UK: Cambridge University Press.

Meeusen, W., and Broeck, J. V. D. A. (1977). Technical efficiency and dimension of the firm: some results on the use of frontier production functions. Empir. Econ. 2, 109–122. doi: 10.1007/BF01767476

MoF (2022). Bangladesh National Food and nutrition security policy, plan of action; 2021–2030, food planning and monitoring unit (FPMU), ministry of food . Dhaka, Bangladesh: Government of the People’s Republic of Bangladesh.

Moreira, V., Bravo-Ureta, B., Carrillo, B., and Vazquez, J. (2006). Technical efficiency measures for small dairy farms in southern Chile: a stochastic frontier analysis with unbalanced panel data. Archivos de Medicina Veterinaria 38, 25–32. doi: 10.4067/S0301-732X2006000100004

Mussa, E. C. (2011). Economic efficiency of smallholder major crops production in the central highlands of Ethiopia. MS Thesis, Agricultural and Applied Economics Department, Egerton University, Kenya.

Naseer, M. A. R., Ashfaq, M., and Razzaq, A. (2020). Comparison of water use efficiency, profitability and consumer preferences of different Rice varieties in Punjab, Pakistan. Paddy Water Environ. 18, 273–282. doi: 10.1007/s10333-019-00780-9

Ndubueze-Ogarak, M. E., Adeyoola, O. A., and Nwigwe, C. A. (2021). Determinants of technical efficiency of small-holders yam farmers in Nigeria. Rev. Agricul. App. Econ. 24, 13–20. doi: 10.15414/raae.2021.24.01.13-20

NHB (2015). Pineapple. National Horticulture Board, Ministry of Agriculture and farmers welfare. Societies registration act 2005, Gurugram, government of India. Available at: https://nhb.gov.in/PDFViwer.aspx?enc=3ZOO8K5CzcdC/Yq6HcdIxKoWjriAwu4k1X9JJJcK0OA=

Okon, U. E., Enete, A. A., and Bassey, N. E. (2010). Technical efficiency and its determinants in garden egg (Solanum spp) production in Uyo Metropolis, Akwa Ibom state. J. Field Actions 1, 1–6.

Oladapo, M. O., Momoh, S., Yusuf, S., and Awoyinka, Y. (2007). Marketing margin and spatial pricing efficiency of pineapple in Nigeria. Asian J. Market. 1, 14–22. doi: 10.3923/ajm.2007.14.22

Pakravan-Charvadeh, M. R., and Flora, C. B. (2022). Sustainable food consumption pattern with emphasis on socioeconomic factors to reduce food waste. Int. J. Environ. Sci. Technol. 19, 9929–9944. doi: 10.1007/s13762-022-04186-9

Pakravan-Charvadeh, M. R., Flora, C. B., and Emrouznejad, A. (2022). Impact of socio-economic factors on nutrition efficiency: an application of data envelopment analysis. Frontiers in nutrition. Sec. Nutrition Sustain. Diets 9, 1–11. doi: 10.3389/fnut.2022.859789

Pandit, P., Pandey, R., Singha, K., Shrivastava, S., Gupta, V., and Jose, S. (2020). Pineapple leaf fibre: Cultivation and production . Singapore: Green Energy and Technology. Springer, 1–20.

Phrommarat, B., and Oonkasem, P. (2021). Sustainable pineapple farm planning based on eco-efficiency and income risk: a comparison of conventional and integrated farming systems. Appl. Ecol. Environ. Res. 19, 2701–2717. doi: 10.15666/aeer/1904_27012717

Polas, M. A. B. (2013). Profitability and technical efficiency of maize production study in some selected areas of Natore District, MS thesis, Department of Agricultural Finance, Bangladesh agricultural university, Mymensingh-2202, Bangladesh.

Razzaq, A., Qing, P., Naseer, M. A. R., Abid, M., Anwar, M., and Javed, I. (2019). Can the informal groundwater markets improve water use efficiency and equity? Evidence from a semi-arid region of Pakistan. Sci. Total Environ. 666, 849–857. doi: 10.1016/j.scitotenv.2019.02.266

Richetti, A., and Reis, R. P. (2003). The soybean production frontier and economic efficiency in Mato Grosso do Sul, Brazil. Rev. Econ. Sociol. Rural. 41, 153–168. doi: 10.1590/S0103-20032003000100003

Samarpitha, A., Vasudev, N., and Suhasini, K. (2016). Technical, economic, and allocative efficiencies of Rice farms in Nalgonda District of Telangana state. Econ. Aff. 61, 365–374. doi: 10.5958/0976-4666.2016.00047.4

Santos, J., Foster, W., Ortega, J., and Ramifrez, E. (2006). Estudio de la Eficiencia Técnica de Productores de Papas en Chile: El rol del Programa de Transferencia Tecnológica de INDAP. Economía Agraria 10, 119–132.

Sarker, J. R., and Alam, M. F. (2016). Efficiency and economics in cotton production of Bangladesh. J. Agricul. Environ. Int. Develop. 110, 325–348. doi: 10.12895/jaeid.20162.494

Seema, S. A., Banik, J., and Nandi, R. (2023). Diversity of homestead plants in Sreemangal Upazilla. Bangladesh. Asian Plant Res. J. 9, 1–12. doi: 10.9734/aprj/2022/v9i230199

Shakil, M. (2023). Better pineapple yield brings cheers for farmers. The Daily Star, May 11. Available at: https://www.thedailystar.net/business/economy/news/better-pineapple-yield-brings-cheers-farmers-3316671

Tadesse, B., and Krishnamoorthy, S. (1997). Technical efficiency in Paddy farms of Tamil Nadu: an analysis based on farm size and ecological zone. Agric. Econ. 16, 185–192. doi: 10.1111/j.1574-0862.1997.tb00453.x

Taylor, T. G., and Shonkwiler, J. S. (1986). Alternative stochastic specifications of the frontier production function in the analysis of agricultural credit programs and technical efficiency. J. Dev. Econ. 21, 149–160. doi: 10.1016/0304-3878(86)90044-1

Tchale, H., and Sauer, J. (2015). The efficiency of maize farming in Malawi. A bootstrapped Translog frontier. Cahiers d Economie et Sociologie Rurales 82-83, 33–56.

The Daily Tribunal (2023). Sreemangal pineapple going to Country’s other areas. April 08. Available at: https://www.dailytribunal24.com/country/division/58820

The World Bank (2023). The World Bank in Bangladesh, October 04. Available at: https://www.worldbank.org/en/country/bangladesh/overview

Tian, W. M., and Wan, G. H. (2000). Technical efficiency and its determinants in China's grain production. J. Prod. Anal. 13, 159–174. doi: 10.1023/A:1007805015716

Trujillo, J. C., and Iglesias, W. J. (2013). Measurement of the technical efficiency of small pineapple farmers in Santander, Colombia: a stochastic frontier approach. Rev. Econ. Sociol. Rural. 51, s049–s062. doi: 10.1590/S0103-20032013000600003

Uddin, M. T., Roy, S. S., and Dhar, A. R. (2022). Financial profitability and value chain analysis of pineapple in Tangail. Bangladesh. World Food Policy 8, 126–143. doi: 10.1002/wfp2.12039

Untoro, A. W., Waluyati, L. R., and Darwanto, D. H. (2021). Export competitiveness of Indonesia canned pineapple in European Union market, advances in economics, business and management research, Atlantis press international B. V, proceedings of 1st international conference on sustainable agricultural socio-economics, agribusiness, and rural development (ICSASARD) , 199, 53–59.

Villano, R., and Fleming, E. (2006). Technical inefficiency and production risk in Rice farming: evidence from Central Luzon Philippines. Asian Econ. J. 20, 29–46. doi: 10.1111/j.1467-8381.2006.00223.x

Ye, F., Qin, S., Nisar, N., Zhang, Q., Tong, T., and Wang, L. (2023). Does rural industrial integration improve agricultural productivity? Implications for sustainable food production. Front. Sustain. Food Syst., Sec. Nutrition Sustain. Diets 7, 1–14. doi: 10.3389/fsufs.2023.1191024

Keywords: Cobb–Douglas (C-D) production function, stochastic frontier analysis, technical efficiency, sustainable pineapple farming, future prospects and economic policies

Citation: Datta T, Saha JK, Rahman MA, Adnan KMM, Akter K, Chowdhury A and Alamgir MS (2024) Examining technical efficiency, prospects, and policies of farmers: data from a developing nation’s pineapple production. Front. Sustain. Food Syst. 8:1383948. doi: 10.3389/fsufs.2024.1383948

Received: 08 February 2024; Accepted: 03 July 2024; Published: 16 August 2024.

Copyright © 2024 Datta, Saha, Rahman, Adnan, Akter, Chowdhury and Alamgir. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Md. Shah Alamgir, [email protected]

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

COMMENTS

  1. 8.4: Hypothesis Test Examples for Proportions

    Example 8.4.7. Joon believes that 50% of first-time brides in the United States are younger than their grooms. She performs a hypothesis test to determine if the percentage is the same or different from 50%. Joon samples 100 first-time brides and 53 reply that they are younger than their grooms.

  2. Hypothesis Testing for Means & Proportions

    We then determine the appropriate test statistic (Step 2) for the hypothesis test. Test statistic for testing H0: p = p0: z = (p̂ − p0) / √(p0(1 − p0)/n), if min(np0, n(1 − p0)) ≥ 5. This formula is appropriate for large samples, defined as those where the smaller of np0 and n(1 − p0) is at least 5.

  3. 3.4: Hypothesis Test for a Population Proportion

    Hypothesis Test for a Population Proportion (p) Frequently, the parameter we are testing is the population proportion. We are studying the proportion of trees with cavities for wildlife habitat. We need to know if the proportion of people who support green building materials has changed.

  4. 8.8 Hypothesis Tests for a Population Proportion

    The p-value for a hypothesis test on a population proportion is the area in the tail(s) of the distribution of the sample proportion. If both n × p ≥ 5 and n × (1 − p) ≥ 5, use the normal distribution to find the p-value. If at least one of n × p < 5 or n × (1 − p) < 5, use ...

  5. PDF STAT 201 Chapter 9.1-9.2 Hypothesis Testing for Proportion

    A hypothesis is a proposition assumed as a premise in an argument. It's a statement regarding a characteristic of one or more populations. Hypothesis testing is a procedure, based on evidence found in a sample, to test a hypothesis. The null hypothesis, H0, is a statement to be tested. The null hypothesis is a statement of no change, no effect or ...

  6. 5.5

    For a test for two proportions, we are interested in the difference between two groups. If the difference is zero, then they are not different (i.e., they are equal). Therefore, the null hypothesis will always be: H0: p1 − p2 = 0. Another way to look at it is H0: p1 = p2.

  7. Lesson 6a: Hypothesis Testing for One-Sample Proportion

    Since hypothesis tests are about a parameter value, the hypotheses use parameter notation - \(p \) for proportion or \(\mu \) for mean - in their arrangement. For tests of a proportion or a test of a mean, we would choose the appropriate alternative based on our research question.

  8. Statistics

    The following steps are used for a hypothesis test: For example: And we want to check the claim: By taking a sample of 40 randomly selected Nobel Prize winners we could find that: The sample proportion is then: \(\displaystyle \frac{10}{40} = 0.25\), or 25%. From this sample data we check the claim with the steps below.

  9. Hypothesis Test for a Proportion

    Test statistic. The test statistic is a z-score (z) defined by the following equation. z = (p - P) / σ. where P is the hypothesized value of population proportion in the null hypothesis, p is the sample proportion, and σ is the standard deviation of the sampling distribution. P-value.

  10. 9.6 Hypothesis Testing of a Single Mean and Single Proportion

    The student will select the appropriate distributions to use in each case. The student will conduct hypothesis tests and interpret the results. Television Survey In a recent survey, it was stated that Americans watch television on average four hours per day. Assume that σ = 2.

  11. 9.4

    9.4 - Comparing Two Proportions. So far, all of our examples involved testing whether a single population proportion p equals some value p0. Now, let's turn our attention for a bit towards testing whether one population proportion p1 equals a second population proportion p2. Additionally, most of our examples thus far have involved left ...

  12. 10.5 Hypothesis Testing for Two Means and Two Proportions

    Using two consecutive days' business sections, test whether the stocks went down, on average, for the second day. H 0: _____ H a: _____ In words, define the random variable. The distribution to use for the test is _____. Calculate the test statistic using your data. Draw a graph and label it appropriately.

  13. Summary

    Here we presented hypothesis testing techniques for means and proportions in one and two sample situations. Tests of hypothesis involve several steps, including specifying the null and alternative or research hypothesis, selecting and computing an appropriate test statistic, setting up a decision rule and drawing a conclusion.

  14. 8.2: Hypothesis Testing of Single Proportion

    Either five-step procedure, critical value or p -value approach, can be used. 8.2: Hypothesis Testing of Single Proportion is shared under a license and was authored, remixed, and/or curated by LibreTexts. Both the critical value approach and the p-value approach can be applied to test hypotheses about a population proportion.

  15. Hypothesis Testing

    Table of contents. Step 1: State your null and alternate hypothesis. Step 2: Collect data. Step 3: Perform a statistical test. Step 4: Decide whether to reject or fail to reject your null hypothesis. Step 5: Present your findings. Other interesting articles. Frequently asked questions about hypothesis testing.

  16. Hypothesis test comparing population proportions

    Our alternative hypothesis is that there is a difference. Or that P1 does not equal P2. Or that P1 minus P2, the proportion of men voting minus the proportion of women voting, the true population proportions, do not equal 0. And we're going to do the hypothesis test with a significance level of 5%.

  17. Sample Proportion vs. Sample Mean: The Difference

    Here's the difference between the two terms: Sample proportion: The proportion of observations in a sample with a certain characteristic. Often denoted p̂, It is calculated as follows: p̂ = x / n. where: x: The number of observations in the sample with a certain characteristic. n: The total number of observations in the sample.

  18. Hypothesis Testing

    This statistics video tutorial explains how to solve hypothesis testing problems with proportions. It explains how to calculate the sample proportion and th...

  19. Lesson 6a: Hypothesis Testing for One-Sample Proportion

    Objectives. Upon successful completion of this lesson, you should be able to: Explain the concepts of hypothesis testing. Set up hypotheses. Perform hypothesis testing for a population proportion using the p-value approach and the rejection region approach. Use a confidence interval to draw a conclusion about a two-sided test.

  20. Hypothesis test for difference in proportions

    Remember, the z for any test statistic is z = (Estimator − Null) / SE. Let's focus on the numerator (Estimator − Null): ∙ The "estimator" in this case is the difference between proportions. This is what we are trying to estimate from the question. Thus, Estimator = p̂₁ − p̂₂. ∙ The "null" in this case is zero.

  21. 8.4: Hypothesis Test for One Proportion

    Step 1: State the hypotheses: The key words in this example, "proportion" and "differs," give the hypotheses: H0: p = 0.856. H1: p ≠ 0.856 (claim) Step 2: Compute the test statistic. Before finding the test statistic, find the sample proportion p̂ = 420/500 = 0.84 and q0 = 1 − 0.856 = 0.144.

  22. Hypothesis test for difference in proportions example

    Since we're subtracting the two samples, the mean would be the 1st sample mean minus the 2nd sample mean (µ1 − µ2). Sal finds that to be 0.38 − 0.33 = 0.05 at 6:46. In this video, Sal is figuring out if there is convincing evidence that the difference in population means is actually 0.

  23. Hypothesis Testing for Proportions & Means: Examples & Exercises

    Hypothesis for the Difference Between Two Population Means Exercise 2: To determine whether car ownership affects a student's academic achievements, two random samples of 100 male students were each drawn from the student body. The grade point average for the sample of non-owners of cars was 2.7, and for the sample of car owners was 2.54. The sample variances are 0.36 and 0.4, respectively.

  24. Blown out of Proportion? Housing Prices, Home Prices and New Tenant

    One way to check a series for mean-reversion is to test whether the series is stationary (exhibiting a constant mean and variance over time), or non-stationary (not converging toward any particular level over time). Table 1 below reports the results of three statistical tests that evaluate whether a series exhibits non-stationarity.

  25. 9: Hypothesis Testing about Population Mean and Proportion

    9.2.4: Testing for Independence in Two-Way Tables (Special Topic) 9.2.5: Small Sample Hypothesis Testing for a Proportion (Special Topic) 9.2.6: Randomization Test (Special Topic) 9.2.7: Exercises; 9.3: Hypothesis Testing with One Sample. 9.3.1: Prelude to Hypothesis Testing; 9.3.2: Null and Alternative Hypotheses

  26. Frontiers

    p is the estimated proportion of an attribute that is present in the ... 2.3.7 Hypothesis testing. It would be reasonable to fit an inefficiency ... The model under the null hypothesis Eq. (9) is the traditional mean response function where output or cost is regressed up on the respective predictor variables given in stochastic production or ...
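
Several of the snippets above describe the one-sample z test for a proportion: a test statistic z = (p̂ − p0) / σ, a normal-approximation condition on n × p0 and n × (1 − p0), and a p-value taken from one or both tails. A minimal Python sketch of that calculation follows, applied to the first-time-brides example quoted above (53 of 100, hypothesized 50%, two-tailed). The function names `normal_cdf` and `one_proportion_z_test` and the `tail` argument are illustrative assumptions, not taken from any of the quoted sources.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_proportion_z_test(successes, n, p0, tail="two"):
    """Return (z, p_value) for H0: p = p0 under the normal approximation.

    tail is "left", "right", or "two" (hypothetical argument names).
    """
    # Large-sample condition quoted in the snippets: both counts at least 5.
    if min(n * p0, n * (1 - p0)) < 5:
        raise ValueError("need n*p0 >= 5 and n*(1 - p0) >= 5")

    p_hat = successes / n
    # Standard deviation of the sample proportion when H0 is true.
    sigma = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sigma

    if tail == "left":
        p_value = normal_cdf(z)
    elif tail == "right":
        p_value = 1.0 - normal_cdf(z)
    elif tail == "two":
        # Sum of both tails; each tail holds half of the p-value.
        p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z, p_value


# Brides example: 53 of 100, H0: p = 0.50, Ha: p != 0.50.
z, p_value = one_proportion_z_test(53, 100, 0.50, tail="two")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

For these numbers z is 0.6 and the two-tailed p-value is roughly 0.55, so at a 5% significance level the null hypothesis would not be rejected. The same function can be called with 420 successes out of 500 and p0 = 0.856 to reproduce the other one-proportion example quoted above.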