
UC MUSEUM OF PALEONTOLOGY


Understanding Evolution

Your one-stop source for information on evolution

Teaching Resources

Testing a hypothesis.

Grade Level(s):

  • Howard Hughes Medical Institute

Resource type:

  • Classroom activity

Time: 50 minutes

Students watch a short film about natural selection in humans and answer questions on a worksheet that reinforce the evolutionary story behind malaria and sickle cell anemia prevalence.


  • Teaching tips
  • Teaching background
  • [Evidence of evolution: Grades 9-12] There is a fit between organisms and their environments, though not always a perfect fit. (LS4.C)
  • [Mechanisms of evolution: Grades 9-12] Evolution results from selection acting upon genetic variation within a population. (LS4.B)
  • [Mechanisms of evolution: Grades 9-12] There is variation within a population. (LS3.B)
  • [Mechanisms of evolution: Grades 9-12] Natural selection acts on the variation that exists in a population. (LS4.B, LS4.C)
  • [Mechanisms of evolution: Grades 9-12] Natural selection is dependent on environmental conditions.
  • [Mechanisms of evolution: Grades 9-12] Over time, the proportion of individuals with advantageous characteristics may increase due to their likelihood of surviving and reproducing. (LS4.B, LS4.C)
  • [Mechanisms of evolution: Grades 9-12] Depending on environmental conditions, inherited characteristics may be advantageous, neutral, or detrimental.
  • [Nature of science: Grades 9-12] A hallmark of science is exposing ideas to testing. (P3, P4, P6, P7)
  • [Nature of science: Grades 9-12] Scientists test their ideas using multiple lines of evidence. (P6, NOS2)
  • [Nature of science: Grades 9-12] Scientists can test ideas about events and processes long past, very distant, and not directly observable.
  • [Nature of science: Grades 9-12] Science is a human endeavor. (NOS7)
  • [Studying evolution: Grades 9-12] As with other scientific disciplines, evolutionary biology has applications that factor into everyday life.
  • Disciplinary Core Idea LS4.B: Natural Selection
  • Disciplinary Core Idea LS4.C: Adaptation
  • NOS Matrix understanding category 2. Scientific knowledge is based on empirical evidence.
  • NOS Matrix understanding category 7. Science is a human endeavor.
  • Science and Engineering Practice 3. Planning and carrying out investigations
  • Science and Engineering Practice 4. Analyzing and interpreting data
  • Science and Engineering Practice 6. Constructing explanations and designing solutions
  • Science and Engineering Practice 7. Engaging in argument from evidence

Answers to the worksheets are readily available online, so if this is a concern, you may wish to have students complete those in class.

  • Genetic Variation
  • Natural Selection



On Hypothesis Testing in Ecology and Evolution

James F. Quinn and Arthur E. Dunham

Theories of causality in ecology and evolution rarely lend themselves to analysis by the formal method of "hypothesis testing" envisioned by champions of a "strong inference" model of scientific method. The objective of biological research typically is to assess the relative contributions of a number of potential causal agents operating simultaneously. Sensibly stated hypotheses in the methodology of most field investigations are similar to hypotheses of applied statistics. They are not intended to be mutually exclusive, in any sense exhaustive, or global in their application. It is not possible in principle to perform a "critical test" or experiment to distinguish between the truth of "alternative hypotheses" if the proposed causal processes they caricature occur simultaneously. We consider several examples in which a rigid hypothetico-deductive methodology applied to nonalternative ecological "hypotheses" could lead to fallacious conclusions. It has been proposed that processes of ecological succession may be separated into alternative modes of "facilitation," "inhibition," and "tolerance." Yet attempts to experimentally reject one or more of the supposedly distinct hypotheses cannot, in principle, distinguish between them in a variety of biologically interesting cases. In studies of the limits of distributions of intertidal organisms, reasonable univariate experimental tests of possible causes would lead to rejection of "biological enemy" hypotheses when a "keystone predator effect" occurs because the interaction between competition and predation reverses the direction of the effect on some prey populations expected from either process in isolation. Particular problems arise when "null models" in ecology are treated as hypotheses of "strong inference." Models of ecological or evolutionary causality rarely have single or easily stated "null" converses. 
Tractable null models have no probability of being strictly true, and thus may be rejected a priori as hypothetico-deductive constructs. In practice, their role is as a reference point for measurement of departures. Their usefulness in this regard depends upon the reliability with which the characteristics of biology without interaction can be estimated. Applied to studies of interspecific competition through the use of species distributions, purported null hypotheses make different biological assumptions than those of the interactive models. They seem neither especially more reliable nor in any way more fundamental. We see no reason to accept the recent claims that "null hypotheses," as applied in ecology and evolution, have any logical primacy or greater parsimony than other approaches to partitioning the variation observed in natural communities among the contributions of many observable causes. Careful consideration of possible explanations and controlled experimentation contribute a great deal to ecological and evolutionary knowledge. However, we believe that the hypothetico-deductive model of scientific method can provide misleading prescriptions for efficient investigation and acceptance of evidence in phenomena with multiple causes, and should be applied with appropriate skepticism.


Copyright 1983 The University of Chicago


When are hypotheses useful in ecology and evolution?

  • February 2021

Matthew G Betts at Oregon State University


David Frey at Cornell University


Sarah J. K. Frey at Oregon State University

Abstract and Figures

Understanding mechanisms often increases model transferability. Panels A and B show snowshoe hares in


Testing Hypotheses of Molecular Evolution


David Bickel

Part of the book series: SpringerBriefs in Systems Biology ((BRIEFSBIOSYS))


This chapter lifts the uncertainty quantification of the previous chapters up to the level of molecular evolution hypotheses. A relatively recent rival to the neutral theory of molecular evolution is explained in terms of how its predictions differ. The exercises at the end of the chapter give readers experience with quantifying the extent to which sequence data support one evolutionary hypothesis more than another.

In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. – George E. P. Box (note 1)

Is it of the slightest use to reject a hypothesis until we have some idea of what to put in its place? … There has not been a single date in the history of the law of gravitation when a modern significance test would not have rejected all laws and left us with no law. – Sir Harold Jeffreys (note 2)

Note 1: "Science and Statistics," Journal of the American Statistical Association [25], copyright © American Statistical Association, reprinted by permission of Taylor & Francis Ltd, http://www.tandfonline.com on behalf of American Statistical Association.

Note 2: Theory of Probability (Oxford University Press) [63, §7.22]. Reproduced with permission of the Licensor through PLSclear.


Bickel, D.R. 2022b. Fisher’s disjunction as the principle vindicating p-values, confidence intervals, and their generalizations: A frequentist semantics for possibility theory. Working paper. https://doi.org/10.5281/zenodo.6590672.

Box, G.E.P. 1976. Science and statistics. Journal of the American Statistical Association 71: 791–799.

Bromham, L. 2016. An Introduction to Molecular Evolution and Phylogenetics. Oxford: Oxford University Press.

Gould, S.J. 2009. Punctuated Equilibrium. Cambridge: Harvard University Press.

Hoyle, F., and N. Wickramasinghe. 2000. Astronomical Origins of Life. New York: Springer.

Hu, T., M. Long, D. Yuan, Z. Zhu, Y. Huang, and S. Huang. 2013. The genetic equidistance result: misreading by the molecular clock and neutral theory and reinterpretation nearly half of a century later. Science China Life Sciences 56: 254–261.

Huang, S. 2012. Primate phylogeny: molecular evidence for a pongid clade excluding humans and a prosimian clade containing tarsiers. Science China Life Sciences 55: 709–725.

Huang, S. 2016. New thoughts on an old riddle: What determines genetic diversity within and between species? Genomics 108: 3–10.

Jablonka, E., M. Lamb, and A. Zeligowski. 2014. Evolution in Four Dimensions: Genetic, Epigenetic, Behavioral, and Symbolic Variation in the History of Life. Cambridge: MIT Press.

Jeffreys, H. 1948. Theory of Probability. London: Oxford University Press.

Lesk, A. 2019. Introduction to Bioinformatics. Oxford: Oxford University Press.

Schwabe, C. 2002. Genomic potential hypothesis of evolution: a concept of biogenesis in habitable spaces of the universe. The Anatomical Record: An Official Publication of the American Association of Anatomists 268: 171–179.

Steele, E., R. Gorczynski, R. Lindley, Y. Liu, R. Temple, G. Tokoro, D. Wickramasinghe, and N. Wickramasinghe. 2019. Lamarck and panspermia – on the efficient spread of living systems throughout the cosmos. Progress in Biophysics and Molecular Biology 149: 10–32.

Wang, M., D. Wang, J. Yu, and S. Huang. 2020. Enrichment in conservative amino acid changes among fixed and standing missense variations in slowly evolving proteins. PeerJ 8: e9983.


Author information

Authors and affiliations.

Informatics and Analytics, University of North Carolina at Greensboro, Greensboro, NC, USA

David Bickel


6.1 Electronic Supplementary Materials

Supplemental 1.

ancestor uncertainty (xlsx 1021 kb)


© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Bickel, D. (2022). Testing Hypotheses of Molecular Evolution. In: Phylogenetic Trees and Molecular Evolution. SpringerBriefs in Systems Biology. Springer, Cham. https://doi.org/10.1007/978-3-031-11958-3_6


DOI : https://doi.org/10.1007/978-3-031-11958-3_6

Publisher Name : Springer, Cham

Print ISBN : 978-3-031-11957-6

Online ISBN : 978-3-031-11958-3

eBook Packages : Biomedical and Life Sciences (R0)


Phylogenomic Testing of Root Hypotheses


Present address: Institute for Molecular Evolution, Heinrich Heine University Düsseldorf, Düsseldorf, Germany.

Fernando D K Tria and Giddy Landan contributed equally to this work.


Fernando D K Tria, Giddy Landan, Devani Romero Picazo, Tal Dagan, Phylogenomic Testing of Root Hypotheses, Genome Biology and Evolution , Volume 15, Issue 6, June 2023, evad096, https://doi.org/10.1093/gbe/evad096


The determination of the last common ancestor (LCA) of a group of species plays a vital role in evolutionary theory. Traditionally, an LCA is inferred by rooting a fully resolved species tree. From a theoretical perspective, however, inference of the LCA amounts to the reconstruction of just one branch (the root branch) of the true species tree and should therefore be a much easier task than the full resolution of the species tree. Discarding the reliance on a hypothesized species tree and its rooting leads us to reevaluate what phylogenetic signal is directly relevant to LCA inference and to recast the task as that of sampling the total evidence from all gene families at the genomic scope. Here, we reformulate LCA and root inference in the framework of statistical hypothesis testing and outline an analytical procedure to formally test competing a priori LCA hypotheses and to infer confidence sets for the earliest speciation events in the history of a group of species. Applying our methods to two demonstrative data sets, we show that our inference of the opisthokonta LCA agrees well with common knowledge. Inference of the proteobacteria LCA shows that it is most closely related to modern Epsilonproteobacteria, raising the possibility that it may have been characterized by a chemolithoautotrophic and anaerobic lifestyle. Our inference is based on data comprising between 43% (opisthokonta) and 86% (proteobacteria) of all gene families. Approaching LCA inference within a statistical framework renders the phylogenomic inference powerful and robust.

Inference of the last common ancestor (LCA) for a group of taxa is central in the study of species evolution. We present a novel approach for the inference of the LCA without reconstructing a species tree and demonstrate its applicability using opisthokonta and proteobacteria data sets. Approaching LCA inference within a statistical framework renders the phylogenomic inference powerful and robust.

Inference of the last common ancestor (LCA) for a group of taxa is central to the study of the evolution of genes, genomes, and organisms. Prior to the assignment of an LCA, represented by the root node, phylogenetic trees are devoid of a time direction, and the temporal order of divergences is undetermined. Discoveries and insights from the reconstruction of LCAs span a wide range of taxonomic groups and time scales. For example, studies of the last universal common ancestor (LUCA) of all living organisms inferred that it was an anaerobic, thermophilic prokaryote whose energy metabolism was CO2-fixing, H2-dependent with a Wood–Ljungdahl pathway, and N2-fixing (Weiss et al. 2016). Nonetheless, due to the inherent difficulty of ancient LCA inference, it is frequently at the center of evolutionary controversies, such as the debate concerning two versus three domains of life (Williams et al. 2020), the cyanobacterial root position (Hammerschmidt et al. 2021), the LCA of vertebrates (Okamoto et al. 2017), or the LCA of hominids (Lovejoy et al. 2009).

The identity of the LCA is traditionally inferred from a species tree that is reconstructed unrooted and is then rooted at the final step. The root may be inferred by several methods, including out-group rooting (Kluge and Farris 1969), midpoint rooting (Farris 1972), minimal ancestor deviation (MAD) (Tria et al. 2017), minimal variance rooting (Mai et al. 2017), relaxed clock models (Lepage et al. 2007), tree reconciliation (Szöllősi et al. 2012), and nonreversible substitution models for nucleotide data (Huelsenbeck et al. 2002; Williams et al. 2015; Cherlin et al. 2018; Bettisworth and Stamatakis 2021) or amino acid data (Naser-Khdour et al. 2022). A correct LCA inference in methods that rely on the availability of the species tree thus depends on the accuracy of the species tree topology. One approach for the reconstruction of a species tree is to use a single gene as a proxy for the species tree topology, for example, the 16S ribosomal RNA subunit for prokaryotes (Fox et al. 1980) or cytochrome c for eukaryotes (Fitch and Margoliash 1967). This approach is, however, limited in its utility due to possible differences between the gene evolutionary history and the species phylogeny. Similarly, methods relying on ancient paralogs for root inference (e.g., Gogarten et al. 1989; Iwabe et al. 1989) rely on a small number of genes and assume that the gene and species evolutionary histories are identical.

Phylogenomics offers an alternative to the single-gene approach by aiming to utilize the whole genome rather than a single gene for the phylogenetic reconstruction (Eisen 2003). In the most basic approach, the species tree is reconstructed from the genes that are shared among all the species under study, termed here complete gene families. Approaches for the reconstruction of a species tree from multiple complete single-copy (CSC) genes include tree reconstruction from concatenated alignments (e.g., Ciccarelli et al. 2006; Hug et al. 2016; Parks et al. 2018) as well as calculation of consensus trees (e.g., Dagan et al. 2013). However, these approaches are often restricted in their data sample as they exclude partial gene families not present in all members of the species set (e.g., due to differential loss) and multicopy gene families present in multiple copies in one or more species (e.g., as a result of gene duplications or gene acquisition). Thus, the drawback of alignment concatenation or consensus tree approaches is that the inference becomes limited to gene sets that do not represent the entirety of genomes. This issue tends to become more acute the more diverse the species set is or when it includes species with reduced genomes. In extreme cases, no single-copy and complete gene family exists (Medini et al. 2005). Super-tree approaches offer an alternative as they allow partial gene families to be included (Pisani et al. 2007; Whidden et al. 2014; Williams et al. 2017; Zhu et al. 2019); however, those approaches also exclude multicopy gene families (partial and complete). These requirements produce the phenomenon of “trees of 1%” of the genes (Dagan and Martin 2006; and see our table 1). Thus, although the major aim of phylogenomic approaches is to improve the accuracy of phylogenetic inference by increasing the sample size, methodologies based on single-copy genes suffer from several inference problems, some of which are common to all of them. The first is the limited sample size imposed by the number of single-copy genes. In super-tree, concatenation, and consensus approaches alike, there is no room for the inclusion of multicopy gene families. Furthermore, reduction of families including paralogs into ortholog-only subsets, for example, using tree reconciliation (reviewed in Szöllősi et al. 2015; Smith and Hahn 2021), requires an a priori assumed species tree topology. Finally, because the aforementioned approaches yield unrooted species trees, the inference of the root is performed as the last step in the analysis; hence, the sample size of trees used for the LCA inference is essentially one (i.e., a single tree).

Table 1. Illustrative Data Sets Used in This Study

                                        Opisthokonta               Proteobacteria
(A) Number of species                   31                         72
    Total gene families                 13,036                     9,686
(B) Complete single-copy (CSC)          170 (1.30%)                50 (0.52%)
    Complete multicopy                  612 (4.69%)                70 (0.72%)
    Partial single-copy                 7,773 (59.63%)             5,586 (57.67%)
    Partial multicopy                   4,481 (34.37%)             3,980 (41.09%)
(C) CSC gene trees matching the         77.70%                     33.17%
    consensus MAD root                  (132 out of 170 trees)     (16.58* out of 50 trees)
    Consensus root partition            {fungi, metazoa}           {ɛ-proteobacteria, other proteobacteria}

Gene family categories in (B):
Complete single-copy gene families: present as single-copy in all members of a species set.
Complete multicopy gene families: present in all species, but having multiple copies in at least one species.
Partial single-copy gene families: absent from some species and present as single-copy in the others.
Partial multicopy gene families: absent from some species and having multiple copies in at least one other species.

Note.— (A) Number of species in each data set (for the complete list of species, see supplementary table S1, Supplementary Material online). (B) Classification of gene families in the data sets according to their presence and absence pattern in the species genomes. The proportion of gene families in each category from the total gene families is shown in parenthesis. (C) Number of CSC gene trees where the inferred root matches the consensus rooting using minimal ancestor deviation (MAD). *Note that the number of supporting trees may include fractions since trees with competing roots contribute the fraction of supporting root splits to the consensus MAD root.
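The four gene family categories in part (B) of the table follow directly from per-species copy numbers. The following is a minimal Python sketch of that classification, assuming each family is given as a mapping from species name to gene copy number; the function name `classify_family` and the toy species labels are hypothetical and do not come from the study.

```python
def classify_family(copies_per_species, all_species):
    """Classify one gene family by its presence/absence and copy-number pattern.

    copies_per_species: dict mapping species name -> gene copy count
                        (species missing from the dict count as 0 copies).
    all_species:        iterable with every species in the data set.
    """
    counts = [copies_per_species.get(s, 0) for s in all_species]
    present = [c for c in counts if c > 0]
    if not present:
        raise ValueError("gene family is absent from every species")

    # "Complete" means the family occurs in every species of the set.
    completeness = "complete" if len(present) == len(counts) else "partial"
    # "Multicopy" means at least one species carries more than one copy.
    copy_class = "multicopy" if any(c > 1 for c in present) else "single-copy"
    return f"{completeness} {copy_class}"


# Hypothetical three-species data set used only for illustration.
species = ["sp_a", "sp_b", "sp_c"]
print(classify_family({"sp_a": 1, "sp_b": 1, "sp_c": 1}, species))
print(classify_family({"sp_a": 2, "sp_c": 1}, species))
```

Only the first category (complete single-copy) is usable by the concatenation and consensus approaches discussed earlier, which is why those approaches see such a small share of the gene families.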

The inference of the LCA from a single species tree can be robust and accurate only if the underlying species tree is reliable. Unfortunately, this is rarely the case, as can be seen in the plurality of gene tree topologies and their disagreement with species trees, especially for prokaryotes due to frequent gene transfers (e.g., Doolittle and Bapteste 2007; Linz et al. 2007; and see our fig. 5a). Indeed, recent implementations of tree reconciliation approaches aim to accommodate the presence of heterogeneous topologies due to gene duplication or gain by inferring the effect of such events on the tree topology (e.g., Coleman et al. 2021; Morel et al. 2022). However, such applications require an a priori assumption of the relative frequency of gene duplication and gene transfer that is bound to have a significant effect on the resulting tree topology, including the root position (Bremer et al. 2022). We propose that the identification of an LCA for a group of species does not require the reconstruction of a fully resolved species tree. Instead, the LCA can be defined as the first speciation event, that is, tree node, for the group of species. In this formulation, the topological resolution of the entire species tree is immaterial, and the only phylogenetic conclusion needed is the partitioning of the species into two disjoint monophyletic groups, or the species root partition.

Here, we present a novel approach for LCA inference without reconstructing a species tree. Our approach considers the total evidence from unrooted gene trees for all protein families from a set of taxa, including partial families as well as multicopy gene families. The approach utilizes the measure of ancestor deviation (AD) that is the basis for the MAD rooting method (Tria et al. 2017). The MAD rooting method assumes a clock-like evolutionary rate of protein families, which has been shown to be a reasonable assumption, at least for prokaryotic protein families, where ∼70% of families do not deviate significantly from a clock-like evolutionary rate (Novichkov et al. 2004; Dagan et al. 2010). The AD measure is calculated for each branch in a given phylogenetic tree as the mean relative deviation from the molecular clock expectation when the root is positioned on that branch. The branch that minimizes the relative deviations from the molecular clock assumption (i.e., is assigned the minimal AD) is the best candidate to contain the root node. The AD calculation can be performed for any given tree topology, regardless of the tree reconstruction approach. The comparison of branch ADs calculated for gene trees of all protein families in a group of species enables us to formulate and test hypotheses regarding the LCA of the species tree.
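To make the AD idea concrete, the sketch below scores candidate root branches by how far leaf pairs depart from the clock expectation and picks the branch with the minimal score. It is a simplified illustration under stated assumptions, not the MAD implementation: the per-pair formula |2·d(i, ancestor)/d(i, j) − 1| reflects our reading of the midpoint criterion in Tria et al. (2017), the aggregation is a plain mean as described in the text (the published method may aggregate differently), and the function names and distances are hypothetical toy values.

```python
from statistics import mean


def relative_deviation(dist_to_ancestor, dist_between):
    """Clock deviation for one leaf pair under a candidate ancestor.

    Under a strict clock the ancestor lies at the midpoint of the path
    between the two leaves, so 2 * dist_to_ancestor / dist_between equals 1
    and the deviation is 0.
    """
    return abs(2.0 * dist_to_ancestor / dist_between - 1.0)


def branch_ad(pairs):
    """Mean relative deviation over (dist_to_ancestor, dist_between) pairs."""
    return mean(relative_deviation(a, d) for a, d in pairs)


def best_root_branch(ad_by_branch):
    """Return the label of the branch with the minimal AD."""
    return min(ad_by_branch, key=ad_by_branch.get)


# Hypothetical toy input: for each candidate root branch, the distances
# (leaf-to-ancestor, leaf-to-leaf) for three leaf pairs.
pairs_by_branch = {
    "branch_1": [(0.5, 1.0), (1.0, 2.0), (0.75, 1.5)],  # perfectly clock-like
    "branch_2": [(0.2, 1.0), (1.5, 2.0), (0.3, 1.5)],
}
ad_by_branch = {b: branch_ad(p) for b, p in pairs_by_branch.items()}
best = best_root_branch(ad_by_branch)
print(ad_by_branch, best)
```

In the approach described here, the per-branch AD values are not reduced to a single winner per tree; they are retained for every branch so that alternative root positions can be compared across gene trees.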

For the presentation of the statistical framework for testing root hypotheses, we used illustrative rooting problems for two species sets: opisthokonta and proteobacteria (table 1A and B). The root partition is well established for the opisthokonta species set, and it serves here as a positive control. The root of the proteobacteria species set is still debated; hence, this data set serves to demonstrate the power of the proposed approach. The opisthokonta data set comprises 14 metazoa and 17 fungi species, and the known root is a partition separating fungi from metazoa species (Stechmann and Cavalier-Smith 2002; Katz 2012). The proteobacteria data set includes species from five taxonomic classes in that phylum (Ciccarelli et al. 2006; Pisani et al. 2007; Lang et al. 2013; but see Waite et al. 2017). The proteobacteria data set poses a harder root inference challenge than opisthokonta, as previous results suggest that three different branches are comparably likely to harbor the root node, a situation best described as a root neighborhood (Tria et al. 2017).

Phylogenomic Rooting as Hypothesis Testing

Our LCA inference approach differs from existing ones in several aspects: 1) No species tree is reconstructed or assumed. 2) Phylogenetic information is extracted from gene trees reconstructed from partial and multicopy gene families in addition to CSC gene families. 3) The analysis uses unrooted gene trees, and no rooting operations are performed, of either gene trees or species trees. 4) Any LCA hypothesis can be tested, including species partitions that do not occur in any of the trees. LCA inference deals with abstractions of similar but distinct types of phylogenetic roots: species tree roots and gene tree roots. We reserve the terms root branch and root split to refer to gene tree roots, while species root partition refers to species trees.

Before describing our approach, we first demonstrate the limitations of a simpler phylogenomic rooting procedure that uses only CSC gene families and infers the root by a consensus derived from the rooted trees of the CSC genes. In this procedure, only the root branch in each gene tree is considered for the root inference. We then show how to consider all branches from each unrooted gene tree, not only the root branch of the rooted trees. The incorporation of all branches, not considered by a simple consensus of rooted trees, leads to a statistical test to decide between two competing root hypotheses.

Next, we show how information from partial and multicopy gene families can be used within the same statistical framework, greatly increasing the sample size and inference power. We then extend the pairwise formulation and consider multiple competing root partitions by testing all partition pairs, one pair at a time.

Finally, we modify the pairwise test to a comparison of one root partition against all alternatives simultaneously (a one-to-many test) and present a sequential elimination process that infers a minimal root neighborhood, that is, a confidence set of LCA partitions.

Phylogenomic Consensus Rooting

The consensus approach infers the root partition of a species set from a sample of rooted CSC gene trees. Root branches are collected from all trees, and the split of operational taxonomic units (OTUs) induced by the most frequent root branch in the sample is the inferred species root partition for the species set. In species sets with a strong root signal, this majority-rule approach is sufficient to determine a clear root partition for the species set. This circumstance is observed in the opisthokonta illustrative data set. Using MAD (Tria et al. 2017) to root the individual gene trees, the consensus species root partition was inferred as the root branch found in >70% of the CSC gene trees (see table 1). In the proteobacteria, in contrast, the most frequent root branch was inferred in just 33% of the CSC gene trees. The performance of the consensus approach is thus hindered by three factors. First, majority-rule voting considers just one branch from each gene tree, ignoring a large measure of the phylogenetic signal present in the gene trees. In addition, the quality of the root inference varies among the gene trees and is quantifiable, but this information is not utilized by the consensus approach. Lastly, simple voting cannot be satisfactorily tested for statistical significance.
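The majority-rule vote described above can be sketched in a few lines of Python. This is an illustration only, assuming that each rooted CSC gene tree has already been reduced to the set of species on one side of its root branch; the fractional vote for trees with tied roots follows the note to table 1, while the function names and the four-species toy data are hypothetical.

```python
from collections import defaultdict


def canonical_partition(side, all_species):
    """Represent a root split as an unordered pair of species sets."""
    side = frozenset(side)
    return frozenset({side, frozenset(all_species) - side})


def consensus_root(root_sides_per_tree, all_species):
    """Majority-rule root partition over rooted complete single-copy trees.

    root_sides_per_tree: one list per gene tree, holding the species set on
    one side of each inferred root branch. A tree with k tied root branches
    contributes 1/k of a vote to each of them.
    Returns (winning partition, fraction of trees supporting it).
    """
    votes = defaultdict(float)
    for sides in root_sides_per_tree:
        for side in sides:
            votes[canonical_partition(side, all_species)] += 1.0 / len(sides)
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / len(root_sides_per_tree)


# Hypothetical toy data: two fungi (f1, f2) and two metazoa (m1, m2).
species = {"f1", "f2", "m1", "m2"}
trees = [
    [{"f1", "f2"}],
    [{"f1", "f2"}],
    [{"m1", "m2"}],           # same partition, described from the other side
    [{"f1", "f2"}],
    [{"f1"}, {"f1", "f2"}],   # tied root: half a vote to each candidate
]
winner, support = consensus_root(trees, species)
print(sorted(map(sorted, winner)), support)
```

The sketch also makes the stated limitations visible: each tree contributes a single root position (or a tie), and the per-tree quality of that root never enters the vote.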

The Root Support Test for Two Alternative Root Partitions

The first step in our approach is a formulation of a test to select between two competing species root partitions (see fig. 1 for a road-map of the procedure). To that end, we do not infer a single root for each gene tree, but consider every branch of a gene tree as a possible root branch, and assign it a score that quantifies the relative quality of different root positions. In the current study, we use the AD measure to assess the relative strength of alternative roots of the same gene tree. The AD statistic quantifies the amount of lineage rate heterogeneity that is induced by postulating a branch as harboring the root of the tree. We have previously shown that AD measures provide robust evidence for the inference of the root of a single gene tree ( Tria et al. 2017 ).

Fig. 1. Outline of the analytical procedure. Stages are depicted clockwise from top-left. The input for the analysis is (1) gene trees of all protein families for a group of species, including the information of AD per branch as calculated by MAD. Protein families are classified into complete and partial, single-copy, or multicopy families according to the gene copy number per species. (2) Branch ADs in the gene trees supply evidence for hypothetical root partitions in the species tree; these are collected in the (3) root support matrix. The information in the root support matrix is used to identify candidates for the species root partition (including the consensus root partition, if it exists). (4) The comparison of root candidates is done by comparing the distribution of their ADs in all gene trees in a pairwise test. (5) If several root partitions are similarly supported by ADs, these can be analyzed in the context of a root neighborhood, where weakly supported partitions are sequentially eliminated from the set of root partitions. (6) The remaining root partitions comprise the species LCA confidence set.

Collecting AD values from a set of gene trees, we obtain a paired sample of support values. In figure 2, we present the joint distribution of AD values for the two most frequent root partitions in the opisthokonta data set. A null hypothesis of equal support can be tested by the Wilcoxon signed-rank test, and rejection of the null hypothesis indicates that the root partition with smaller AD values is significantly better supported than the competitor (fig. 2b).
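To make the pairwise test concrete, the following is a minimal standard-library sketch of a two-sided Wilcoxon signed-rank test on paired AD values. It assumes the large-sample normal approximation without continuity or tie correction, which may differ from the exact procedure used in the study; the function name and the AD values are invented for illustration.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied absolute differences receive
    average ranks. Returns (z, p).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p


# Hypothetical AD values of two candidate root partitions in eight gene trees.
ad_cand1 = [0.10, 0.12, 0.08, 0.20, 0.15, 0.11, 0.09, 0.13]
ad_cand2 = [0.14, 0.13, 0.15, 0.26, 0.17, 0.19, 0.12, 0.23]
z, p = wilcoxon_signed_rank(ad_cand1, ad_cand2)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z here means that candidate 1 tends to have the smaller (better) AD values; with real data the sample would comprise hundreds or thousands of gene trees.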

Fig. 2. Pairwise testing of competing root hypotheses in the opisthokonta data set. (a) The two most frequent root branches among the CSC gene trees (supplementary table S1a, Supplementary Material online). (b) CSC gene families, and (c) all gene families. Colormaps are the joint distribution of paired AD values. Smaller ADs indicate better support, whereby candidate 1 outcompetes candidate 2 above the diagonal and candidate 2 wins below the diagonal. P values are for the two-sided Wilcoxon signed-rank test used to compare paired branch AD values. Note the gain in power that accompanies the larger sample size.

As in all statistical inferences, the power of the test ultimately depends on the sample size. Considering only CSC gene families often limits rooting analyses to a small minority of the available sequence data (e.g., table 1 ). Paired AD support values, however, can be extracted also from partial and multicopy gene families, resulting in much larger sample size and statistical power ( fig. 2 c ).

Rooting Support from Partial and Multicopy Gene Trees

To deal with non-CSC gene trees, we must decouple the notion of OTU split (i.e., tree branch) from the notion of species partition. In CSC gene trees, the correspondence between tree branches and root partitions is direct and one to one (fig. 3a). In trees of partial gene families, a single OTU split may correspond to several species partitions. In multicopy gene families, some tree branches do not correspond to any possible species partition.

Fig. 3. Correspondence of OTU splits and tested root partitions. (a) In CSC gene trees; (b) in PSC gene trees; and (c) in CMC gene trees. PMC gene trees entail both the b and c operations. OTU splits refer to branches in gene trees and are represented as black circles and white squares. Species partitions refer to possible branches in the hypothetical species tree, with unknown topology, and are represented as gray shades. In CSC gene trees, all branches (including internal and external) can be mapped to species partitions in a one-to-one manner (green arrows in a; note that only several splits are illustrated). For mapping branches from PSC gene trees (b) to species partitions, we remove from the species partitions the species missing in the gene tree and term the reduced version of the species partitions as OTU partitions. In CMC gene trees (c), only branches that form species splits can be mapped onto species partitions. A species split in a CMC gene tree is a branch for which all gene copies of any one species are present on the same side of the split.

In order to find the branches in a partial gene tree that correspond to the tested root partitions, we reduce root partitions from species to OTUs by removing the species that are missing in the gene tree ( Semple and Steel 2001 ). The root partitions are then assigned AD support by matching their reduced OTU version to the OTU splits of the gene tree ( fig. 3 b ).
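The reduction step can be illustrated with plain set operations. The sketch below is an assumption-laden toy: the species labels, the AD values, and the helpers `split` and `reduce_partition` are hypothetical, and a gene tree is represented simply as a map from OTU split to branch AD.

```python
def split(*sides):
    """Build an unordered bipartition from two iterables of labels."""
    return frozenset(frozenset(s) for s in sides)


def reduce_partition(partition, present):
    """Restrict a species bipartition to the species present in a gene tree."""
    side_a, side_b = (frozenset(side) & present for side in partition)
    return frozenset({side_a, side_b})


# Hypothetical PSC gene tree on species A-D (species E is missing),
# given as a map from OTU split to the AD value of that branch.
present = frozenset("ABCD")
tree_ads = {
    split("A", "BCD"): 0.30,
    split("B", "ACD"): 0.25,
    split("C", "ABD"): 0.40,
    split("D", "ABC"): 0.35,
    split("AB", "CD"): 0.05,
}

# Two candidate species root partitions that differ only in where E falls.
candidates = {
    "AB|CDE": ("AB", "CDE"),
    "ABE|CD": ("ABE", "CD"),
}
support = {
    name: tree_ads.get(reduce_partition(p, present))
    for name, p in candidates.items()
}
print(support)
```

Both candidates reduce to the same OTU split AB|CD and therefore receive the same AD value from this tree, which shows how one branch of a partial gene tree can support several species partitions at once.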

In multicopy gene trees, one or more species are represented multiple times as an OTU (Swenson and El-Mabrouk 2012). Each branch of a multicopy gene tree splits the OTUs into two groups, and the two groups may be mutually exclusive or overlapping in terms of species. We refer to mutually exclusive splits in multicopy gene trees as species splits, which can be mapped to specific root partitions. Overlapping splits, on the other hand, cannot correspond to any root partition (fig. 3c). Mapping of tree splits from partial multicopy (PMC) gene trees entails both operations: identification of species splits and reduction of root partitions. We note that the ability to gain information on root partitions from multicopy gene trees depends on the quality of gene family clustering with regard to the accuracy of orthology assessment. The presence of paralogs (especially ancient paralogs) in the set of multicopy gene trees will lead to a low proportion of splits that can be mapped to root partitions.
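Identifying species splits amounts to checking that the two sides of a branch share no species. A minimal sketch follows, with invented OTU names and a hypothetical `species_split` helper.

```python
def species_split(side_a, side_b, species_of):
    """Map an OTU split to a species bipartition, or None if the sides overlap.

    species_of maps each OTU (gene copy) to the species it belongs to.
    """
    sp_a = frozenset(species_of[otu] for otu in side_a)
    sp_b = frozenset(species_of[otu] for otu in side_b)
    if sp_a & sp_b:
        return None  # a species occurs on both sides: overlapping split
    return frozenset({sp_a, sp_b})


# Hypothetical multicopy family: species A contributes two gene copies.
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
exclusive = species_split({"a1", "a2"}, {"b1", "c1"}, species_of)
overlapping = species_split({"a1", "b1"}, {"a2", "c1"}, species_of)
print(exclusive, overlapping)
```

In the first call both copies of species A lie on the same side, so the branch maps to the species partition A|BC; in the second call the copies are separated and the branch carries no root information.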

Candidate root partitions, or their reduced versions, may be absent from some gene trees and will be missing support values from these trees. We distinguish between two such cases: informative and uninformative missing values. A gene family is uninformative relative to a specific species root partition when its species composition includes species from only one side of the species partition. In such cases, the candidate root partition cannot be observed in any tree reconstructed from the gene family. We label the gene trees of such families as uninformative relative to the specific candidate root partition and exclude them from tests involving that partition (note that such a gene tree may still be used to test other candidate root partitions). In contrast, when a gene family includes species from both sides of a candidate species root partition but the gene tree lacks a corresponding branch, we label the gene tree as informative relative to the partition. This constitutes evidence against the candidate partition and should not be ignored in the ensuing tests. In such cases, we replace the missing support values by a pseudocount consisting of the maximal (i.e., worst) AD value in the gene tree. This assignment of a default worst-case support value also serves to enable the pairwise testing of incompatible root partitions, where no gene tree can include both partitions (Semple and Steel 2001).
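The three cases (observed, informative missing, uninformative missing) can be written as a small decision function. This is a sketch under the same toy representation as before: a gene tree is a hypothetical map from OTU split to AD, and the function and label names are not from the original study.

```python
def split(*sides):
    """Build an unordered bipartition from two iterables of labels."""
    return frozenset(frozenset(s) for s in sides)


def score_partition(partition, tree_ads, present):
    """Return (status, AD) for one candidate partition in one gene tree."""
    side_a, side_b = (frozenset(side) & present for side in partition)
    if not side_a or not side_b:
        # Only one side of the partition is sampled: exclude this tree.
        return "uninformative", None
    reduced = frozenset({side_a, side_b})
    if reduced in tree_ads:
        return "observed", tree_ads[reduced]
    # Both sides are sampled but no branch matches: worst-case pseudocount.
    return "informative-missing", max(tree_ads.values())


# Hypothetical partial gene tree on species A-D (species E is missing).
present = frozenset("ABCD")
tree_ads = {
    split("A", "BCD"): 0.30,
    split("B", "ACD"): 0.25,
    split("C", "ABD"): 0.40,
    split("D", "ABC"): 0.35,
    split("AB", "CD"): 0.05,
}

observed = score_partition(("AB", "CDE"), tree_ads, present)
missing = score_partition(("AC", "BDE"), tree_ads, present)
excluded = score_partition(("E", "ABCD"), tree_ads, present)
print(observed, missing, excluded)
```

The candidate AC|BDE is contradicted by the tree and receives the tree's maximal AD, whereas E|ABCD cannot be evaluated at all because E is not sampled.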

Complete gene families are always informative relative to any candidate root partitions. Partial gene families, however, may be uninformative for some root candidates. When testing two candidate root hypotheses against each other, the exclusion of uninformative partial gene trees thus leads to a reduction of sample size from the full complement of gene families. Furthermore, one branch of a partial gene tree may be identical to the reduced versions of two or more species root partitions, whereby the tree is informative relative to the several candidates yet their support values are tied.

Root Inference and Root Neighborhoods

The pairwise test is useful when the two competing root hypotheses are given a priori, as often happens in specific evolutionary controversies. More generally, however, one wishes to infer the species LCA, or root partition, with no prior hypotheses. In principle, the pairwise test may be carried out over all pairs of possible root partitions, while controlling for multiple testing. Such an exhaustive approach is practically limited to very small rooting problems, as the number of possible partitions grows exponentially with the number of species. A possible simplification is to restrict the analysis to test only pairs of root partitions from a pool of likely candidates. We propose that a reasonable pool of candidate root partitions can be constructed by collecting the set of root splits that are inferred as the root in any of the CSC gene trees.
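The scaling argument can be checked with a short calculation. The sketch below counts the bipartitions of n species into two non-empty groups, 2^(n-1) - 1, and the number of pairwise tests for a candidate pool; the function names and the value n = 30 are illustrative assumptions, not figures from the data sets.

```python
def n_bipartitions(n_species):
    """Number of ways to split n species into two non-empty groups."""
    return 2 ** (n_species - 1) - 1


def n_pairs(n_candidates):
    """Number of pairwise tests among a pool of candidate partitions."""
    return n_candidates * (n_candidates - 1) // 2


# A hypothetical set of 30 species versus a pool of 25 candidate partitions.
all_partitions = n_bipartitions(30)
print(all_partitions, n_pairs(all_partitions))
print(n_pairs(25))
```

Restricting the analysis to a pool of 25 candidates leaves 300 pairwise comparisons, whereas testing every bipartition of even a modest species set is out of reach.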

When one species root partition is significantly better supported than any of the other candidates, the root is fully determined. Such is the result for the opisthokonta data set, for which the known root partition is the best candidate among all pairwise comparisons (supplementary table S2a, Supplementary Material online). In more difficult situations, the interpretation of all pairwise P values is not straightforward because no single candidate root partition wins every comparison. This situation is exemplified by the CSC subset of the proteobacteria data set, where no candidate has better support than all the alternative candidates (fig. 4 and supplementary table S2b, Supplementary Material online). The absence of a clear best candidate suggests the existence of a root neighborhood in the species set. Thus, a rigorous procedure for the inference of a confidence set for the LCA is required.

Fig. 4. Cumulative distribution plots of AD in the proteobacteria data set. Left: cumulative distribution of unpaired AD values for the 25 candidate root partitions. Right: cumulative distribution of paired differences to candidate 1 (i.e., the most frequent candidate), whereby positive differences indicate better support for candidate 1 and negative values better support for candidate i (see supplementary table S1b, Supplementary Material online for candidate partition definitions). The results from gene family (i.e., tree) classes are stacked vertically. P values of the least significant among the contrasts to candidate 1 are shown in red (FDR adjusted for all 300 pairwise comparisons; details in supplementary table S2b, Supplementary Material online).

One-to-Many Root Support Test

To assess the support for root partitions in the full context of all other candidate root partitions, we modify the pairwise test to a test contrasting one root partition to a set of many alternatives. The one-to-many test consists of comparing the distribution of root support values for one focal partition to the extreme support values among all the other candidates and is inherently asymmetric. A “better than best” version takes the minimal (i.e., best) value among the AD values of the alternatives, while the “worse than worst” version considers the maximal (i.e., worst) among the alternatives’ ADs. As expected, the “better than best” variant is always less powerful than any of the pairwise tests and will not be considered further. The “worse than worst” variant, on the other hand, can be used to trim down a set of candidates while being more conservative than the pairwise tests.

In the one-to-many test, each gene tree provides one AD value for the focal partition and one AD value for the worst among the alternative root partitions. The worst AD values are assigned while considering only partitions with nonmissing values (see above). Note that the worst alternative root partition may vary across gene trees. To maximize the sample size, that is, the number of trees used for the statistical tests, gene trees that have informative missing values for all candidate root partitions are included in the analysis with the largest (i.e., worst) AD found in the entire tree. We test for differences in the magnitude of paired AD values using the one-sided Wilcoxon signed-rank test, with the null hypothesis that the focal ADs are equal to or smaller than the maximal ADs for the complementary set and the alternative hypothesis that the focal ADs are larger than that maximum. A rejection of the null hypothesis is interpreted to mean that the focal root partition is significantly worse supported than the complementary set of candidates taken as a whole.
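A minimal sketch of the "worse than worst" contrast is given below. It assumes that informative missing values have already been replaced by pseudocounts and that uninformative values are stored as `None`; the one-sided P value uses the normal approximation of the signed-rank statistic, and all names and numbers are hypothetical.

```python
import math


def signed_rank_z(diffs):
    """z-score of the Wilcoxon signed-rank statistic (normal approximation)."""
    d = [v for v in diffs if v != 0]
    n = len(d)
    if n == 0:
        return 0.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    return (w_plus - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


def one_to_many(support, focal):
    """Contrast the focal ADs with the worst AD among the alternatives.

    support: one dict per gene tree mapping candidate -> AD (None when the
    tree is uninformative for that candidate). Returns (z, p, n_trees).
    """
    diffs = []
    for tree in support:
        f = tree.get(focal)
        others = [v for c, v in tree.items() if c != focal and v is not None]
        if f is None or not others:
            continue  # tree cannot contribute a paired value
        diffs.append(f - max(others))
    z = signed_rank_z(diffs)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail: focal ADs are larger
    return z, p, len(diffs)


# Hypothetical support values for three candidates in ten gene trees.
trees = [
    {"r1": 0.10 + 0.01 * i, "r2": 0.12 + 0.01 * i, "r3": 0.30 + 0.02 * i}
    for i in range(10)
]
z3, p3, n3 = one_to_many(trees, "r3")
z1, p1, n1 = one_to_many(trees, "r1")
print(f"r3: z={z3:.2f} p={p3:.4f}; r1: z={z1:.2f} p={p1:.4f}")
```

In this toy input, candidate r3 is consistently worse than the worst of its alternatives and is rejected, while r1 is not.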

Inference of a Minimal Root Neighborhood

To infer a root neighborhood, that is, a confidence set of LCA hypotheses, we start with a reasonably constructed large set of n candidate partitions and reduce it by a stepwise elimination procedure. At each step, we employ the one-to-many test to contrast each of the remaining candidates to its complementary set. We control for false discovery rate (FDR; Benjamini and Hochberg 1995) due to multiple testing, and if at least one test is significant at the specified FDR level, the focal partition with the smallest P value (i.e., largest z-statistic) is removed from the set of candidates. The iterative process is stopped when none of the retained candidates is significantly less supported than the worst support for the other members of the set or when the set is reduced to a single root partition. To be conservative, we use a cumulative FDR procedure: at the first step, we control for n tests, in the next round for 2n − 1 tests, and, when not stopped earlier, for n(n + 1)/2 − 1 tests at the last iteration (the sum n + (n − 1) + … + 2).
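The elimination loop can be sketched as follows. The one-to-many test is passed in as a callable, so the example stays independent of any particular test implementation. How the cumulative test count enters the FDR criterion is an assumption here: the sketch applies the Benjamini-Hochberg step-up rule to the current round's P values with the cumulative number of tests as the denominator. The toy P value function and AD values are invented.

```python
def sequential_elimination(candidates, p_value, alpha=0.05):
    """Stepwise elimination of weakly supported root candidates.

    p_value(focal, remaining) must return the one-to-many P value of the
    focal candidate against the other remaining candidates.
    Returns (retained candidates, trace of (removed candidate, tests so far)).
    """
    remaining = list(candidates)
    tests_so_far = 0
    trace = []
    while len(remaining) > 1:
        pvals = {c: p_value(c, remaining) for c in remaining}
        tests_so_far += len(remaining)  # cumulative count: n, 2n - 1, ...
        ranked = sorted(pvals.values())
        # Assumed rule: Benjamini-Hochberg with the cumulative test count.
        significant = any(
            p <= (i + 1) * alpha / tests_so_far for i, p in enumerate(ranked)
        )
        if not significant:
            break
        worst = min(remaining, key=pvals.get)
        remaining.remove(worst)
        trace.append((worst, tests_so_far))
    return remaining, trace


# Hypothetical stand-in for the one-to-many test on four candidates.
toy_ad = {"r1": 0.10, "r2": 0.11, "r3": 0.30, "r4": 0.50}


def toy_p(focal, remaining):
    gap = toy_ad[focal] - max(toy_ad[c] for c in remaining if c != focal)
    return 1e-6 if gap > 0.05 else 0.5


neighborhood, trace = sequential_elimination(toy_ad, toy_p)
print(neighborhood, trace)
```

With these toy values the procedure removes r4 and then r3 and stops with a neighborhood of two candidates that cannot be separated.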

We demonstrate the sequential elimination procedure for the proteobacteria data set in figure 5 . The splits network reconstructed for the proteobacteria data set exemplifies the plurality of incongruent splits in the CSC gene trees and hence the dangers in assuming a single species tree. In this data set, the initial candidate set consisted of the 25 different root partitions found in the 50 CSC gene trees, and the elimination process terminated with a neighborhood of size 1, a species root partition separating the Epsilonproteobacteria species from the other proteobacteria classes. This LCA is indeed the most frequent one among the CSC gene trees but with a low frequency of only one in three gene trees. It is noteworthy that the order of elimination does not generally follow the frequency of partitions in the CSC set. For example, the last alternative to be rejected (number 19) was inferred as a root branch in just one tree where it is tied with two other branches, whereas the second and third most frequent CSC roots are rejected already at iterations 19 and 20.

Fig. 5. LCA inference by sequential elimination in the proteobacteria data set. (a) Phylogenetic split network of the 50 CSC gene trees; (b) trace of the sequential elimination process (see supplementary table S1b, Supplementary Material online for candidate partition definitions). Selected partitions are indicated by gray arcs in a and bold numbers in b.

The elimination order is determined by the P value of the one-to-many test, which in turn reflects both the effect size of worse support and the power of the test, where the latter is a function of sample size. Hence, candidate partitions for which a smaller number of gene trees are informative are more difficult to reject. In particular, the testing of an LCA hypothesis of a single basal species partitioned from the other species is limited to those gene families that include the basal species. The last two partitions rejected in figure 5 are indeed single species partitions, and the number of gene trees that are informative relative both to them and to the remaining candidates drops drastically in comparison with the earlier iterations. Yet, even at the last iteration, the number of gene trees that bear upon the conclusion is an order of magnitude larger than the number of CSC gene families.

The full complement of the proteobacteria data set consists of 9,686 gene families. The final conclusion—determination of a single LCA partition—is arrived at by extracting ancestor–descendant information from 86% of the gene families. The gene families that do not provide any evidence consist of 1,113 partial single-copy (PSC) gene families, mainly very small ones (e.g., due to recent gene origin), and 214 PMC families, mostly small families and some with abundant paralogs (e.g., due to gene duplication prior to the LCA).

In the current analysis, we used the AD measure to provide the strength of the root signal in gene tree branches. We note that the statistical frameworks can accommodate other measures that quantify the root signal for all gene tree branches. A fundamental element in our approach is the prior definition of a pool of candidate root partitions. We advocate deriving the initial set from roots inferred for CSC gene trees. A yet larger but manageable initial set may be constructed of splits frequently observed in the CSC gene trees. Importantly, the initial set need not be limited to observed partitions but can be augmented by a priori hypotheses informed by current phylogenetic and taxonomical precepts.

The inferred species root partition for the proteobacteria data set indicates that the proteobacteria LCA was more closely related to modern Epsilonproteobacteria than to the other classes; characteristics of present-day species in that group can therefore be used to hypothesize about the biology of the proteobacteria LCA. Epsilonproteobacteria species show versatile biochemical strategies to fix carbon, enabling members of this class to colonize extreme environments such as deep-sea hydrothermal vents (Campbell et al. 2006). Epsilonproteobacteria residing in deep-sea habitats are generally anaerobes, and many are characterized as chemolithoautotrophs (Takai et al. 2005). The possibility that the proteobacteria LCA had a chemolithoautotrophic and anaerobic lifestyle is in line with the scenario of life's early phase as predicted by the hydrothermal-vent theory for the origin of life (Martin et al. 2008).

From a purely theoretical perspective, the inference of the LCA for a group of species amounts to the reconstruction of just one branch—the root branch—of the true unrooted species tree and should therefore be a much easier task than the full resolution of the rooted species tree. Approaches that pose the LCA problem in terms of rooting of a resolved species tree require the solution of a much harder problem as a prerequisite for addressing the easier task. Methods where the input information passes through a bottleneck of a single inferred species tree have the disadvantage that the actual inference of the LCA is based on a sample of size one.

An alternative approach for rooting species trees has emerged from the use of “gene-tree–species-tree reconciliation” models (or simply tree reconciliations) ( Szöllősi et al. 2012 ; Williams et al. 2017 ; Coleman et al. 2021 ). These models operate on topological differences between gene trees and a species tree, and the analyses attempt to bring the tree differences into agreement by invoking gene transfers, gene duplications, and gene losses, in any combination as needed. To discern among alternative reconciliation scenarios, the evolutionary rates for gene transfers, gene duplications, and gene losses need to be estimated from the data. Rate estimation is, however, challenging, and a previous study showed that evolutionary rates automatically estimated by a popular tree reconciliation model are often unrealistic and that the use of incorrect rates leads to incorrect root inferences ( Bremer et al. 2022 ). Indeed, evolutionary rates as estimated by tree reconciliation analyses often contradict more conservative estimates obtained by independent studies ( Treangen and Rocha 2011 ; Tria and Martin 2021 ), which may indicate possible biases within reconciliation models. By contrast, the rooting approach presented here does not rely on a priori estimates of rates of gene duplication and transfer, and as such, it offers a simpler solution to the species root problem. Furthermore, and contrary to tree reconciliations, our approach can perform rooting when the species tree is altogether unknown (albeit a candidate root partition is still necessary), which makes our tests especially appealing to rooting prokaryotic phylogenies for which species tree inferences are typically challenging.

Avoiding the reliance on a species tree prompts us to reevaluate what phylogenetic signal is directly relevant to LCA inference and to recast the task as that of sampling the total evidence from all gene families at the genomic scope. Moreover, dispensing with a single rooting operation of a single species tree facilitates the reformulation of LCA and root inference in the framework of statistical hypothesis testing. The analytical procedure we outline allows formal testing of competing a priori LCA hypotheses and the inference of confidence sets for the earliest speciation events in the history of a group of species. Admittedly, our approach relies on the AD measure, which may be sensitive to large deviations from clock-like evolutionary rates. Nonetheless, previous studies suggested that the majority of protein families in prokaryotic genomes are characterized by clock-like evolutionary rates (Novichkov et al. 2004; Dagan et al. 2010). Additionally, we propose that the use of large gene tree samples assists in overcoming biases in the root inference that are due to conflicting signals in the data. Possible biases in the inference of the species root partition may arise due to methodological artifacts (e.g., tree reconstruction errors) and confounding evolutionary processes such as lateral gene transfer, gene duplication, and gene loss. Indeed, our root inferences were consistent across samples of different gene tree categories (CSC, complete multicopy [CMC], PSC, and PMC), each category bearing different types and degrees of conflicting signals. For instance, gene duplications are more frequent in multicopy gene trees (CMC and PMC), whereas gene losses are likely more frequent in partial trees (PSC and PMC). Phylogenomic inferences utilizing all gene trees for a species set are expected to increase the robustness of the root inference.

Our analyses of the demonstrative data sets show that different species sets present varying levels of LCA signal: the opisthokonta data set shows a strong root signal, whereas the proteobacteria data set has a moderate LCA signal. Data sets with weak signal are better described in terms of confidence sets for root partitions, reflecting the inherent uncertainties and avoiding the pitfalls of forcing a single-hypothesis result.

The LCA inferences presented here utilized 43–86% of the total number of gene families for root partition inferences. This is in stark contrast to the 0.5–1.3% of the gene families that are CSC and can be utilized by traditional approaches. The number of gene families considered in our tests corresponds to the number of genes encoded in modern genomes, supplying “total evidence” for LCA inferences.

Protein families for the opisthokonta and proteobacteria data sets were extracted from EggNOG version 4.5 (Huerta-Cepas et al. 2016). Protein families were filtered based on the number of species, gene copy number, number of OTUs, and sequence length, as follows. Protein families present in fewer than four species were discarded. Suspected outlier sequences were detected based on their length relative to the median length: sequences were removed if shorter than half or longer than twice the median. Species with more than ten copies of a gene were removed from the corresponding gene family. Multicopy gene families were discarded if the number of species was smaller than half the total number of OTUs (table 1).
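The filtering rules can be expressed as a short function. This sketch is not the pipeline used in the study: the order in which the filters are applied, the handling of families that shrink after sequence removal, and the input layout (a list of species and sequence pairs) are all assumptions, and the example sequences are made up.

```python
from collections import Counter
from statistics import median


def filter_family(members, min_species=4, max_copies=10):
    """Apply the family filters to a list of (species, sequence) pairs.

    Returns the retained members, or None when the family is discarded.
    """
    if len({sp for sp, _ in members}) < min_species:
        return None  # present in fewer than four species
    med = median(len(seq) for _, seq in members)
    # Drop length outliers: shorter than half or longer than twice the median.
    kept = [(sp, seq) for sp, seq in members if med / 2 <= len(seq) <= 2 * med]
    # Drop species represented by more than max_copies gene copies.
    copies = Counter(sp for sp, _ in kept)
    kept = [(sp, seq) for sp, seq in kept if copies[sp] <= max_copies]
    n_species = len({sp for sp, _ in kept})
    n_otus = len(kept)
    # Multicopy families need at least half as many species as OTUs.
    if n_otus > n_species and n_species < n_otus / 2:
        return None
    return kept


# Hypothetical single-copy family with one overly long sequence.
family = [
    ("sp1", "M" * 100),
    ("sp2", "M" * 100),
    ("sp3", "M" * 110),
    ("sp4", "M" * 90),
    ("sp5", "M" * 300),
]
filtered = filter_family(family)
too_small = filter_family(family[:3])
print(len(filtered), too_small)
```

Here the 300-residue sequence exceeds twice the median length and is removed, and a family restricted to three species is discarded outright.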

Protein sequences of the resulting protein families were aligned using MAFFT version 7.027b with L-INS-i alignment strategy ( Katoh and Standley 2013 ). Phylogenetic trees were reconstructed using iqtree version 1.6.6 with the model selection parameters “-mset LG -madd LG4X” ( Nguyen et al. 2015 ). The phylogenetic network ( fig. 5 ) was reconstructed using SplitsTree4 version 4.14.6 ( Huson and Bryant 2006 ). Branch ancestral deviation (AD) values and roots for the consensus analysis were inferred using mad.py version 2.21 ( Tria et al. 2017 ).

Supplementary data are available at Genome Biology and Evolution online ( http://www.gbe.oxfordjournals.org/ ).

We thank Maxime Godfroid for fruitful discussions. The study was supported by CAPES (Coordination for the Improvement of Higher Education Personnel, Brazil) (awarded to F.D.K.T.) and the European Research Council (Grant No. 281357 awarded to T.D.). D.R.P. is supported by the DFG-funded CRC1182 “Origin and Function of Metaorganisms.”

The source code to run the phylogenomic rooting as well as the unrooted trees with AD values used in this study are found in the following repository: https://github.com/deropi/PhyloRooting.git . Additionally, R code to replicate some of the figures in this paper is also provided.

Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 57:289–300.

Bettisworth B, Stamatakis A. 2021. Root digger: a root placement program for phylogenetic trees. BMC Bioinform. 22:225.

Bremer N, Knopp M, Martin WF, Tria FDK. 2022. Realistic gene transfer to gene duplication ratios identify different roots in the bacterial phylogeny using a tree reconciliation method. Life 12:995.

Campbell BJ, Engel AS, Porter ML, Takai K. 2006. The versatile ε-proteobacteria: key players in sulphidic habitats. Nat Rev Microbiol. 4:458–468.

Cherlin S, Heaps SE, Nye TMW, Boys RJ, Williams TA, Embley TM. 2018. The effect of nonreversibility on inferring rooted phylogenies. Mol Biol Evol. 35:984–1002.

Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283–1287.

Coleman GA, Davín AA, Mahendrarajah TA, Szánthó LL, Spang A, Hugenholtz P, Szöllősi GJ, Williams TA. 2021. A rooted phylogeny resolves early bacterial evolution. Science 372:eabe0511.

Dagan T, Martin W. 2006. The tree of one percent. Genome Biol. 7:118.

Dagan T, Roettger M, Bryant D, Martin W. 2010. Genome networks root the tree of life between prokaryotic domains. Genome Biol Evol. 2:379–392.

Dagan T, Roettger M, Stucken K, Landan G, Koch R, Major P, Gould SB, Goremykin VV, Rippka R, Tandeau de Marsac N, et al. 2013. Genomes of Stigonematalean cyanobacteria (subsection V) and the evolution of oxygenic photosynthesis from prokaryotes to plastids. Genome Biol Evol. 5:31–44.

Doolittle WF, Bapteste E. 2007. Pattern pluralism and the tree of life hypothesis. Proc Natl Acad Sci U S A. 104:2043–2049.

Eisen JA. 2003. Phylogenomics: intersection of evolution and genomics. Science 300:1706–1707.

Farris JS. 1972. Estimating phylogenetic trees from distance matrices. Am Nat. 106:645–668.

Fitch WM, Margoliash E. 1967. Construction of phylogenetic trees. Science 155:279–284.

Fox GE, Stackebrandt E, Hespell RB, Gibson J, Maniloff J, Dyer TA, Wolfe RS, Balch WE, Tanner RS, Magrum LJ, et al. 1980. The phylogeny of prokaryotes. Science 209:457–463.

Gogarten JP, Kibak H, Dittrich P, Taiz L, Bowman EJ, Bowman BJ, Manolson MF, Poole RJ, Date T, Oshima T, et al. 1989. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci U S A. 86:6661–6665.

Hammerschmidt K, Landan G, Domingues Kümmel Tria F, Alcorta J, Dagan T. 2021. The order of trait emergence in the evolution of cyanobacterial multicellularity. Genome Biol Evol. 13:evaa249.

Huelsenbeck JP, Bollback JP, Levine AM. 2002. Inferring the root of a phylogenetic tree. Syst Biol. 51:32–43.

Huerta-Cepas J, Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, et al. 2016. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44:D286–D293.

Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, Butterfield CN, Hernsdorf AW, Amano Y, Ise K, et al. 2016. A new view of the tree of life. Nat Microbiol. 1:1–6.

Huson DH, Bryant D. 2006. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 23:254–267.

Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A. 86:9355–9359.

Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 30:772–780.

Katz LA. 2012. Origin and diversification of eukaryotes. Annu Rev Microbiol. 66:411–427.

Kluge AG, Farris JS. 1969. Quantitative phyletics and the evolution of anurans. Syst Biol. 18:1–32.

Lang JM, Darling AE, Eisen JA. 2013. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One 8:e62510-15.

Lepage T, Bryant D, Philippe H, Lartillot N. 2007. A general comparison of relaxed molecular clock models. Mol Biol Evol. 24:2669–2680.

Linz S, Radtke A, von Haeseler A. 2007. A likelihood framework to measure horizontal gene transfer. Mol Biol Evol. 24:1312–1319.

Lovejoy CO , Suwa G , Simpson SW , Matternes JH , White TD . 2009 . The great divides: Ardipithecus ramidus reveals the postcrania of our last common ancestors with African apes . Science 326 : 100 – 106 .

Mai U , Sayyari E , Mirarab S . 2017 . Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction . PLoS One 12 : e0182238 .

Martin W , Baross J , Kelley D , Russell MJ . 2008 . Hydrothermal vents and the origin of life . Nat Rev Microbiol . 6 : 805 – 814 .

Medini D , Donati C , Tettelin H , Masignani V , Rappuoli R . 2005 . The microbial pan-genome . Curr Opin Genet Dev . 15 : 589 – 594 .

Morel B , Schade P , Lutteropp S , Williams TA , Szöllősi GJ , Stamatakis A . 2022 . SpeciesRax: a tool for Maximum likelihood Species tree inference from gene family trees under duplication, transfer, and loss . Mol Biol Evol. 39 : msab365 .

Naser-Khdour S , Quang Minh B , Lanfear R . 2022 . Assessing confidence in root placement on phylogenies: an empirical study using nonreversible models for mammals . Syst Biol . 71 : 959 – 972 .

Nguyen L-T , Schmidt HA , von Haeseler A , Minh BQ . 2015 . IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies . Mol Biol Evol . 32 : 268 – 274 .

Novichkov PS , Omelchenko MV , Gelfand MS , Mironov AA , Wolf YI , Koonin EV . 2004 . Genome-wide molecular clock and horizontal gene transfer in bacterial evolution . J Bacteriol . 186 : 6575 – 6585 .

Okamoto E , Kusakabe R , Kuraku S , Hyodo S , Robert-Moreno A , Onimaru K , Sharpe J , Kuratani S , Tanaka M . 2017 . Migratory appendicular muscles precursor cells in the common ancestor to all vertebrates . Nat Ecol Evol . 1 : 1731 – 1736 .

Parks DH , Chuvochina M , Waite DW , Rinke C , Skarshewski A , Chaumeil P-A , Hugenholtz P . 2018 . A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life . Nat Biotechnol . 36 : 996 – 1004 .

Pisani D , Cotton JA , McInerney JO . 2007 . Supertrees disentangle the chimerical origin of eukaryotic genomes . Mol Biol Evol . 24 : 1752 – 1760 .

Semple C , Steel M . 2001 . Tree reconstruction via a closure operation on partial splits . In: Gascuel O, Sagot M-F, editors. Computational Biology. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer . p. 126 – 134 .

Smith ML , Hahn MW . 2021 . New approaches for inferring phylogenies in the presence of paralogs . Trends Genet . 37 : 174 – 187 .

Stechmann A , Cavalier-Smith T . 2002 . Rooting the eukaryote tree by using a derived gene fusion . Science 297 : 89 – 91 .

Swenson KM , El-Mabrouk N . 2012 . Gene trees and species trees: irreconcilable differences . BMC Bioinform . 13 : S15 .

Szöllősi GJ , Boussau B , Abby SS , Tannier E , Daubin V . 2012 . Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations . Proc Natl Acad Sci U S A . 109 : 17513 – 17518 .

Szöllősi GJ , Tannier E , Daubin V , Boussau B . 2015 . The inference of gene trees with species trees . Syst Biol . 64 : e42 – e62 .

Takai K , Campbell BJ , Cary SC , Suzuki M , Oida H , Nunoura T , Hirayama H , Nakagawa S , Suzuki Y , Inagaki F , et al.  2005 . Enzymatic and genetic characterization of carbon and energy metabolisms by deep-sea hydrothermal chemolithoautotrophic isolates of Epsilonproteobacteria . Appl Environ Microbiol . 71 : 7310 – 7320 .

Treangen TJ , Rocha EPC . 2011 . Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes . PLoS Genet . 7 : e1001284-12 .

Tria FDK , Landan G , Dagan T . 2017 . Phylogenetic rooting using minimal ancestor deviation . Nat Ecol Evol . 1 : 0193 .

Tria FDK , Martin WF . 2021 . Gene duplications are at least 50 times less frequent than gene transfers in prokaryotic genomes . Genome Biol Evol . 13 : evab224 .

Waite DW , Vanwonterghem I , Rinke C , Parks DH , Zhang Y , Takai K , Sievert SM , Simon J , Campbell BJ , Hanson TE , et al.  2017 . Comparative genomic analysis of the class Epsilonproteobacteria and proposed reclassification to Epsilonbacteraeota (phyl. nov) . Front Microbiol . 8 : 4962 – 4919 .

Weiss MC , Sousa FL , Mrnjavac N , Neukirchen S , Roettger M , Nelson-Sathi S , Martin WF . 2016 . The physiology and habitat of the last universal common ancestor . Nat Microbiol 1: 16116 .

Whidden C , Zeh N , Beiko RG . 2014 . Supertrees based on the subtree prune-and-regraft distance . Syst Biol . 63 : 566 – 581 .

Williams TA , Cox CJ , Foster PG , Szöllősi GJ , Embley TM . 2020 . Phylogenomics provides robust support for a two-domains tree of life . Nat Ecol Evol . 4 : 138 – 147 .

Williams TA , Heaps SE , Cherlin S , Nye TMW , Boys RJ , Embley TM . 2015 . New substitution models for rooting phylogenetic trees . Philos Trans R Soc B Biol Sci . 370 : 20140336 .

Williams TA , Szöllősi GJ , Spang A , Foster PG , Heaps SE , Boussau B , Ettema TJG , Embley TM . 2017 . Integrative modeling of gene and genome evolution roots the archaeal tree of life . Proc Natl Acad Sci U S A . 114 : E4602 – E4611 .

Zhu Q , Mai U , Pfeiffer W , Janssen S , Asnicar F , Sanders JG , Belda-Ferre P , Al-Ghalith GA , Kopylova E , McDonald D , et al.  2019 . Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea . Nat Commun . 10 : 5477 .


Hypothesis testing in evolutionary developmental biology: a case study from insect wings

Affiliation.

  • 1 Department of Ecology and Evolutionary Biology, 75 N. Eagleville Rd., U-3043, University of Connecticut, Storrs, CT 06269, USA.
  • PMID: 15388766
  • DOI: 10.1093/jhered/esh064

Developmental data have the potential to give novel insights into morphological evolution. Because developmental data are time-consuming to obtain, support for hypotheses often rests on data from only a few distantly related species. Similarities between these distantly related species are parsimoniously inferred to represent ancestral aspects of development. However, with limited taxon sampling, ancestral similarities in developmental patterning can be difficult to distinguish from similarities that result from convergent co-option of developmental networks, which appears to be common in developmental evolution. Using a case study from insect wings, we discuss how these competing explanations for similarity can be evaluated. Two kinds of developmental data have recently been used to support the hypothesis that insect wings evolved by modification of limb branches that were present in ancestral arthropods. This support rests on the assumption that aspects of wing development in Drosophila, including similarities to crustacean epipod patterning, are ancestral for winged insects. Testing this assumption requires comparisons of wing development in Drosophila and other winged insects. Here we review data that bear on this assumption, including new data on the functions of wingless and decapentaplegic during appendage allocation in the red flour beetle Tribolium castaneum.


Understanding Hypothesis Testing

Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

A hypothesis is an assumption or idea; in statistics it is a specific claim about an unknown population parameter. The process resembles a trial: a judge starts from the assumption that the defendant is innocent and abandons it only if the evidence and testimony are strong enough.

Hypothesis testing is a statistical method for making decisions from experimental data. It evaluates two mutually exclusive statements about a population and determines which one is better supported by the sample data.

To test the validity of the claim or assumption about the population parameter:

  • A sample is drawn from the population and analyzed.
  • The results of the analysis are used to decide whether the data are consistent with the claim.

Example: You claim that the average height in a class is 30, or that boys are taller than girls. Each of these is an assumption, and we need a statistical way to check whether the data support it.

Defining Hypotheses

  • Null hypothesis ($H_0$): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or no difference among groups. In other words, it is the baseline assumption, made from knowledge of the problem. Example: A company's mean production is 50 units per day, i.e. $H_0: \mu = 50$.
  • Alternative hypothesis ($H_1$): The alternative hypothesis is the hypothesis that contradicts the null hypothesis. Example: A company's mean production is not equal to 50 units per day, i.e. $H_1: \mu \ne 50$.

Key Terms of Hypothesis Testing

  • Level of significance: The probability of rejecting the null hypothesis when it is actually true, chosen before the test is run. Complete certainty is not possible, so we select a level of significance, normally denoted $\alpha$ and usually set to 0.05 (5%). With $\alpha = 0.05$, a true null hypothesis would be wrongly rejected in about 5% of repeated samples.
  • P-value: The p-value, or calculated probability, is the probability of obtaining results at least as extreme as those observed when the null hypothesis ($H_0$) is true. If the p-value is less than or equal to the chosen significance level, you reject the null hypothesis, i.e. the sample supports the alternative hypothesis.
  • Test statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is either compared to a critical value or converted to a p-value to judge the statistical significance of the observed results.
  • Critical value: The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: The number of independent values that are free to vary when estimating a parameter. They depend on the sample size and determine the shape of the reference distribution (for example, the t or chi-square distribution).
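To make the link between degrees of freedom and critical values concrete, here is a minimal sketch. It assumes a two-tailed test at a significance level of 0.05, and the degrees of freedom shown are illustrative choices, not values from this article. It relies on `scipy.stats.norm.ppf` and `scipy.stats.t.ppf`, the inverse cumulative distribution functions.

```python
from scipy import stats

alpha = 0.05  # level of significance for a two-tailed test

# Critical value of the standard normal distribution (no degrees of freedom).
z_critical = stats.norm.ppf(1 - alpha / 2)

# Critical values of the t-distribution for a few illustrative degrees of freedom.
t_critical = {df: stats.t.ppf(1 - alpha / 2, df) for df in (5, 9, 30, 1000)}

print(f"normal: {z_critical:.3f}")
for df, value in t_critical.items():
    print(f"t, df={df}: {value:.3f}")
```

The t critical values shrink toward the normal value as the degrees of freedom grow, which is why small samples need stronger evidence to reject the null hypothesis.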

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive statements about a population to determine which is better supported by the sample data. When we say that a finding is statistically significant, we are reporting the outcome of a hypothesis test.

One-Tailed and Two-Tailed Test

A one-tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region is located on only one side of the distribution curve. If the test statistic falls into this critical region, the null hypothesis is rejected in favor of the alternative hypothesis.

One-Tailed Test

There are two types of one-tailed test:

  • Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true parameter value is less than the value stated in the null hypothesis. Example: $H_0: \mu \geq 50$ and $H_1: \mu < 50$
  • Right-Tailed (Right-Sided) Test: The alternative hypothesis asserts that the true parameter value is greater than the value stated in the null hypothesis. Example: $H_0: \mu \leq 50$ and $H_1: \mu > 50$

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: $H_0: \mu = 50$ and $H_1: \mu \neq 50$
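The three kinds of test differ only in which tail of the distribution counts as "extreme". The sketch below shows this for a z statistic. The helper `z_test_p_value` and the statistic 1.8 are hypothetical, introduced only for illustration.

```python
from scipy import stats

def z_test_p_value(z, tail="two-sided"):
    """Return the p-value of a z statistic for the chosen tail (hypothetical helper)."""
    if tail == "left":        # H1: mu < mu0, small values are extreme
        return stats.norm.cdf(z)
    if tail == "right":       # H1: mu > mu0, large values are extreme
        return stats.norm.sf(z)
    if tail == "two-sided":   # H1: mu != mu0, both tails are extreme
        return 2 * stats.norm.sf(abs(z))
    raise ValueError(f"unknown tail: {tail}")

z = 1.8  # hypothetical test statistic
for tail in ("left", "right", "two-sided"):
    print(tail, round(z_test_p_value(z, tail), 4))
```

With this statistic the right-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the test should be fixed before looking at the data.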


What are Type 1 and Type 2 errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

  • Type I error: We reject the null hypothesis although it is true. The probability of a Type I error is denoted by alpha ($\alpha$).
  • Type II error: We fail to reject the null hypothesis although it is false. The probability of a Type II error is denoted by beta ($\beta$).


| Decision | Null Hypothesis is True | Null Hypothesis is False |
| --- | --- | --- |
| Fail to reject (accept) the null hypothesis | Correct Decision | Type II Error (False Negative) |
| Reject the null hypothesis (accept the alternative) | Type I Error (False Positive) | Correct Decision |
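The two error probabilities can be computed directly for a z-test. The sketch below assumes a two-tailed test of a null mean of 50. The true mean of 52, the standard deviation of 5, and the sample size of 25 are hypothetical values chosen for illustration, and `type_ii_error` is a hypothetical helper name.

```python
import math
from scipy import stats

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of failing to reject H0: mu = mu0 when the true mean is mu_true."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # Distance between the true and hypothesized means, in standard-error units.
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # H0 is retained when the z statistic lands between -z_crit and +z_crit.
    return stats.norm.cdf(z_crit - shift) - stats.norm.cdf(-z_crit - shift)

alpha = 0.05  # Type I error rate, fixed by design
beta = type_ii_error(mu0=50, mu_true=52, sigma=5, n=25)
print(f"alpha = {alpha}, beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Alpha is chosen by the analyst, whereas beta depends on how far the truth lies from the null hypothesis and on the sample size, so a larger sample lowers beta without changing alpha.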

How does Hypothesis Testing work?

Step 1: Define null and alternative hypotheses

State the null hypothesis ($H_0$), representing no effect, and the alternative hypothesis ($H_1$), suggesting an effect or difference.

We first identify the problem about which we want to make an assumption, keeping in mind that the two hypotheses must contradict one another. The tests described here also assume normally distributed data.

Step 2: Choose significance level

Select a significance level ($\alpha$), typically 0.05, as the threshold for rejecting the null hypothesis. It is the probability of wrongly rejecting a true null hypothesis that we are willing to accept, and it should be fixed before the test is run. The p-value computed from the data is later compared against this level.

Step 3: Collect and analyze data

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4: Calculate test statistic

In this step the data are evaluated and summarized in a test statistic. The choice of test statistic depends on the type of hypothesis test being conducted and on the characteristics of the data.

There are various hypothesis tests, each appropriate for a different goal, such as the Z-test, Chi-square test, T-test, and so on.

  • Z-test: Used when the population standard deviation is known and the sample is large or the data are normally distributed.
  • t-test: Used when the population standard deviation is unknown and the sample size is small.
  • Chi-square test: Used for categorical data, for example to test independence in contingency tables.
  • F-test: Often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.
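As a small illustration of the last item, the sketch below runs a one-way ANOVA with `scipy.stats.f_oneway`. The three groups are hypothetical numbers chosen so that the result is easy to check by hand.

```python
from scipy import stats

# Hypothetical measurements for three groups (illustration only).
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [3, 4, 5]

# One-way ANOVA: the null hypothesis says all group means are equal.
f_statistic, p_value = stats.f_oneway(group_a, group_b, group_c)

print(f"F = {f_statistic:.2f}, p = {p_value:.3f}")
```

Here the between-group mean square is 3 and the within-group mean square is 1, so the F statistic is their ratio, 3.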

When the dataset is small and the population standard deviation is unknown, a t-test is the more appropriate choice.

The t-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.

Step 5: Compare test statistic

In this stage, we decide whether to reject the null hypothesis or fail to reject it. There are two ways to make this decision.

Method A: Using critical values

Comparing the test statistic with the tabulated critical value (using the absolute value of the test statistic for a two-tailed test), we have:

  • If Test Statistic > Critical Value: Reject the null hypothesis.
  • If Test Statistic ≤ Critical Value: Fail to reject the null hypothesis.

Note: Critical values are predetermined threshold values that are used to make a decision in hypothesis testing. To determine critical values for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables.
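Instead of a printed table, the critical value can be looked up in software. The following sketch applies Method A to a two-tailed t-test. The helper name `reject_by_critical_value` is hypothetical, and the statistic of -9 with 9 degrees of freedom is borrowed from Case A later in this article.

```python
from scipy import stats

def reject_by_critical_value(test_statistic, df, alpha=0.05):
    """Two-tailed t-test decision: reject H0 when |t| exceeds the critical value."""
    t_critical = stats.t.ppf(1 - alpha / 2, df)
    return abs(test_statistic) > t_critical, t_critical

reject, t_critical = reject_by_critical_value(test_statistic=-9.0, df=9)
print(f"critical value = {t_critical:.3f}, reject H0: {reject}")
```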

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If the p-value is less than or equal to the significance level, i.e. ($p \leq \alpha$), you reject the null hypothesis. This indicates that the observed results are unlikely to have occurred by chance alone, providing evidence in favor of the alternative hypothesis.
  • If the p-value is greater than the significance level, i.e. ($p > \alpha$), you fail to reject the null hypothesis. This suggests that the observed results are consistent with what would be expected under the null hypothesis.

Note: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. To determine the p-value for hypothesis testing, we typically refer to a statistical distribution table, such as the normal distribution or t-distribution tables, or use statistical software.
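Method B can be sketched in a few lines. The example assumes a two-tailed t-test and reuses the t-statistic of -9 with 9 degrees of freedom from Case A later in this article. `two_tailed_p_value` is a hypothetical helper built on the survival function `scipy.stats.t.sf`.

```python
from scipy import stats

def two_tailed_p_value(t_statistic, df):
    """Probability of a t statistic at least this extreme in either tail under H0."""
    return 2 * stats.t.sf(abs(t_statistic), df)

alpha = 0.05
p_value = two_tailed_p_value(-9.0, df=9)

if p_value <= alpha:
    print(f"p = {p_value:.3e} <= {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3e} > {alpha}: fail to reject the null hypothesis")
```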

Step 6: Interpret the results

At last, we can conclude our experiment using method A or B.

Calculating test statistic

To validate our hypothesis about a population parameter we use statistical functions. For normally distributed data, the z-score, p-value, and level of significance ($\alpha$) together provide the evidence for or against our hypothesis.

1. Z-statistics:

Used when the population standard deviation is known.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

  • $\bar{x}$ is the sample mean,
  • $\mu$ represents the population mean,
  • $\sigma$ is the population standard deviation,
  • and $n$ is the size of the sample.
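The z formula translates directly into code. This is a minimal sketch in which the function name `z_statistic` and the input numbers are hypothetical, chosen only to show the arithmetic.

```python
import math

def z_statistic(sample_mean, population_mean, population_std, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = population_std / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Hypothetical inputs: sample mean 52, hypothesized mean 50, sigma 5, n 25.
z = z_statistic(sample_mean=52, population_mean=50, population_std=5, n=25)
print(z)
```

The standard error here is 5 / 5 = 1, so the sample mean lies two standard errors above the hypothesized mean.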

2. T-Statistics

The t-test is used when the population standard deviation is unknown and the sample is small (commonly n < 30).

The t-statistic is given by:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

  • $t$ = t-score,
  • $\bar{x}$ = sample mean,
  • $\mu$ = population mean,
  • $s$ = standard deviation of the sample,
  • $n$ = sample size
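The one-sample t formula can be checked against SciPy's own routine. In this sketch the sample values and the hypothesized mean of 50 are hypothetical. `scipy.stats.ttest_1samp` is the library function for a one-sample t-test.

```python
import numpy as np
from scipy import stats

sample = np.array([48, 52, 51, 47, 53, 49, 50, 54])  # hypothetical measurements
mu0 = 50                                             # hypothesized population mean

x_bar = sample.mean()
s = sample.std(ddof=1)   # ddof=1 gives the sample standard deviation
n = len(sample)
t_manual = (x_bar - mu0) / (s / np.sqrt(n))

# The same test computed by SciPy, which also returns the two-tailed p-value.
t_scipy, p_value = stats.ttest_1samp(sample, popmean=mu0)

print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p_value:.4f}")
```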

3. Chi-Square Test

Chi-square test for independence, used for categorical (non-normally distributed) data:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

  • $O_{ij}$ is the observed frequency in cell $ij$,
  • $i, j$ are the row and column indices respectively,
  • $E_{ij}$ is the expected frequency in cell $ij$, calculated as $\frac{\text{Row total} \times \text{Column total}}{\text{Total observations}}$.
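The expected frequencies and the chi-square sum can be computed by hand and compared with `scipy.stats.chi2_contingency`. The 2x2 table below is hypothetical. Passing `correction=False` switches off the Yates continuity correction so that SciPy matches the plain formula above.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table of observed counts.
observed = np.array([[20, 30],
                     [30, 20]])

# Expected count in each cell: row total * column total / total observations.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Same test via SciPy, without the continuity correction.
chi2_scipy, p_value, dof, expected_scipy = stats.chi2_contingency(observed, correction=False)

print(f"manual chi2 = {chi2_manual:.3f}, scipy chi2 = {chi2_scipy:.3f}")
print(f"degrees of freedom = {dof}, p = {p_value:.4f}")
```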

Real life Examples of Hypothesis Testing

Let's examine hypothesis testing using two real-life situations.

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1 : Define the Hypothesis

  • Null Hypothesis ($H_0$): The new drug has no effect on blood pressure.
  • Alternate Hypothesis ($H_1$): The new drug has an effect on blood pressure.

Step 2: Define the Significance level

Let's set the significance level at 0.05: we reject the null hypothesis if results at least as extreme as those observed would occur less than 5% of the time through random variation alone.

Step 3 : Compute the test statistic

Using a paired t-test, analyze the data to obtain a test statistic and a p-value.

The test statistic (here, the t-statistic) is calculated from the differences between blood pressure measurements before and after treatment.

$$t = \frac{m}{s / \sqrt{n}}$$

  • $m$ = mean of the differences $d_i = X_{\text{after},i} - X_{\text{before},i}$,
  • $s$ = standard deviation of the differences $d_i$,
  • $n$ = sample size.

Here, $m = -3.9$, $s \approx 1.37$ and $n = 10$.

Using the formula for the paired t-test, we calculate the t-statistic as $-3.9 / (1.37 / \sqrt{10}) \approx -9$.

Step 4: Find the p-value

With the calculated t-statistic of -9 and degrees of freedom df = 9, you can find the p-value using statistical software or a t-distribution table.

Thus, p-value = 8.538051223166285e-06.

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (8.538051223166285e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Case A

Let's implement this hypothesis test in Python, testing whether a new drug affects blood pressure. For this example, we will use a paired t-test from the scipy.stats module.

SciPy is a Python library for scientific and mathematical computing.

The first real-life problem can be implemented as follows:

```python
import numpy as np
from scipy import stats

# Data
before_treatment = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after_treatment = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Step 1: Null and Alternate Hypotheses
# Null Hypothesis: The new drug has no effect on blood pressure.
# Alternate Hypothesis: The new drug has an effect on blood pressure.
null_hypothesis = "The new drug has no effect on blood pressure."
alternate_hypothesis = "The new drug has an effect on blood pressure."

# Step 2: Significance Level
alpha = 0.05

# Step 3: Paired T-test
t_statistic, p_value = stats.ttest_rel(after_treatment, before_treatment)

# Step 4: Calculate T-statistic manually
m = np.mean(after_treatment - before_treatment)
s = np.std(after_treatment - before_treatment, ddof=1)  # using ddof=1 for sample standard deviation
n = len(before_treatment)
t_statistic_manual = m / (s / np.sqrt(n))

# Step 5: Decision
if p_value <= alpha:
    decision = "Reject"
else:
    decision = "Fail to reject"

# Conclusion
if decision == "Reject":
    conclusion = "There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different."
else:
    conclusion = "There is insufficient evidence to claim a significant difference in average blood pressure before and after treatment with the new drug."

# Display results
print("T-statistic (from scipy):", t_statistic)
print("P-value (from scipy):", p_value)
print("T-statistic (calculated manually):", t_statistic_manual)
print(f"Decision: {decision} the null hypothesis at alpha={alpha}.")
print("Conclusion:", conclusion)
```

Output:

```
T-statistic (from scipy): -9.0
P-value (from scipy): 8.538051223166285e-06
T-statistic (calculated manually): -9.0
Decision: Reject the null hypothesis at alpha=0.05.
Conclusion: There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.
```

In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05. 

  • The results suggest that the new drug has a statistically significant effect on blood pressure, and the direction of the change is a reduction.
  • The negative t-statistic indicates that the mean blood pressure after treatment is lower than the mean before treatment in this paired sample.
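Because the company's stated belief is that the drug lowers blood pressure, a left-tailed version of the same paired test is also a natural choice. The sketch below uses the `alternative` parameter of `scipy.stats.ttest_rel`, which assumes SciPy 1.6 or newer. It is an illustrative variation on the analysis above and not part of the original study design.

```python
import numpy as np
from scipy import stats

before_treatment = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after_treatment = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Two-tailed test, H1: mean(after - before) != 0.
t_two_sided, p_two_sided = stats.ttest_rel(after_treatment, before_treatment)

# Left-tailed test, H1: mean(after - before) < 0, i.e. the drug lowers blood pressure.
t_less, p_less = stats.ttest_rel(after_treatment, before_treatment, alternative="less")

print(f"two-sided p = {p_two_sided:.3e}")
print(f"left-tailed p = {p_less:.3e}")
```

The t-statistic is the same in both tests. Only the p-value changes, and it is halved here because the observed difference lies in the hypothesized direction.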

Case B : Cholesterol level in a population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Populations Mean = 200

Population Standard Deviation (σ): 5 mg/dL(given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H 0 ): The average cholesterol level in a population is 200 mg/dL.
  • Alternate Hypothesis (H 1 ): The average cholesterol level in a population is different from 200 mg/dL.

As the direction of deviation is not given , we assume a two-tailed test, and based on a normal distribution table, the critical values for a significance level of 0.05 (two-tailed) can be calculated through the z-table and are approximately -1.96 and 1.96.

The test statistic is calculated by using the z formula Z = [Tex](203.8 – 200) / (5 \div \sqrt{25}) [/Tex] ​ and we get accordingly , Z =2.039999999999992.

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Python Implementation of Case B

import scipy.stats as stats
import math
import numpy as np

# Given data
sample_data = np.array(
    [205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205,
     198, 205, 210, 192, 205, 198, 205, 210, 192, 205])
population_std_dev = 5
population_mean = 200
sample_size = len(sample_data)

# Step 1: Define the Hypotheses
# Null Hypothesis (H0): The average cholesterol level in a population is 200 mg/dL.
# Alternate Hypothesis (H1): The average cholesterol level in a population is different from 200 mg/dL.

# Step 2: Define the Significance Level
alpha = 0.05  # Two-tailed test

# Critical values for a significance level of 0.05 (two-tailed)
critical_value_left = stats.norm.ppf(alpha / 2)
critical_value_right = -critical_value_left

# Step 3: Compute the test statistic
sample_mean = sample_data.mean()
z_score = (sample_mean - population_mean) / \
    (population_std_dev / math.sqrt(sample_size))

# Step 4: Result
# Check if the absolute value of the test statistic is greater than the critical values
if abs(z_score) > max(abs(critical_value_left), abs(critical_value_right)):
    print("Reject the null hypothesis.")
    print("There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.")
else:
    print("Fail to reject the null hypothesis.")
    print("There is not enough evidence to conclude that the average cholesterol level in the population is different from 200 mg/dL.")

Reject the null hypothesis.
There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.
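The critical-value decision can be cross-checked with a p-value and a confidence interval, which also show how close the result is to the threshold. The sketch below is one possible way to do this. It swaps SciPy for the standard library's `statistics.NormalDist`, and the names `ci_lower`, `ci_upper` and `p_value` are illustrative.

```python
import math
from statistics import NormalDist, mean

# Cholesterol data and parameters from Case B
sample_data = [205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208,
               200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205]
population_std_dev = 5
population_mean = 200
alpha = 0.05

sample_mean = mean(sample_data)
standard_error = population_std_dev / math.sqrt(len(sample_data))
z_score = (sample_mean - population_mean) / standard_error

# Two-tailed p-value: probability of a |Z| at least this large under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))

# 95% confidence interval for the population mean (sigma assumed known)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
ci_lower = sample_mean - z_critical * standard_error
ci_upper = sample_mean + z_critical * standard_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Z-score: {z_score:.2f}")
print(f"Two-tailed p-value: {p_value:.4f}")
print(f"95% CI for the mean: ({ci_lower:.2f}, {ci_upper:.2f})")
```

The p-value falls only slightly below 0.05 and the interval only just excludes 200 mg/dL, so the rejection here is much less emphatic than in the blood pressure example.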

Limitations of Hypothesis Testing

  • Although useful, hypothesis testing does not offer a comprehensive understanding of the topic being studied. It concentrates on specific hypotheses and statistical significance, without reflecting the full complexity or context of the phenomenon.
  • The accuracy of hypothesis testing results is contingent on the quality of available data and the appropriateness of statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.
  • Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

Frequently Asked Questions (FAQs)

1. What are the 3 types of hypothesis test?

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. A right-tailed test assesses whether a parameter is greater than the hypothesized value, and a left-tailed test assesses whether it is smaller. A two-tailed test checks for a difference in either direction.
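As a rough illustration of how the choice of tail changes the p-value for the same test statistic, the sketch below defines a small helper for a z-test. The function name `z_test_p_value` and its `tail` labels are hypothetical and do not come from any library.

```python
from statistics import NormalDist


def z_test_p_value(z_score: float, tail: str) -> float:
    """Return the p-value of a z-test for the given tail.

    tail is one of "left", "right" or "two-sided" (illustrative labels).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        # H1: parameter is smaller than the hypothesized value
        return std_normal.cdf(z_score)
    if tail == "right":
        # H1: parameter is greater than the hypothesized value
        return 1 - std_normal.cdf(z_score)
    if tail == "two-sided":
        # H1: parameter differs in either direction
        return 2 * (1 - std_normal.cdf(abs(z_score)))
    raise ValueError(f"Unknown tail: {tail!r}")


# Z-score from the cholesterol example (Case B)
z = 2.04
for tail in ("left", "right", "two-sided"):
    print(f"{tail:>9}: p = {z_test_p_value(z, tail):.4f}")
```

For a positive test statistic, the right-tailed p-value is half the two-tailed one, while the left-tailed p-value is close to 1.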

2. What are the 4 components of hypothesis testing?

  • Null Hypothesis ($H_0$): No effect or difference exists.
  • Alternative Hypothesis ($H_1$): An effect or difference exists.
  • Significance Level ($\alpha$): Risk of rejecting the null hypothesis when it is true (Type I error).
  • Test Statistic: Numerical value representing the observed evidence against the null hypothesis.
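The meaning of the significance level as a Type I error rate can be checked with a small simulation. The sketch below repeatedly draws samples from a population in which the null hypothesis is true by construction, then counts how often a two-tailed z-test rejects it. The population parameters reuse the cholesterol example. The number of trials and the random seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # arbitrary seed, only for reproducibility

alpha = 0.05
true_mean = 200     # H0 is true by construction
sigma = 5
sample_size = 25
n_trials = 10_000   # arbitrary number of simulated studies

z_critical = NormalDist().inv_cdf(1 - alpha / 2)
standard_error = sigma / math.sqrt(sample_size)

false_rejections = 0
for _ in range(n_trials):
    # Draw one sample from the population described by H0
    sample = [random.gauss(true_mean, sigma) for _ in range(sample_size)]
    sample_mean = sum(sample) / sample_size
    z_score = (sample_mean - true_mean) / standard_error
    if abs(z_score) > z_critical:
        false_rejections += 1  # rejecting a true H0 is a Type I error

type_i_error_rate = false_rejections / n_trials
print(f"Observed Type I error rate: {type_i_error_rate:.3f} (alpha = {alpha})")
```

The observed rate should land near the chosen alpha, which is the sense in which alpha is the risk of a false rejection.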

3. What is hypothesis testing in ML?

In machine learning, hypothesis testing is a statistical method for evaluating the performance and validity of models. It tests specific hypotheses about model behavior, such as whether a feature influences predictions or whether a model generalizes well to unseen data.

4. What is the difference between Pytest and Hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing library for Python that generates test cases from specified properties of the code.


  • v.11(11); 2021 Jun

When are hypotheses useful in ecology and evolution?

Matthew G. Betts

1 Forest Biodiversity Research Network, Department of Forest Ecosystems and Society, Oregon State University, Corvallis OR, USA

Adam S. Hadley

David W. Frey, Sarah J. K. Frey, Dusty Gannon, Scott H. Harris, Urs G. Kormann, Kara Leimberger, Katie Moriarty

2 USDA Forest Service, Pacific Northwest Research Station, Corvallis OR, USA

Joseph M. Northrup

3 Wildlife Research and Monitoring Section, Ontario Ministry of Natural Resources and Forestry, Environmental and Life Sciences Graduate Program, Trent University, Peterborough ON, Canada

Josée S. Rousseau

Thomas D. Stokely, Jonathon J. Valente, Diego Zárrate‐Charry

Associated Data

Data for the analysis of hypothesis use in ecology and evolution publications is available at https://figshare.com/articles/dataset/Betts_et_al_2021_When_are_hypotheses_useful_in_ecology_and_evolution_Ecology_and_Evolution/14110289 .

Research hypotheses have been a cornerstone of science since before Galileo. Many have argued that hypotheses (1) encourage discovery of mechanisms, and (2) reduce bias—both features that should increase transferability and reproducibility. However, we are entering a new era of big data and highly predictive models where some argue the hypothesis is outmoded. We hypothesized that hypothesis use has declined in ecology and evolution since the 1990s, given the substantial advancement of tools further facilitating descriptive, correlative research. Alternatively, hypothesis use may have become more frequent due to the strong recommendation by some journals and funding agencies that submissions have hypothesis statements. Using a detailed literature analysis ( N  = 268 articles), we found prevalence of hypotheses in eco–evo research is very low (6.7%–26%) and static from 1990–2015, a pattern mirrored in an extensive literature search ( N  = 302,558 articles). Our literature review also indicates that neither grant success nor citation rates were related to the inclusion of hypotheses, which may provide disincentive for hypothesis formulation. Here, we review common justifications for avoiding hypotheses and present new arguments based on benefits to the individual researcher. We argue that stating multiple alternative hypotheses increases research clarity and precision, and is more likely to address the mechanisms for observed patterns in nature. Although hypotheses are not always necessary, we expect their continued and increased use will help our fields move toward greater understanding, reproducibility, prediction, and effective conservation of nature.

We use a quantitative literature review to show that use of a priori hypotheses is still rare in the fields of ecology and evolution. We provide suggestions about the group and individual‐level benefits of hypothesis use.


1. INTRODUCTION

Why should ecologists have hypotheses? At the beginning of most science careers, there comes a time of “hypothesis angst” where students question the need for the hypothetico‐deductive approach their elders have deemed essential for good science. Why is it not sufficient to just have a research objective or question? Why can't we just collect observations and describe those in our research papers?

Research hypotheses are explanations for an observed phenomenon (Loehle,  1987 ; Wolff & Krebs,  2008 ) (see Box  1 ) and have been proposed as a central tool of science since Galileo and Francis Bacon in the mid‐1600s (Glass & Hall,  2008 ). Over the past century, there have been repeated calls for rigorous application of hypotheses in science, and arguments that hypothesis use is the cornerstone of the scientific method (Chamberlin,  1890 ; Popper,  1959 ; Romesburg,  1981 ). In a seminal paper in Science, Platt ( 1964 ) challenged all scientific fields to adopt and rigorously test multiple hypotheses (sensu Chamberlin,  1890 ), arguing that without such hypothesis tests, disciplines would be prone to “stamp collecting” (Landy,  1986 ). To constitute “strong inference,” Platt required the scientific method to be a three‐step process including (1) developing alternative hypotheses, (2) devising a set of “crucial” experiments to eliminate all but one hypothesis, and (3) performing the experiments (Elliott & Brook,  2007 ).

Box 1. Definitions of hypotheses and associated terms

Hypothesis: An explanation for an observed phenomenon.

Research Hypothesis: A statement about a phenomenon that also includes the potential mechanism or cause of that phenomenon. Though a research hypothesis doesn't need to adhere to this strict framework, it is often best described as the “if” in an “if‐then” statement. In other words, “if X is true” (where X is the mechanism or cause for an observed phenomenon) “then Y” (where Y is the outcome of a crucial test that supports the hypothesis). These can also be thought of as “mechanistic hypotheses” since they link with a causal mechanism. For example, trees grow slowly at high elevation because of nutrient limitation (hypothesis); if this is the case, fertilizing trees should result in more rapid growth (prediction).

Prediction: The potential outcome of a test that would support a hypothesis. Most researchers call the second part of the if‐then statement a “prediction”.

Multiple alternative hypotheses: Multiple plausible explanations for the same phenomenon.

Descriptive Hypothesis: Descriptive statements or predictions with the word “hypothesis” in front of them. Typically researchers state their guess about the results they expect and call this the “hypothesis” (e.g., “I hypothesize trees at higher elevation will grow slowly”).

Statistical Hypothesis: A predicted pattern in data that should occur if a research hypothesis is true.

Null Hypothesis: A concise statement expressing the concept of “no difference” between a sample and the population mean.

The commonly touted strengths of hypotheses are two‐fold. First, by adopting multiple plausible explanations for a phenomenon (hereafter “ multiple alternative hypotheses ”; Box  1 ), a researcher reduces the chance that they will become attached to a single possibility, thereby biasing research in favor of this outcome (Chamberlin,  1890 ); this “confirmation bias” is a well‐known human trait (Loehle,  1987 ; Rosen,  2016 ) and likely decreases reproducibility (Munafò et al.,  2017 ). Second, various authors have argued that the a priori hypothesis framework forces one to think in advance about—and then test—various causes for patterns in nature (Wolff & Krebs,  2008 ), rather than simply examining the patterns themselves and coming up with explanations after the fact (so called “inductive research;” Romesburg,  1981 ). By understanding and testing mechanisms, science becomes more reliable and transferable (Ayres & Lombardero,  2017 ; Houlahan et al.,  2017 ; Sutherland et al.,  2013 ) (Figure  1 ). Importantly, both of these strengths should have strong, positive impacts on reproducibility of ecological and evolutionary studies (see Discussion).

Figure 1. Understanding mechanisms often increases model transferability. Panels (a and b) show snowshoe hares in winter and summer coloration, respectively. If a correlative (i.e., nonmechanistic) model for hare survival as a function of color was trained only on hares during the winter and then extrapolated into the summer months, it would perform poorly (white hares would die disproportionately under no‐snow conditions). On the other hand, a researcher testing mechanisms for hare survival would (ideally via experimentation) arrive at the conclusion that it is not the whiteness of hares, but rather blending with the background that confers survival (the “camouflage” hypothesis). Understanding mechanism results in model predictions being robust to novel conditions. Panel (c) shows x and y geographic locations of training (blue filled circles) and testing (blue open circles) locations for a hypothetical correlative model. Even if the model performs well on these independent test data (predicting open to closed circles), there is no guarantee that it will predict well outside of the spatial bounds of the existing data (red circles). Nonstationarity (in this case caused by a nonlinear relationship between predictor and response variable; panel d) could result in correlative relationships shifting substantially if extrapolated to new times or places. However, mechanistic hypotheses aimed at understanding the underlying factors driving the distribution of this species would be more likely to elucidate this nonlinear relationship. In both of these examples, understanding drivers behind ecological patterns, via testing mechanistic hypotheses, is likely to enhance model transferability.

However, we are entering a new era of ecological and evolutionary science that is characterized by massive datasets on genomes, species distributions, climate, land cover, and other remotely sensed information (e.g., bioacoustics, camera traps; Pettorelli et al.,  2017 ). Exceptional computing power and new statistical and machine‐learning algorithms now enable thousands of statistical models to be run in minutes. Such datasets and methods allow for pattern recognition at unprecedented spatial scales and for huge numbers of taxa and processes. Indeed, there have been recent arguments in both the scientific literature and popular press to do away with the traditional scientific method and a priori hypotheses (Glass & Hall,  2008 ; Golub,  2010 ). These arguments go something along the lines of “if we can get predictions right most of the time, why do we need to know the cause?”

In this paper, we sought to understand if hypothesis use in ecology and evolution has shifted in response to these pressures on the discipline. We, therefore, hypothesized that hypothesis use has declined in ecology and evolution since the 1990s, given the substantial advancement of tools further facilitating descriptive, correlative research (e.g., Cutler et al.,  2007 ; Elith et al.,  2008 ). We predicted that this decline should be particularly evident in the applied conservation literature—where the emergence of machine‐learning models has resulted in an explosion of conservation‐oriented species distribution models (Elith et al.,  2006 ). Our alternative hypothesis was that hypothesis use has become more frequent. The mechanism for such increases is that higher‐profile journals (e.g., Functional Ecology , Proceedings of the Royal Society of London Ser. B ) and competitive granting agencies (e.g., the U.S. National Science Foundation) now require or strongly encourage hypothesis statements.

As noted above, many have argued that hypotheses are useful and important for overall progress in science, because they facilitate the discovery of mechanisms, reduce bias, and increase reproducibility (Platt,  1964 ). However, for hypothesis use to be propagated among scientists, one would also expect hypotheses to confer benefits to the individual. We, therefore, tested whether hypothesis use was associated with individual‐level incentives relevant to academic success: publications, citations, and grants (Weinberg,  2010 ). If hypothesis use confers individual‐level advantages, then hypothesis‐based research should be (1) published in more highly ranked journals, (2) have higher citation rates, and (3) be supported by highly competitive funding sources.

Finally, we also present some common justifications for absence of hypotheses and suggest potential counterpoints researchers should consider prior to dismissing hypothesis use, including potential benefits to the individual researcher. We hope this communication provides practical recommendations for improving hypothesis use in ecology and evolution—particularly for new practitioners in the field (Box  2 ).

Box 2. Recommendations for improving hypothesis use in ecology and evolution

Authors : Know that you are human and prone to confirmation bias and highly effective at false pattern recognition. Thus, inductive research and single working hypotheses should be rare in your research. Remember that if your work is to have a real “impact”, it needs to withstand multiple tests from other labs over the coming decades.

Editors and Reviewers : Reward research that is conducted using principles of sound scientific method. Be skeptical of research that smacks of data dredging, post hoc hypothesis development, and single hypotheses. If no hypotheses are stated in a paper and/or the paper is purely descriptive, ask whether the novelty of the system and question warrant this, or if the field would have been better served by a study with mechanistic hypotheses. If only single hypotheses are stated, ask whether appropriate precautions were taken for the researcher to avoid finding support for a pet idea (e.g., blinded experiments, randomized attribution of treatments, etc.). To paraphrase Platt ( 1964 ): beware of the person with only one method or one instrument, either experimental or theoretical.

Mentors : Encourage your advisees to think carefully about hypothesis use and teach them how to construct sound multiple, mechanistic hypotheses. Importantly, explain why hypotheses are important to the scientific method, the individual and group consequences of excluding them, and the rare instances where they may not be necessary.

Policymakers/media/educators/students/readers : Read scientific articles with skepticism; have a scrutinous eye out for single hypothesis studies and p‐hacking. Reward multi‐hypothesis, mechanistic, predictive science by giving it greater weight in policy decisions (Sutherland et al.,  2013 ), more coverage in the media, greater leverage in education, and more citations in reports.

2. METHODS

2.1. Literature analysis

To examine hypothesis use over time and test whether hypothesis presence was associated with research type (basic vs. applied), journal impact factor, citation rates, and grants, we sampled the ecology and evolution literature using a stratified random sample of ecology and evolution journals in existence before 1991. First, we randomly selected 19 journals across impact factor (IF) strata ranging from 0.5–10.0 in two bins (<3 IF and ≥3 IF; see Figure  3 for full journal list). We then added three multidisciplinary journals that regularly publish ecology and evolution articles ( Proceedings of the National Academy of Sciences, Science, and Nature ). From this sample of 22 journals, we randomly selected ecology and evolution articles within 5‐year strata beginning in 1991 (3 articles/journal per 5‐year bin) to ensure the full date range was evenly sampled. We removed articles in the following categories: editorials, corrections, reviews, opinions, and methods papers. In multidisciplinary journals, we examined only ecology, evolution, and conservation biology articles, as indicated by section headers in each journal. Once selected, articles were randomly distributed to the authors of the current paper (hereafter “reviewers:” MGB, ASH, DF, SF, DG, SH, HK, UK, KL, KM, JN, BP, JSR, TSS, JV, DZC) for detailed examination. On rare occasions, an article was not found, or reviewers were not able to complete their review. Ultimately, our final sample comprised 268 articles.
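The sampling design described above can be sketched in a few lines. This is only an illustration of stratified random sampling under stated assumptions, not the authors' procedure. The journal labels, the `candidate_articles` helper and its article identifiers are hypothetical placeholders, and the 1991–2015 range of the 5‐year bins is assumed.

```python
import random

random.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical labels standing in for the 22 sampled journals
journals = [f"journal_{i:02d}" for i in range(1, 23)]

# 5-year strata beginning in 1991 (the end year 2015 is an assumption)
year_bins = [(start, start + 4) for start in range(1991, 2016, 5)]
articles_per_stratum = 3


def candidate_articles(journal, first_year, last_year):
    """Hypothetical stand-in for the eligible articles of one journal in one bin."""
    return [f"{journal}_{year}_{k}"
            for year in range(first_year, last_year + 1)
            for k in range(1, 11)]


sample = []
for journal in journals:
    for first_year, last_year in year_bins:
        pool = candidate_articles(journal, first_year, last_year)
        # Draw the same number of articles from every journal-by-period stratum
        sample.extend(random.sample(pool, articles_per_stratum))

print(f"{len(journals)} journals x {len(year_bins)} bins x "
      f"{articles_per_stratum} articles = {len(sample)} sampled articles")
```

Sampling a fixed number of articles per stratum keeps every journal and period equally represented. Exclusions such as missing articles or incomplete reviews then reduce the total, as in the final sample of 268 articles.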

Figure 3. Frequency distributions showing proportion of various hypothesis types across ecology and evolution journals included in our detailed literature search. Hypothesis use varied greatly across publication outlets. We considered J. Applied Ecology, J. Wildlife Management, J. Soil and Water Cons., Ecological Applications, Conservation Biology, and Biological Conservation to be applied journals; both applied and basic journals varied greatly in the prevalence of hypotheses.

Reviewers were given a maximum of 10 min to find research hypothesis statements within the abstract or introduction of articles. We chose 10 min to simulate the amount of time that a journal editor pressed for time might spend evaluating the introductory material in an article. After this initial 10 min period, we determined: (1) whether or not an article contained at least one hypothesis, (2) whether hypotheses were mechanistic or not (i.e., the authors claimed to examine the mechanism for an observed phenomenon), (3) whether multiple alternative hypotheses were considered (sensu Chamberlin, 1890 ), and (4) whether hypotheses were “descriptive” (that is, they did not explore a mechanism but simply stated the expected direction of an effect; we define this as a “prediction” [Box  1 ]). It is important to note that to be identified as having hypotheses, articles did not need to contain the actual term “hypothesis” under our protocol; we also included articles using phrases such as “If X is true, we expected …” or “ we anticipated, ” both of which reflect a priori expectations from the data. We categorized each article as either basic (fundamental research without applications as a focus) or applied (clear management or conservation focus to article). Finally, we also examined all articles for funding sources and noted the presence of a national or international‐level competitive grant (e.g., National Science Foundation, European Union, Natural Sciences and Engineering Research Council). We assumed that published articles would have fidelity to the hypotheses stated in original grant proposals that funded the research, therefore, the acknowledgment of a successful grant is an indicator of financial reward for including hypotheses in initial proposals. Journal impact factors and individual article citation rates were gleaned directly from Web of Science. 
We reasoned that many researchers seek out journals with higher impact factors for the first submission of their manuscripts (Paine & Fox,  2018 ). Our assumption was that studies with more careful experimental design—including hypotheses—should be published where initially submitted, whereas those without may be eventually published, on average, in lower impact journals (Opthof et al.,  2000 ). Ideally, we could have included articles that were rejected and never published in our analysis, but such articles are notoriously difficult to track (Thornton & Lee,  2000 ).

To support our detailed literature analysis, we also tested for temporal trends in hypothesis use within a broader sample of the ecology and evolution literature. For the same set of 22 journals in our detailed sample, we conducted a Web of Science search for articles containing “hypoth*” in the title or abstract. To calculate the proportion of articles with hypotheses (from 1990–2018), we divided the number of articles with hypotheses by the total number of articles ( N  = 302,558). Because our search method does not include the main text of articles and excludes more subtle ways of stating hypotheses (e.g., “We expected…,” “We predicted…”), we acknowledge that the proportion of papers identified is likely to be an underestimate of the true proportions. Nevertheless, we do not expect that the degree of underestimation would change over time, so temporal trends in the proportion of papers containing hypotheses should be unbiased.

2.2. Statistical analysis

We used generalized linear mixed models (GLMMs) to test for change in the prevalence of various hypothesis types over time (descriptive, mechanistic, multiple, any hypothesis). Presence of a hypothesis was modeled as dichotomous (0,1) with binomial error structure, and “journal” was included as a random effect to account for potential lack of independence among articles published in the same outlet. The predictor variable (i.e., year) was scaled to enable convergence. Similarly, we tested for differences in hypothesis prevalence between basic and applied articles using GLMMs with “journal” as a random effect. Finally, we tested the hypothesis that hypothesis use might decline over time due to the emergence of machine‐learning in the applied conservation literature; specifically, we modeled “hypothesis presence” as a function of the statistical interaction between “year” and “basic versus applied” articles. We conducted this test for all hypothesis types. GLMMs were implemented in R (version 3.60) using the lme4 package (Bates et al., 2018). In three of our models, the “journal” random effect standard deviation was estimated to be zero or nearly zero (i.e., 10⁻⁸). In such cases, the model with the random effect is exceptionally difficult to estimate, and the random effect standard deviation being estimated as approximately zero indicates the random effect was likely not needed.

We tested whether the presence of hypotheses influenced the likelihood of publication in a high‐impact journal using generalized linear models with a Gaussian error structure. We used the log of journal impact factor (+0.5) as the response variable to improve normality of model residuals. We tested the association between major competitive grants and the presence of hypotheses using generalized linear models (logistic regression) with “hypothesis presence” (0,1) as a predictor and presence of a grant (0,1) as a response.

Finally, we tested whether hypotheses increase citation rates using linear mixed effects models (LMMs); presence of various hypotheses (0,1) were predictors in univariate models and average citations per year (log‐transformed) was the response. “Journal” was treated as a random effect, which assumes that articles within a particular journal are unlikely to be independent in their citation rates. LMMs were implemented in R using the lme4 package (Bates et al.,  2015 ).

3. RESULTS

3.1. Trends in hypothesis use in ecology and evolution

In the ecology and evolution articles we examined in detail, the prevalence of multiple alternative hypotheses (6.7%) and mechanistic hypotheses (26%) was very low and showed no temporal trend (GLMM: multiple alternative: β̂ = 0.098 [95% CI: −0.383, 0.595], z = 0.40, p = 0.69; mechanistic: β̂ = 0.131 [95% CI: −0.149, 0.418], z = 0.92, p = 0.36; Figure 2a,b). Descriptive hypothesis use was also low (8.5%), and although we observed a slight tendency to increase over time, 95% confidence intervals overlapped zero (GLMM: β̂ = 0.351 [95% CI: −0.088, 0.819], z = 1.53, p = 0.13, Figure 2c). Although the proportion of papers containing no hypotheses appears to have declined (Figure 2d), this effect was not statistically significant (GLMM: β̂ = −0.201 [95% CI: −0.483, 0.074], z = −1.41, p = 0.15). This overall pattern is consistent with a Web of Science search (N = 302,558 articles) for the term “hypoth*” in titles or abstracts that shows essentially no trend over the same time period (Figure 2e,f).

Figure 2. Trends in hypothesis use from 1991–2015 from a sample of the ecological and evolutionary literature (N = 268): (a) multiple alternative hypotheses, (b) mechanistic hypotheses, (c) descriptive hypotheses [predictions], and (d) no hypotheses present. We detected no temporal trend in any of these variables. Lines reflect LOESS smoothing with 95% confidence intervals. Dots show raw data with darker colors indicating overlapping data points. The total number of publications in ecology and evolution in selected journals has increased (e), but use of the term “hypoth*” in the title or abstracts of these 302,558 articles has remained flat, and at very low prevalence (f).

Counter to our hypothesis, applied and basic articles did not show a statistically significant difference in the prevalence of either mechanistic (GLMM: β̂ = 0.054 [95% CI: −0.620, 0.728], z = 0.16, p = 0.875) or multiple alternative hypotheses (GLMM: β̂ = 0.517 [95% CI: −0.582, 1.80], z = 0.88, p = 0.375). Although both basic and applied ecology and evolution articles containing hypotheses were similarly rare overall, there was a tendency for applied ecology articles to show increasing prevalence of mechanistic hypothesis use over time, whereas basic ecology articles have remained relatively unchanged (Table S1, Figure S1). However, there was substantial variation across both basic and applied journals in the prevalence of hypotheses (Figure 3).

3.2. Do hypotheses “pay?”

We found little evidence that presence of hypotheses increased paper citation rates. Papers with mechanistic (LMM: β̂ = −0.109 [95% CI: −0.329, 0.115], t = 0.042, p = 0.97, Figure 4a, middle panel) or multiple alternative hypotheses (LMM: β̂ = −0.008 [95% CI: −0.369, 0.391], t = 0.042, p = 0.96, Figure 4a, bottom panel) did not have higher average annual citation rates, nor did papers with at least one hypothesis type (LMM: β̂ = −0.024 [95% CI: −0.239, 0.194], t = 0.218, p = 0.83, Figure 4a, top panel).

Figure 4. Results of our detailed literature search showing the relationship between having a hypothesis (or not) and three commonly sought after scientific rewards (average times a paper is cited per year, journal impact factor, and the likelihood of having a major national competitive grant). We found no statistically significant relationships between having a hypothesis and citation rates or grants, but articles with hypotheses tended to be published in higher impact journals.

On the other hand, journal articles containing mechanistic hypotheses tended to be published in higher impact journals (GLM: β̂ = 0.290 [95% CI: 0.083, 0.497], t = 2.74, p = 0.006) but only slightly so (Figure 4b, middle panel). Including multiple alternative hypotheses in papers did not have a statistically significant effect (GLM: β̂ = 0.339 [95% CI: −0.029, 0.707], t = 1.80, p = 0.072, Figure 4b, bottom panel).

Finally, we found no association between obtaining a competitive national or international grant and the presence of a hypothesis (logistic regression: mechanistic: β̂ = −0.090 [95% CI: −0.637, 0.453], z = −0.36, p = 0.745; multiple alternative: β̂ = 0.080 [95% CI: −0.891, 1.052], z = 0.49, p = 0.870; any hypothesis: β̂ = −0.005 [95% CI: −0.536, 0.525], z = −0.02, p = 0.986, Figure 4c).

4. DISCUSSION

Overall, the prevalence of hypothesis use in the ecological and evolutionary literature is strikingly low and has been so for the past 25 years despite repeated calls to reverse this pattern (Elliott & Brook,  2007 ; Peters,  1991 ; Rosen,  2016 ; Sells et al.,  2018 ). Why is this the case?

Clearly, hypotheses are not always necessary and a portion of the sampled articles may represent situations where hypotheses are truly not useful (see Box  3 : “When Are Hypotheses Not Useful?”). Some authors (Wolff & Krebs,  2008 ) overlook knowledge gathering and descriptive research as a crucial first step for making observations about natural phenomena—from which hypotheses can be formulated. This descriptive work is an important part of ecological science (Tewksbury et al.,  2014 ), but may not benefit from strict use of hypotheses. Similarly, some efforts are simply designed to be predictive, such as auto‐recognition of species via machine learning (Briggs et al.,  2012 ) or for prioritizing conservation efforts (Wilson et al.,  2006 ), where the primary concern is correct identification and prediction rather than the biological or computational reasons for correct predictions (Box  3 ). However, it would be surprising if 75% of ecology since 1990 has been purely descriptive work from little‐known systems or purely predictive in nature. Indeed, the majority of the articles we observed did not fall into these categories.

Box 3. When are hypotheses not useful?

Of course, there are a number of instances where hypotheses might not be useful or needed. It is important to recognize these instances to prevent the pendulum from swinging in a direction where without hypotheses, research ceases to be considered science (Wolff & Krebs,  2008 ). Below are several important types of ecological research where formulating hypotheses may not always be beneficial.

When the goal is prediction rather than understanding. Examples of this exception include species distribution models (Elith et al.,  2008 ) where the question is not why species are distributed as they are, but simply where species are predicted to be. Such results can be useful in conservation planning (Guisan et al.,  2013 ; see below). Another example lies in auto‐recognition of species (Briggs et al.,  2012 ) where the primary concern is getting identification right rather than the biological or computational reasons for correct predictions. In such instances, complex algorithms can be very effective at uncovering patterns (e.g., deep learning). A caveat and critical component of such efforts is to ensure that such models are tested on independent data. Further, if model predictions are made beyond the spatial or temporal bounds of training or test data, extreme caution should be applied (see Figure  4 ).

When the goal is description rather than understanding. In many applications, the objective is to simply quantify a pattern in nature; for example, where on Earth is forest loss most rapid (Hansen et al.,  2013 )? Further, sometimes so little is known about a system or species that formulating hypotheses is impossible and more description is necessary. In rare instances, an ecological system may be so poorly known and different to other systems that generating testable hypotheses would be extremely challenging. Darwin's observations while traveling on the Beagle are some of the best examples of such “hypothesis generating” science; these initial observations resulted in the formulation of one of the most extensively tested hypotheses in biology. However, such novelty should be uncommon in ecological and evolutionary research where theoretical and empirical precedent abounds (Sells et al.,  2018 ). In the field of biogeography, there is the commonly held view that researchers should first observe and analyze patterns, and only then might explanations emerge (“pattern before process”); however, it has frequently been demonstrated that mechanistic hypotheses are useful even in disciplines where manipulative experiments are impossible (Crisp et al.,  2011 ).

When the objective is a practical planning outcome such as reserve design. In many conservation planning efforts, the goal is not to uncover mechanisms, but rather simply to predict efficient methods or contexts for conserving species (Myers et al.,  2000 ; Wilson et al.,  2006 ). Perhaps this is the reason for such low prevalence of hypotheses in conservation journals (e.g., Conservation Biology).
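The caution in this box about predicting beyond the bounds of the training data can be made concrete with a simple range check. The sketch below is only illustrative: the predictor names (`elevation_m`, `canopy_cover`), the values, and both helper functions are hypothetical and do not come from any of the cited studies.

```python
def predictor_ranges(training_rows):
    """Return the (min, max) of each predictor across the training rows."""
    names = training_rows[0].keys()
    return {
        name: (
            min(row[name] for row in training_rows),
            max(row[name] for row in training_rows),
        )
        for name in names
    }


def out_of_range(row, ranges):
    """List the predictors in `row` that fall outside the training range."""
    return [
        name for name, (low, high) in ranges.items()
        if not low <= row[name] <= high
    ]


# Hypothetical training data: three surveyed sites
training = [
    {"elevation_m": 200.0, "canopy_cover": 0.35},
    {"elevation_m": 850.0, "canopy_cover": 0.60},
    {"elevation_m": 1400.0, "canopy_cover": 0.90},
]
ranges = predictor_ranges(training)

# Hypothetical new site where a prediction is wanted
new_site = {"elevation_m": 1900.0, "canopy_cover": 0.50}
flagged = out_of_range(new_site, ranges)
print(flagged)  # predictors for which the model would be extrapolating
```

A per‐predictor check like this is deliberately crude. It will not detect novel combinations of predictors that each lie within range, so it is best read as a first screen rather than a guarantee that a prediction is safe.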

Alternatively, researchers may not include hypotheses because they see little individual‐level incentive for their inclusion. Our results suggest that currently there are relatively few measurable benefits to individuals. Articles with mechanistic hypotheses do tend to be published in higher impact factor journals, which, for better or worse, is one of the key predictors in obtaining an academic job (van Dijk et al.,  2014 ). However, few of the other typical academic metrics (i.e., citations or grant funding) appear to reward this behavior. Although hypotheses might be “useful” for overall progress in science (Platt,  1964 ), for their use to be propagated in the population of scientists, one would also expect them to provide benefits to the individuals conducting the science. Interestingly, the few existing papers on hypotheses (Loehle,  1987 ; Romesburg,  1981 ; Sells et al.,  2018 ) tended to explain the advantages in terms of benefits to the group by offering arguments such as “because hypotheses help the field move forward more rapidly”.

Here we address some common justifications for hypotheses being unnecessary and show how one's first instinct to avoid hypotheses may be mistaken. We also present four reasons that use of hypotheses may be of individual self‐interest.

5. RESPONSES TO COMMON JUSTIFICATIONS FOR THE ABSENCE OF HYPOTHESES

During our collective mentoring at graduate and undergraduate levels, as well as examination of the literature, we have heard a number of common justifications for why hypotheses are not included. We must admit that many of us have, on occasion, rationalized absence of hypotheses in our own work using the same logic! We understand that clearly formulating and testing hypotheses can often be challenging, but propose that the justifications for avoiding hypotheses should be carefully considered.

  • “ But I do have hypotheses ”. Simply using the word “hypothesis” does not a hypothesis make. A common pattern in the literature we reviewed was for researchers to state their guess about the results they expect and call this the “hypothesis” (e.g., “I hypothesize trees at higher elevation will grow slowly”). But these are usually predictions derived from an implicit theoretical model (Symes et al.,  2015 ) or are simply descriptive statements with the word “hypothesis” in front of them (see Box  1 ). A research hypothesis must contain explanations for an observed phenomenon (Loehle,  1987 ; Wolff & Krebs,  2008 ). Such explanations are derived from existing or new theory (Symes et al.,  2015 ). Making the link between the expected mechanism (hypothesis) and logical outcome if that mechanism were true (the prediction), is a key element of strong inference. Similarly, using “statistical hypotheses” and “null hypothesis testing” is not the same as developing mechanistic research hypotheses (Romesburg,  1981 ; Sells et al.,  2018 ).
  • “ Not enough is known about my system to formulate hypotheses ”. This is perhaps the most common defense against needing hypotheses (Golub,  2010 ). The argument goes that due to lack of previous research no mature theory has developed, so formal tests are impossible. Such arguments may have basis in some truly novel contexts (e.g., exploratory research on genomes) (Golub,  2010 ). But on close inspection, similar work has often been conducted in other geographic regions, systems, or with different taxa. If the response by a researcher is “but we really need to know if X pattern also applies in this region” (e.g., does succession influence bird diversity in forests of Western North America the same way as it does in Eastern forests), this is fine and it is certainly useful to accumulate descriptive studies globally for future synthetic work. However, continued efforts at description alone constitute missed opportunities for understanding the mechanisms behind a pattern (e.g., why does bird diversity decline when the forest canopy closes?). Often with a little planning, both the initial descriptive local interest question (e.g., “is it?”) and the broader interest question (i.e., “why?”) can both be tackled with minimal additional effort.
  • “What about Darwin? Many important discoveries have been made without hypotheses.” Several authors (and many students) have argued that many important and reliable patterns in nature have emerged outside of the hypothetico‐deductive (H‐D) method (Brush, 1974). For instance, Darwin's discovery of natural selection as a key force for evolution has been put forward as an example of how reliable ideas can emerge without the H‐D method (May, 1981; Milner, 2018). Examination of Darwin's notebooks has suggested that he did not propose explicit hypotheses and test them (Brush, 1974). However, Darwin himself wrote “all observation must be for or against some view if it is to be of any service!” (Ayala, 2009). In fact, Darwin actually put forward and empirically tested hypotheses in multiple fields, including geology, plant morphology and physiology, psychology, and evolution (Ayala, 2009). This debate suggests that, like Darwin, we should continue to value systematic observation and descriptive science (Tewksbury et al., 2014), but whenever possible, it should be with a view toward developing theory and testing hypotheses.

The statement that “many important discoveries have been made without hypotheses” stems from a common misconception that somehow hypotheses spring fully formed into the mind, and that speculation, chance and induction play no role in the H‐D method. As noted by Loehle ( 1987 ; p. 402) “The H‐D method and strong inference, however, are valid no matter how theories are obtained. Dreams, crystal balls, or scribbled notebooks are all allowed. In fact, induction may be used to create empirical relations which then become candidates for hypothesis testing even though induction cannot be used to prove anything”. So, although induction has frequently been used to develop theory, it is an unreliable means to test theory (Popper,  1959 ). As is well‐known, Darwin's theory of natural selection was heavily debated in scientific circles at the time, and it is only through countless hypothesis tests that it remains the best explanation for evolution even today (Mayr,  2002 ).

  • “ Ecology is too complex for hypotheses ”. In one of the most forcefully presented arguments for the H‐D method, Karl Popper ( 1959 ) argued that science should be done through a process of falsification; that is, multiple hypotheses should be constructed and the researcher's role is to successively eliminate these one at a time via experimentation until a single plausible hypothesis remains. This approach has caused some consternation among ecologists because the idea of single causes to phenomena doesn't match most of our experiences (Quinn & Dunham,  1983 ); rather, multiple interacting processes often overlap to drive observed patterns. For example, Robert Paine found that the distribution of a common seaweed was best explained by competition, physical disturbance, and dispersal ability (Paine,  1966 ).

It would be interesting if Popperian logic has inoculated ecology and evolution against the frequent application of hypotheses in research. Perhaps because the bar of falsification and testable mutually exclusive hypotheses is so high, many have opted to ignore the need for hypotheses altogether. If this is the case, our response is that in ecology and evolution we must not let Popperian perfection be the enemy of strong inference. With sufficient knowledge of a system, formal a priori hypotheses can be formulated that directly address the possibility of nonlinear relationships and interactions among variables. An example from conservation biology is the well‐explored hypothesis that the effects of habitat fragmentation should be greatest when habitat amount is low due to dispersal limitation (i.e., there should be a statistical interaction between fragmentation and habitat loss (Andrén, 1994 )).
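The fragmentation example above amounts to a linear model with a product term: response = b0 + b_frag × fragmentation + b_amount × amount + b_inter × fragmentation × amount. The sketch below shows what the hypothesis predicts under that form, namely that the effect of fragmentation weakens as habitat amount increases. All coefficient values are hypothetical and chosen only to illustrate the pattern; they are not estimates from Andrén (1994) or any other study.

```python
def predicted_response(frag, amount, b0, b_frag, b_amount, b_inter):
    """Linear model with a fragmentation-by-habitat-amount interaction."""
    return b0 + b_frag * frag + b_amount * amount + b_inter * frag * amount


def fragmentation_slope(amount, b_frag, b_inter):
    """Change in the response per unit of fragmentation at a given amount."""
    return b_frag + b_inter * amount


# Hypothetical coefficients (illustration only)
B0, B_FRAG, B_AMOUNT, B_INTER = 1.0, -0.8, 2.0, 0.8

# Habitat amount expressed as a proportion of the landscape
for amount in (0.1, 0.5, 0.9):
    slope = fragmentation_slope(amount, B_FRAG, B_INTER)
    print(f"habitat amount {amount:.1f}: fragmentation effect {slope:.2f}")
```

With these illustrative values the fragmentation effect is strongly negative when habitat is scarce and close to zero when habitat is abundant. A positive interaction coefficient is therefore the a priori prediction that such a hypothesis would put to the test.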

Hypothesis generation is possible at all levels of organization, and does not need to get to the bottom of a causal hierarchy to be useful. As illustrated in this case study (after Betts et al., 2015), using published work by the authors, support for a hypothesis at one level often generates a subsequent question and hypotheses at the next. After each new finding we had to return to the white board and draw out new alternative hypotheses as we progressed further down the hierarchy. Supported hypotheses are shown in black and the alternative hypotheses that were eliminated are in grey. A single study is not expected to tackle an entire mechanistic hierarchy. In fact, we have yet to uncover the physiological mechanisms involved in this phenomenon.

  • “But my model predicts patterns well”. An increasingly common justification for not presenting and testing research hypotheses seems to be the notion that if large datasets and complex modeling methods can predict outcomes effectively, what is the need for hypothesizing a mechanism (Glass & Hall, 2008; Golub, 2010)? Indeed, some have argued that prediction is a gold standard in ecology and evolution (Houlahan et al., 2017). However, underlying such arguments is the critical assumption that the relationship between predictors (i.e., independent variables, 'x's) and responses ('y's) exhibits stationarity in time and space. Although this appears to be the case in cosmology (e.g., relativity is thought to apply wherever you are in the universe (Einstein, 1920)), the assumption of stationarity has repeatedly been shown to be violated in ecological and evolutionary studies (Betts et al., 2006; Osborne et al., 2007; Thompson, 2005). Hence the well‐known maxim “correlation does not equal causation”; correlates of a phenomenon often shift, even if the underlying cause remains the same.

The advantage of understanding mechanism is that the relationship between cause and effect is less likely to shift in space and time than the relationship between a phenomenon and its correlates (Sells et al., 2018) (Figure 1). For instance, climate‐envelope models are still commonly used to predict future species distributions (Beale et al., 2008) despite the fact that links between correlates often fail (Gutiérrez et al., 2014) and climate per se may not be the direct driver of distributions. In an example from our own group, predictions that fit observed data well in the region where the model was built completely failed when applied to a new region only 250 km away (Betts et al., 2006). Although it is true that mechanisms can also exhibit nonstationarity, at least in these instances logic can inform decisions about whether or not causal factors are likely to hold in a new place or time.
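The contrast between a shifting correlate and a stable cause can be illustrated with a deliberately simple, deterministic toy example. In the sketch below, abundance is caused by food availability in both regions, while elevation is only a correlate whose relationship with food reverses between regions. All variable names and values are hypothetical, and this is not the analysis reported in Betts et al. (2006).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """Root-mean-square error of the fitted line on new data."""
    errors = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5


elevation = [1.0, 2.0, 3.0, 4.0, 5.0]

# Region A: food increases with elevation. Region B: food decreases with it.
food_a = [e for e in elevation]
food_b = [6.0 - e for e in elevation]

# The mechanism is the same in both regions: abundance = 2 * food.
abundance_a = [2.0 * f for f in food_a]
abundance_b = [2.0 * f for f in food_b]

# Both models are built in region A only.
correlate_model = fit_line(elevation, abundance_a)  # abundance ~ elevation
mechanism_model = fit_line(food_a, abundance_a)     # abundance ~ food

# Both models are then transferred to region B.
rmse_correlate_b = rmse(elevation, abundance_b, *correlate_model)
rmse_mechanism_b = rmse(food_b, abundance_b, *mechanism_model)
print(f"correlate model RMSE in region B: {rmse_correlate_b:.2f}")
print(f"mechanism model RMSE in region B: {rmse_mechanism_b:.2f}")
```

Both models fit region A perfectly, so goodness of fit in the training region cannot distinguish them. Only the model built on the causal variable transfers to region B, which is the sense in which mechanistic understanding should improve transferability.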

6. WHY SHOULD YOU HAVE HYPOTHESES? (A SELF‐INTERESTED PERSPECTIVE)

We have already described two arguments for hypothesis use, both of which should have positive influences on reproducibility and therefore progress in science: (1) multiple alternative hypotheses developed a priori prevent attachment to a single idea, and (2) hypotheses encourage exploration of mechanisms, which should increase the transferability of findings to new systems. Both these arguments have been made frequently in the eco‐evolutionary literature for decades (Elliott & Brook,  2007 ; Loehle,  1987 ; Rosen,  2016 ; Sells et al.,  2018 ), but our results show that such arguments have been lost on the majority of researchers. One hypothesis recently proposed to explain why “poor methods persist [in science] despite perennial calls for improvements” is that such arguments have largely failed because they do not appeal to researcher self‐interest (Smaldino & McElreath,  2016 ). In periods of intense competition for grants and top‐tier publications, perhaps arguments that rely on altruism fall short. However, happily, there are at least four self‐interested reasons that students of ecological and evolutionary science should adopt the hypothetico‐deductive method.

  • Clarity and Precision in Research

First, and most apparent during our review of the literature, hypotheses force clarity and precision in thinking. We often found it difficult to determine the core purpose of papers that lacked clear hypotheses. One of the key goals of scientific writing is to communicate ideas efficiently (Schimel,  2011 ). Increased clarity through use of hypotheses could potentially even explain the pattern for manuscripts using hypotheses getting published in higher impact journals. Editors are increasingly pressed for time and forced to reject the majority of papers submitted to higher impact outlets prior to detailed review (AAAS,  2018 ). “Unclear message” and “lack of clear hypotheses” are top reasons a paper ends up in the editor's reject pile (Eassom,  2018 ; Elsevier,  2015 ). If editors have to struggle as often as we did to determine the purpose of a paper, this does not bode well for future publication. Clearly, communication through succinctly stated hypotheses is likely to enhance publication success.

Hypotheses also provide crucial direction during study design. Nothing is more frustrating than realizing that your hard‐earned data cannot actually address the key study objectives or rule out alternative explanations. Developing clear hypotheses and, in particular, multiple alternative hypotheses ensures that you actually design your study in a way that can answer the key questions of interest.

  • Personal Fulfillment

Second, science is more likely to be fulfilling and fun when the direction of research is clear, but perhaps more importantly, when questions are addressed with more than one plausible answer. Results are often disappointing or unfulfilling when the study starts out with a single biological hypothesis in mind (Symes et al., 2015), particularly if there is no support for this hypothesis. If multiple alternative hypotheses are well crafted, something interesting and rewarding will result regardless of the outcome. Researchers are then much more likely to enjoy the process of science because the stress of wanting a particular end is removed. Subsequently, as Chamberlin (1890) proposed, “the dangers of parental affection for a favorite theory can be circumvented”, which should reduce the risk of creeping bias. In our experience reviewing competitive grant proposals at the U.S. National Science Foundation, proposals testing several compelling hypotheses have consistently been more likely to be well received, presumably because reviewers are risk‐averse and understand that ultimately finding support for any of the outcomes will pay off. Why bet on just one horse when you can bet on them all?

  • Intrinsic Value to Mechanism

Mechanism seems to have intrinsic value for humans—regardless of the practical application. Humans tend to be interested in acquiring understanding rather than just accumulating facts. As a species, we crave answers to the question “why.” Indeed, it is partly this desire for mechanism that is driving a recent perceived “crisis” in machine learning, with the entire field being referred to as “alchemy” (Hutson,  2018 ); algorithms continue to increase in performance, but the mechanisms for such improvements are often a mystery—even to the researchers themselves. “Because our model predicts well” is the unsatisfying scientific equivalent to a parent answering a child's “why?” with “because that's just the way it is.” This problem is beginning to spawn a new field in artificial intelligence “AI neuroscience” which attempts to get into the “black‐box” of machine‐learning algorithms to understand how and why they are predictive (Voosen,  2017 ).

Even in some of our most applied research, we find that managers and policymakers, when confronted with a result (e.g., thinning trees to 70% of initial densities reduced bird diversity), want to know why (e.g., thinning eliminated nesting substrate for 4 species); if the answer to this question is not available, policy is much less likely to change (Sells et al., 2018). So, formulating mechanistic hypotheses will not only be more personally satisfying, but we expect it may also be more likely to result in real‐world changes.

  • You Are More Likely To be Right

In a highly competitive era, it seems that in the quest for high publication rates and funding, researchers lose sight of the original aim of science: to discover a truth about nature that is transferable to other systems. In a recent poll conducted by Nature, more than 70% of researchers reported having tried and failed to reproduce another scientist's experiments (Baker, 2016). Ultimately, each researcher has a choice: put forward multiple explanations for a phenomenon themselves, or risk “attachment” to a single hypothesis, with bias entering their work, rendering it irreproducible, and the work subsequently being found wrong by a future researcher. Imagine if Lamarck had not championed a single hypothesis for the mechanisms of evolution. Although Lamarck potentially had a vital impact as an early proponent of the idea that biological evolution occurred and proceeded in accordance with natural laws (Stafleu, 1971), unfortunately in the modern era he is largely remembered for his pet hypothesis. It may be a stretch to argue that he would have necessarily come up with natural selection, but if he had considered natural selection, the idea would have emerged 50 years earlier, substantially accelerating scientific progress and limiting his infamy as an early evolutionary biologist. An interesting contemporary example is provided by Prof. Amy Cuddy's research focused on “power posing” as a means to succeed. The work featured in one of the most viewed TED talks of all time but rather famously turned out to be irreproducible (Ranehill et al., 2015). When asked in a TED interview what she would do differently now, Prof. Cuddy noted that she would include a greater diversity of theory and multiple potential lines of evidence to “shed light on the psychological mechanisms” (Biello, 2017).

7. CONCLUSION

We acknowledge that formulating effective hypotheses can feel like a daunting hurdle for ecologists. However, we suggest that initial justifications for absence of hypotheses may often be unfounded. We argue that there are both selfish and altruistic reasons to include multiple alternative mechanistic hypotheses in your research: (1) testing multiple alternative hypotheses simultaneously makes for rapid and powerful progress which is to the benefit of all (Platt, 1964), (2) you lessen the chance that confirmation bias will result in you publishing an incorrect but provocative idea, (3) hypotheses provide clarity in design and writing, (4) research using hypotheses is more likely to be published in a high‐impact journal, and (5) you are able to provide satisfying answers to why phenomena occur. However, few current academic metrics appear to reward use of hypotheses. Therefore, we propose that in order to promote hypothesis use we may need to provide additional incentives (Edwards & Roy, 2016; Smaldino & McElreath, 2016). We suggest editors reward research conducted using principles of sound scientific method and be skeptical of research that smacks of data dredging, post hoc hypothesis development, and single hypotheses. If no hypotheses are stated in a paper and/or the paper is purely descriptive, editors should ask whether the novelty of the system and question warrant this, or if the field would have been better served by a study with mechanistic hypotheses. Eleven of the top 20 ecology journals already indicate a desire for hypotheses in their instructions for authors, with some going as far as indicating “priority will be given” for manuscripts testing clearly stated hypotheses. Although hypotheses are not necessary in all instances, we expect that their continued and increased use will help our disciplines move toward greater understanding, higher reproducibility, better prediction, and more effective management and conservation of nature. We recommend authors, editors, and readers encourage their use (Box 2).

CONFLICT OF INTEREST

The authors have no conflicts of interests to declare.

AUTHOR CONTRIBUTIONS

Matthew G. Betts: Conceptualization (lead); data curation (lead); formal analysis (lead); funding acquisition (lead); investigation (lead); methodology (equal); project administration (lead); resources (lead); supervision (lead); visualization (lead); writing‐original draft (lead); writing‐review & editing (lead). Adam S. Hadley: Conceptualization (lead); data curation (lead); funding acquisition (equal); investigation (equal); methodology (lead); project administration (equal); resources (supporting); software (supporting); supervision (lead); validation (lead); visualization (lead); writing‐original draft (equal); writing‐review & editing (equal). David W. Frey: Conceptualization (supporting); data curation (supporting); formal analysis (supporting); funding acquisition (supporting); writing‐review & editing (supporting). Sarah J. K. Frey: Conceptualization (supporting); investigation (equal); writing‐review & editing (equal). Dusty Gannon: Conceptualization (supporting); investigation (equal); writing‐review & editing (equal). Scott H. Harris: Conceptualization (supporting); investigation (equal); methodology (equal); writing‐review & editing (equal). Hankyu Kim: Conceptualization (supporting); investigation (equal); methodology (equal); writing‐review & editing (equal). Kara Leimberger: Conceptualization (supporting); investigation (equal); methodology (equal); writing‐review & editing (equal). Katie Moriarty: Conceptualization (supporting); investigation (equal); methodology (equal); writing‐review & editing (equal). Joseph M. Northrup: Investigation (equal); methodology (equal); writing‐review & editing (equal). Ben Phalan: Investigation (equal); methodology (equal); writing‐review & editing (equal). Josée S. Rousseau: Investigation (equal); methodology (equal); writing‐review & editing (equal). Thomas D. Stokely: Investigation (equal); methodology (equal); writing‐review & editing (equal). Jonathon J. Valente: Investigation (equal); methodology (equal); writing‐review & editing (equal). Urs G. Kormann: Methodology (supporting); resources (equal); writing‐review & editing (supporting). Chris Wolf: Formal analysis (supporting); writing‐review & editing (supporting). Diego Zárrate‐Charry: Investigation (equal); methodology (equal); writing‐review & editing (equal).

ETHICAL APPROVAL

The authors adhered to all standards for the ethical conduct of research.

Supporting information

Supplementary Material

ACKNOWLEDGMENTS

Funding from the National Science Foundation (NSF‐DEB‐1457837) to MGB and ASH supported this research. We thank Rob Fletcher, Craig Loehle, and anonymous reviewers for thoughtful comments on early versions of this manuscript, as well as Joe Nocera and his graduate student group at the University of New Brunswick for constructive comments on the penultimate version of the paper. The authors are also grateful to A. Dream for providing additional resources to enable the completion of this manuscript.

Betts MG, Hadley AS, Frey DW, et al. When are hypotheses useful in ecology and evolution? Ecol Evol. 2021;11:5762–5776. 10.1002/ece3.7365

Matthew G. Betts and Adam S. Hadley contributed equally to this manuscript.

DATA AVAILABILITY STATEMENT

  • AAAS (2018). What percentage of submissions does Science accept? AAAS Science Contributors. Retrieved from http://www.sciencemag.org/site/feature/contribinfo/faq/index.xhtml‐pct_faq
  • Andrén, H. (1994). Effects of habitat fragmentation on birds and mammals in landscapes with different proportions of suitable habitat: A review. Oikos, 71, 355–366. 10.2307/3545823
  • Ayala, F. J. (2009). Darwin and the scientific method. Proceedings of the National Academy of Sciences, 106, 10033–10039. 10.1073/pnas.0901404106
  • Ayres, M. P., & Lombardero, M. J. (2017). Forest pests and their management in the Anthropocene. Canadian Journal of Forest Research, 48, 292–301. 10.1139/cjfr-2017-0033
  • Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452–454.
  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed‐effects models using lme4. Journal of Statistical Software, 67, 1–48.
  • Bates, D., Maechler, M., & Bolker, B. (2018). ‘lme4’ Linear mixed‐effects models using S4 classes. R Core Team. Retrieved from https://cran.r‐project.org/web/packages/lme4/lme4.pdf
  • Beale, C. M., Lennon, J. J., & Gimona, A. (2008). Opening the climate envelope reveals no macroscale associations with climate in European birds. Proceedings of the National Academy of Sciences, 105, 14908–14912. 10.1073/pnas.0803506105
  • Betts, M., Diamond, T., Forbes, G. J., Villard, M.‐A., & Gunn, J. (2006). The importance of spatial autocorrelation, extent and resolution in predicting forest bird occurrence. Ecological Modeling, 191, 197–224.
  • Betts, M. G., Hadley, A. S., & Kress, J. (2015). Pollinator recognition in a keystone tropical plant. Proceedings of the National Academy of Sciences, 112, 3433–3438.
  • Biello, D. (2017). Inside the debate about power posing: A Q & A with Amy Cuddy. Ideas.TED.com. Retrieved from https://ideas.ted.com/inside‐the‐debate‐about‐power‐posing‐a‐q‐a‐with‐amy‐cuddy/
  • Briggs, F., Lakshminarayanan, B., Neal, L., Fern, X. Z., Raich, R., Hadley, S. J. K., Hadley, A. S., & Betts, M. G. (2012). Acoustic classification of multiple simultaneous bird species: A multi‐instance multi‐label approach. The Journal of the Acoustical Society of America, 131, 4640–4650. 10.1121/1.4707424
  • Brush, S. G. (1974). Should the history of science be rated X? Science, 183, 1164–1172.
  • Chamberlin, T. C. (1890). The method of multiple working hypotheses. Science, 15, 92–96.
  • Crisp, M. D., Trewick, S. A., & Cook, L. G. (2011). Hypothesis testing in biogeography. Trends in Ecology & Evolution, 26, 66–72. 10.1016/j.tree.2010.11.005
  • Cutler, D. R., Edwards, T. C., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007). Random forests for classification in ecology. Ecology, 88, 2783–2792. 10.1890/07-0539.1
  • Eassom, H. (2018). 9 common reasons for rejection. Wiley: Discover the Future of Research. Retrieved from https://hub.wiley.com/community/exchanges/discover/blog/2018/2001/2031/2019‐common‐reasons‐for‐rejection
  • Edwards, M. A., & Roy, S. (2016). Academic research in the 21st century: Maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environmental Engineering Science, 34, 51–61. 10.1089/ees.2016.0223
  • Egler, F. E. (1986). “Physics envy” in ecology. Bulletin of the Ecological Society of America, 67, 233–235.
  • Einstein, A. (1920). Relativity: The special and general theory (78 pp.). Henry Holt and Company.
  • Elith, J., Graham, C. H., Anderson, R. P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R. J., Huettmann, F., Leathwick, J. R., Lehmann, A., Li, J., Lohmann, L. G., Loiselle, B. A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J. M. M., Townsend Peterson, A., … Zimmermann, N. E. (2006). Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29, 129–151.
  • Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77, 802–813. 10.1111/j.1365-2656.2008.01390.x
  • Elliott, L. P., & Brook, B. W. (2007). Revisiting Chamberlin: Multiple working hypotheses for the 21st century. BioScience, 57, 608–614. 10.1641/B570708
  • Elsevier (2015). 5 ways you can ensure your manuscript avoids the desk reject pile. Elsevier Connect: https://www.elsevier.com/authors‐update/story/publishing‐tips/5‐ways‐you‐can‐ensure‐your‐manuscript‐avoids‐the‐desk‐reject‐pile
  • Fahrig, L. (2003). Effects of habitat fragmentation on biodiversity. Annual Review of Ecology, Evolution, and Systematics, 34, 487–515.
  • Glass, D. J., & Hall, N. (2008). A brief history of the hypothesis. Cell, 134, 378–381. 10.1016/j.cell.2008.07.033
  • Golub, T. (2010). Counterpoint: Data first. Nature, 464, 679. 10.1038/464679a
  • Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis‐Lewis, I., Sutcliffe, P. R., Tulloch, A. I. T., Regan, T. J., Brotons, L., McDonald‐Madden, E., Mantyka‐Pringle, C., Martin, T. G., Rhodes, J. R., Maggini, R., Setterfield, S. A., Elith, J., Schwartz, M. W., Wintle, B. A., Broennimann, O., Austin, M., … Buckley, Y. M. (2013). Predicting species distributions for conservation decisions. Ecology Letters, 16, 1424–1435. 10.1111/ele.12189
  • Hansen, M. C. , Potapov, P. V. , Moore, R. , Hancher, M. , Turubanova, S. A. , Tyukavina, A. , Thau, D. , Stehman, S. V. , Goetz, S. J. , Loveland, T. R. , Kommareddy, A. , Egorov, A. , Chini, L. , Justice, C. O. , & Townshend, J. R. G. (2013). High‐resolution global maps of 21st‐century forest cover change . Science , 342 , 850–853. 10.1126/science.1244693 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Houlahan, J. E. , McKinney, S. T. , Anderson, T. M. , & McGill, B. J. (2017). The priority of prediction in ecological understanding . Oikos , 126 , 1–7. [ Google Scholar ]
  • Hutson, M. (2018). AI researchers allege that machine learning is alchemy . Science Posted in: Technology May 3, 2018. 10.1126/science.aau0577 [ CrossRef ]
  • Illán, J. G. , Thomas, C. D. , Jones, J. A. , Wong, W.‐K. , Shirley, S. M. , & Betts, M. G. (2014). Precipitation and winter temperature predict long‐term range‐scale abundance changes in Western North American birds . Global Change Biology , 20 , 3351–3364. 10.1111/gcb.12642 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Lack, D. (1954). The natural regulation of animal numbers . Clarendon Press. [ Google Scholar ]
  • Landy, F. (1986). Stamp collecting versus science ‐ Validation as hypothesis testing . American Psychologist , 41 , 1183–1192. [ Google Scholar ]
  • Loehle, C. (1987). Hypothesis testing in ecology: Psychological aspects and the importance of theory maturation . The Quarterly Review of Biology , 62 , 397–409. [ PubMed ] [ Google Scholar ]
  • May, R. M. (1981). The role of theory in ecology . American Zoologist , 21 , 903–910. [ Google Scholar ]
  • Mayr, E. (2002). What evolution is . Basic Books. [ Google Scholar ]
  • Milner, S. (2018). Newton didn't frame hypotheses. Why should we? Real Clear Science: Posted in: Physics Today April 25, 2018. 10.1063/PT.2016.2013.20180424a [ CrossRef ]
  • Munafò, M. , Nosek, B. , Bishop, D. , Button, K. S. , Chambers, C. D. , du Sert, N. P. , Simonsohn, U. , Wagenmakers, E.‐J. , Ware, J. J. , & Ioannidis, J. P. A. (2017). A manifesto for reproducible science . Nature Human Behaviour , 1 , 0021. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Myers, N. , Mittermeier, R. A. , Mittermeier, C. G. , da Fonseca, G. A. B. , & Kent, J. (2000). Biodiversity hotspots for conservation priorities . Nature , 403 , 853. [ PubMed ] [ Google Scholar ]
  • O'Neill, R. V. , Johnson, A. R. , & King, A. W. (1989). A hierarchical framework for the analysis of scale . Landscape Ecology , 3 , 193–205. 10.1007/BF00131538 [ CrossRef ] [ Google Scholar ]
  • Opthof, T. , Furstner, F. , van Geer, M. , & Coronel, R. (2000). Regrets or no regrets? No regrets! The fate of rejected manuscripts . Cardiovascular Research , 45 , 255–258. 10.1016/S0008-6363(99)00339-9 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Osborne, P. E. , Foody, G. M. , & Suárez‐Seoane, S. (2007). Non‐stationarity and local approaches to modelling the distributions of wildlife . Diversity and Distributions , 13 , 313–323. 10.1111/j.1472-4642.2007.00344.x [ CrossRef ] [ Google Scholar ]
  • Paine, C. E. T. , & Fox, C. W. (2018). The effectiveness of journals as arbiters of scientific impact . Ecology and Evolution , 8 , 9666–9685. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Paine, R. T. (1966). Food web complexity and species diversity . American Naturalist , 100 , 65–75. 10.1086/282400 [ CrossRef ] [ Google Scholar ]
  • Peters, R. H. (1991). A critique for ecology . Nature. Cambridge University Press. [ Google Scholar ]
  • Pettorelli, N. , Nagendra, H. , Rocchini, D. , Rowcliffe, M. , Williams, R. , Ahumada, J. , de Angelo, C. , Atzberger, C. , Boyd, D. , Buchanan, G. , Chauvenet, A. , Disney, M. , Duncan, C. , Fatoyinbo, T. , Fernandez, N. , Haklay, M. , He, K. , Horning, N. , Kelly, N. , … Wegmann, M. (2017). Remote sensing in ecology and conservation: Three years on . Remote Sensing in Ecology and Conservation , 3 , 53–56. [ Google Scholar ]
  • Platt, J. R. (1964). Strong inference . Science , 146 , 347–353. [ PubMed ] [ Google Scholar ]
  • Popper, K. (1959). The logic of scientific discovery . Basic Books. [ Google Scholar ]
  • Quinn, J. F. , & Dunham, A. E. (1983). On hypothesis testing in ecology and evolution . The American Naturalist , 122 , 602–617. 10.1086/284161 [ CrossRef ] [ Google Scholar ]
  • Ranehill, E. , Dreber, A. , Johannesson, M. , Leiberg, S. , Sul, S. , & Weber, R. A. (2015). Assessing the robustness of power posing: No effect on hormones and risk tolerance in a large sample of men and women . Psychological Science , 26 , 653–656. 10.1177/0956797614553946 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Romesburg, H. C. (1981). Wildlife science: Gaining reliable knowledge . The Journal of Wildlife Management , 45 , 293–313. 10.2307/3807913 [ CrossRef ] [ Google Scholar ]
  • Rosen, J. (2016). Research protocols: A forest of hypotheses . Nature , 536 , 239–241. 10.1038/nj7615-239a [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Schimel, J. (2011). Writing science: How to write papers that get cited and proposals that get funded . Oxford University Press. [ Google Scholar ]
  • Sells, S. N. , Bassing, S. B. , Barker, K. J. , Forshee, S. C. , Keever, A. C. , Goerz, J. W. , & Mitchell, M. S. (2018). Increased scientific rigor will improve reliability of research and effectiveness of management . Journal of Wildlife Management , 82 , 485–494. 10.1002/jwmg.21413 [ CrossRef ] [ Google Scholar ]
  • Smaldino, P. E. , & McElreath, R. (2016). The natural selection of bad science . Royal Society Open Science , 3 ( 9 ), 160384. 10.1098/rsos.160384 [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Stafleu, F. (1971). Lamarck: The birth of biology . Taxon , 20 , 397–442. 10.2307/1218244 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Sutherland, W. J. , Spiegelhalter, D. , & Burgman, M. A. (2013). Twenty tips for interpreting scientific claims . Nature , 503 , 335–337. [ PubMed ] [ Google Scholar ]
  • Symes, L. B. , Serrell, N. , & Ayres, M. P. (2015). A practical guide for mentoring scientific inquiry . The Bulletin of the Ecological Society of America , 96 , 352–367. [ Google Scholar ]
  • Tewksbury, J. J. , Anderson, J. G. T. , Bakker, J. D. , Billo, T. J. , Dunwiddie, P. W. , Groom, M. J. , Hampton, S. E. , Herman, S. G. , Levey, D. J. , Machnicki, N. J. , del Rio, C. M. , Power, M. E. , Rowell, K. , Salomon, A. K. , Stacey, L. , Trombulak, S. C. , & Wheeler, T. A. (2014). Natural history's place in science and society . BioScience , 64 , 300–310. 10.1093/biosci/biu032 [ CrossRef ] [ Google Scholar ]
  • Thompson, J. N. (2005). The geographic mosaic of coevolution . University of Chicago Press. [ Google Scholar ]
  • Thornton, A. , & Lee, P. (2000). Publication bias in meta‐analysis: Its causes and consequences . Journal of Clinical Epidemiology , 53 , 207–216. 10.1016/S0895-4356(99)00161-4 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • van Dijk, D. , Manor, O. , & Carey, L. B. (2014). Publication metrics and success on the academic job market . Current Biology , 24 , R516–R517. 10.1016/j.cub.2014.04.039 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Voosen, P. (2017). The AI detectives . Science , 357 , 22–27. 10.1126/science.357.6346.22 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Weinberg, R. (2010). Point: Hypotheses first . Nature , 464 , 678. 10.1038/464678a [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wilson, K. A. , McBride, M. F. , Bode, M. , & Possingham, H. P. (2006). Prioritizing global conservation efforts . Nature , 440 , 337. 10.1038/nature04366 [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Wolff, J. O. , & Krebs, C. J. (2008). Hypothesis testing and the scientific method revisited . Acta Zoologica Sinica , 54 , 383–386. [ Google Scholar ]


41 Testing the Red Queen Hypothesis

The Red Queen hypothesis—that sex evolved to combat our coevolving pathogens—can be tested by analyzing a few key predictions of this hypothesis:

  • Sex is most beneficial where there is a high risk of infection
  • Pathogens are more likely to attack common phenotypes (for example, clones) in a population, as opposed to the less-common counterparts (such as those that resulted from sex)
  • In sexually reproducing populations, individuals choose mates that maximize diversity in their offspring

Note that all of these predictions implicitly rely on the heritability of being healthy (in this case, the ability to combat pathogens); specifically, parents must be able to pass along to their offspring genes for avoiding pathogens. Testing these predictions has resulted in several lines of evidence supporting the Red Queen hypothesis.

Prediction 1: Sex is most beneficial where there is a high risk of infection

An excellent system for testing this prediction involves a flatworm parasite in the genus Microphallus, a duck, and a small mud snail (Potamopyrgus antipodarum; Figure 1). This species of snail is able to reproduce sexually or asexually. The extent of sexual reproduction in a population of snails can be quantified by counting the number of males, because asexual snails are all female.

Image depicting Potamopyrgus antipodarum.

The flatworm’s life cycle begins inside of the snail, where the worm emerges from its egg. Infected snails are consumed by ducks. Once in the duck’s intestine, adult worms have sex and produce eggs. Flatworm eggs are released, with duck feces, into the water, where they are ingested by snails and the cycle continues (Figure 2). Snails are harmed by this flatworm, largely because a symptom of infection is sterilization (the flatworm’s scientific name, Microphallus , translates to “small penis”).

Image depicting the life cycle of Microphallus

Observations of this system in two New Zealand lakes (Alexandrina and Kaniere) revealed that snails are more likely to be sexual (measured by frequency of males) in shallow waters, where ducks feed, than in deeper waters, where ducks do not feed (Figure 3).

Image depicting the infection rates in shallow-water and deep-water snails.

These results suggest that coevolutionary pressure is greater on the snails in the shallows, presumably because the feeding ducks effectively “close the circle” on the worm’s life cycle there. Infection rates are also higher in the shallows, so the pattern supports Prediction 1: sex is most beneficial where there is a high risk of infection.
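As a rough illustration of how a shallow-versus-deep comparison like this could be quantified, the Python sketch below computes the frequency of males at two depths and a pooled two-proportion z statistic. The function names and the counts in the usage note are hypothetical placeholders, not data from the New Zealand lakes, and the z-test is an assumed choice of method, not necessarily the one used in the original study.

```python
import math


def male_frequency(males, total):
    """Proportion of sampled snails that are male (a proxy for sexual reproduction)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return males / total


def two_proportion_z(males_a, total_a, males_b, total_b):
    """Pooled two-proportion z statistic and one-sided p-value for 'a > b'."""
    p_a = male_frequency(males_a, total_a)
    p_b = male_frequency(males_b, total_b)
    pooled = (males_a + males_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("no variation in the pooled sample")
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, invented purely to show the calculation
    z, p = two_proportion_z(males_a=30, total_a=100, males_b=10, total_b=100)
    print(f"z = {z:.2f}, one-sided p = {p:.5f}")
```

With these made-up counts, males make up 30% of the shallow sample and 10% of the deep sample. A real analysis would also need to account for how the snails were sampled and for differences between lakes.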

Prediction 2: Pathogens are more likely to attack common phenotypes in a population, as opposed to the less-common counterparts

In the Mexican desert there are isolated pools inhabited by a species of minnow. Within these pools, populations of asexually reproducing individuals exist alongside sexually reproducing individuals. Fish in these ponds exhibit “black spot disease,” which is caused by a parasitic flatworm. Investigators have observed the frequency of sexual and asexual fish and the number of black spots in each type of fish in these ponds. Clonal fish are likely to have the most common phenotype in these ponds (as they are genetically identical to each other), while the sexually reproducing fish will have a wide variety of infrequent phenotypes. As the Red Queen predicts, the common type of fish (usually one of the clonal species) had the highest number of parasitic spots. In ponds where there was a genetically diverse, sexually reproducing population, the sexual fish had fewer spots.
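The logic of this prediction can be sketched with a toy model of negative frequency-dependent selection. In the Python example below, each host type pays an infection cost proportional to its own frequency, which is a deliberately simplified stand-in for parasites that track the most common host. The function names, the selection strength `s`, and the starting frequencies are illustrative assumptions, not values from the minnow study.

```python
def next_generation(freqs, s):
    """One round of selection in which infection cost scales with host frequency."""
    if not 0 <= s < 1:
        raise ValueError("s must be in [0, 1) so that fitness stays positive")
    # Assumption: the more common a host type is, the more heavily it is parasitized
    fitness = [1.0 - s * f for f in freqs]
    mean_fitness = sum(f * w for f, w in zip(freqs, fitness))
    return [f * w / mean_fitness for f, w in zip(freqs, fitness)]


def simulate(freqs, s, generations):
    """Return the list of host-type frequencies for each generation."""
    history = [list(freqs)]
    for _ in range(generations):
        freqs = next_generation(freqs, s)
        history.append(freqs)
    return history


if __name__ == "__main__":
    # Hypothetical start: one common clone and two rare types
    history = simulate([0.8, 0.1, 0.1], s=0.5, generations=20)
    for gen in (0, 1, 5, 20):
        print(gen, [round(f, 3) for f in history[gen]])
```

In this sketch the common type loses ground every generation until the three types are equally frequent. The model omits the time lag with which real parasites adapt, so it does not reproduce the oscillations usually associated with Red Queen dynamics.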

Photo depicts evening primrose blooms

Additional evidence comes from the evening primrose (Figure 4), a flowering plant that, like the minnows, snails, and water fleas discussed above, exists in sexual and asexual forms. Evening primrose can be damaged by mildew caused by a pathogenic fungus. The plants produce an enzyme called chitinase to defend themselves against this fungus. A recent comparison indicated that the sexually reproducing primrose plants had greater variety in the gene that codes for chitinase than did the asexual plants. In addition, the overall amount of chitinase expressed was higher in the sexual plants than in the asexuals. Finally, the researchers found that the plants that were more resistant to mildew damage had higher fitness (they produced more fruit, and thus more offspring) in the presence of that pathogen. In evening primrose, then, greater diversity in a key defense gene is associated with lower susceptibility to a pathogen, which supports the prediction that parasites are more likely to attack the most common phenotype in a population and provides additional evidence for the Red Queen hypothesis.

Know Your Pathogens

A pathogen is something that infects and causes a fitness cost in another organism.  Pathogens come in a wide variety; some of them are not even considered living!

Prions – Prions are non-living infectious agents that are misfolded proteins.

Viruses – Whether you consider viruses alive or not depends on your definition of life.  Viruses are protein-encased DNA or RNA entities that hijack a cell’s replication machinery to reproduce.  Viral infections include influenza, HIV, HPV, and herpes.

Fungal pathogens – Fungi are responsible for a variety of infections including mildew, thrush, athlete’s foot and smut.

Bacteria – Bacteria are prokaryotic organisms that occur everywhere. By some estimates, there are at least as many bacterial cells in and on you as there are human cells in your body. Fortunately, the vast majority of bacteria are benign. However, some bacteria cause problems such as urinary-tract infections, some kinds of pneumonia, ear infections, pertussis (whooping cough), chlamydia, gonorrhea, and syphilis.

Protists – Protists are single-celled eukaryotes that cause diseases such as malaria and amoebic dysentery.

Animals – Common animal pathogens include lice, many types of worms, and parasitic wasps.

Prediction 3: In sexually reproducing populations, individuals choose mates that maximize diversity in their offspring

If there is a fitness advantage to diversity, parents can best maximize their offspring’s potential (and have more grand-offspring) with careful mate choice. There are numerous examples of organisms preferring mates that increase offspring diversity, and shunning mates that might do the opposite. Even many hermaphrodites, with both male and female sex organs, seek other hermaphrodites for copulation…even if they are capable of self-fertilization.
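One way to make "mates that increase offspring diversity" concrete is to look at a single gene and ask what fraction of a pair's offspring would be heterozygous under Mendelian inheritance. The Python sketch below does this for a hypothetical chooser and two hypothetical candidate mates. The allele labels, the function names, and the use of heterozygosity as the diversity measure are assumptions made for illustration only.

```python
from itertools import product


def offspring_heterozygosity(parent_a, parent_b):
    """Expected fraction of heterozygous offspring at one locus.

    Each parent is a pair of allele labels; each offspring receives one
    allele from each parent with equal probability.
    """
    combos = list(product(parent_a, parent_b))
    return sum(1 for x, y in combos if x != y) / len(combos)


def best_mate(chooser, candidates):
    """Name of the candidate whose offspring with the chooser would be most heterozygous."""
    return max(
        candidates,
        key=lambda name: offspring_heterozygosity(chooser, candidates[name]),
    )


if __name__ == "__main__":
    # Hypothetical genotypes, invented for illustration
    chooser = ("A1", "A2")
    candidates = {"similar": ("A1", "A2"), "dissimilar": ("A3", "A4")}
    for name, genotype in candidates.items():
        print(name, offspring_heterozygosity(chooser, genotype))
    print("preferred mate:", best_mate(chooser, candidates))
```

Here the genetically similar mate yields heterozygous offspring half of the time, while the dissimilar mate yields them every time, so a chooser that maximizes this measure prefers the dissimilar mate. Real mate choice involves many genes and other cues, so this is only a sketch of the underlying arithmetic.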

An excellent model for studying mate choice is the Atlantic salmon, an important commercial fish that lives its life in the ocean and returns to fresh water to mate (or spawn). Sofia Consuegra and Carlos Garcia de Leaniz compared offspring diversity of salmon that were mated in a commercial fish hatchery (and unable to choose their mates) against that of salmon allowed to choose mates in the wild. The hatchery-spawned fish exhibited lower diversity than did the wild-spawned fish. Furthermore, hatchery-spawned fish carried a greater number of roundworm parasites (Anisakis) than did their wild-spawned counterparts (Figure 5). These results support the prediction that individuals will choose mates that maximize diversity in their offspring. This work also strengthens the case for the Red Queen hypothesis by illustrating a potential benefit of mate choice to Atlantic salmon: parasite avoidance.

Variation in abundance of Anisakis per salmon (median parasite load) in the progeny of wild and hatchery Atlantic salmon returning to rivers to spawn. Box and whisker plots show median values with notches extending to 95% CI around the median, first and third quartiles (boxes), 90% of values (whiskers) and extreme data points (asterisks and circles). Compared with artificially bred salmon deprived of mate choice, the offspring of wild salmon that were allowed to mate freely show significantly lower parasite loads ( p

Introductory Biology: Evolutionary and Ecological Perspectives Copyright © by Various Authors - See Each Chapter Attribution is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.


bioRxiv

Ground nesting of soft eggs by extinct birds and a new parity mode switch hypothesis for the evolution of animal reproduction

  • ORCID record for M. Jorge Guimarães
  • ORCID record for M. Fátima Cerqueira
  • ORCID record for Yi-Hsiu Chung
  • ORCID record for Pedro Alpuim
  • ORCID record for Tzu-Chen Yen

Nearshore ground nesting of soft eggs by extinct birds is demonstrated here, providing a new explanation for the abundance of bird fossils in early Cretaceous lacustrine environments, where humidity conditions required for soft egg incubation would have been present. This reinforces recent findings of Archaeopteryx soft eggs near Jurassic marine environments, the possibility that wings and elongated feathers developed primarily in association with nest protection on the ground and only secondarily with flight, and the origin of flight from the ground up. Notably, soft eggs preceded rigid eggs in evolution, but both crocodiles, whose ancestors seem to have antedated bird precursors, and extant birds reproduce exclusively via hard-shelled eggs. Therefore, an explanation is in order for how reproduction via soft eggs could have occurred in the bird lineage between two evolutionary moments of reproduction via rigid eggs. As an alternative to the commonly accepted convergent evolution of viviparity and rigid eggshells, a parity mode switch hypothesis is presented here. It postulates the existence, since the rise of animals, of an inherited ancestral parity mode switch between viviparity and oviparity. This switch would have evolved to embrace hard-shelled oviparity after rigid eggshells appeared in evolution. Commitment to a particular parity mode or eggshell type may have conditioned survival of entire animal groups, especially during major extinction events, explaining, among others, the extinction of all birds that reproduced via soft eggshells.

Competing Interest Statement

The authors have declared no competing interest.

Figure 6 was corrected in relation to the relative positions of the “First amniote” and “Salamanders.”



COMMENTS

  1. Testing a hypothesis

    Disciplinary Core Idea LS4.C: Adaptation. NOS Matrix understanding category 2. Scientific knowledge is based on empirical evidence. NOS Matrix understanding category 7. Science is a human endeavor. Science and Engineering Practice 3. Planning and carrying out investigations. Science and Engineering Practice 4. Analyzing and interpreting data.

  2. When are hypotheses useful in ecology and evolution?

    To examine hypothesis use over time and test whether hypothesis presence was associated with research type (basic vs. applied), journal impact factor, citation rates, and grants, we sampled the ecology and evolution literature using a stratified random sample of ecology and evolution journals in existence before 1991.

  3. Statistical hypothesis test

    The above image shows a table with some of the most common test statistics and their corresponding tests or models. A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently supports a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the ...

  4. HyPhy 2.5—A Customizable Platform for Evolutionary Hypothesis Testing

    Abstract. HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework.

  5. When should we use one-tailed hypothesis testing?

    Methods in Ecology and Evolution is an open access journal publishing papers across a wide range of subdisciplines, disseminating new methods in ecology and evolution. ... we should emphasize that we agree with the sentiment that alternatives to null hypothesis testing should be given greater prominence by researchers (see Stephens 2005, 2007 ...

  6. Darwin and the Scientific Method

    Testing a hypothesis involves at least 4 different activities (Ayala, 1994). First, the hypothesis must be examined for internal consistency. ... The evolution of organisms, it is argued, is a historical process that depends on unique and unpredictable events, and thus is not subject to the formulation of testable hypotheses and theories. Such ...

  7. On Hypothesis Testing in Ecology and Evolution

    Theories of causality in ecology and evolution rarely lend themselves to analysis by the formal method of "hypothesis testing" envisioned by champions of a "strong inference" model of scientific method. The objective of biological research typically is to assess the relative contributions of a number of potential causal agents operating simultaneously. Sensibly stated hypotheses in the ...

  8. Hypothesis tests

    A hypothesis test is a procedure used in statistics to assess whether a particular viewpoint is likely to be true. They follow a strict protocol, and they generate a 'p-value', on the basis of which a decision is made about the truth of the hypothesis under investigation. All of the routine statistical 'tests' used in research—t-tests, χ 2 tests, Mann-Whitney tests, etc.—are all ...

  9. Testing hypotheses in the historical sciences

    Evolution. Ecology. In ecology and evolution, not all hypotheses are susceptible to experimentation: the system might be too large or complex; the events might have occurred in the past; or the process might be slow. Because we cannot devise an experiment to test them, such 'historical' hypotheses are commonly pooh-poohed as subjective and ...

  10. HyPhy: Hypothesis Testing Using Phylogenies

    Molecular Biology and Evolution, 10:1396-1401, 1993. Google Scholar Z. Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution., 39:105-111, 1994. Google Scholar Z. H. Yang.

  11. (PDF) When are hypotheses useful in ecology and evolution?

    hypothesis use has declined in ecology and evolution since the 1990s, given the substantial 20 advancement of tools further facilitating descriptive, correlative research (e.g., Cutler et al 2007,

  12. Testing Hypotheses of Molecular Evolution

    The maximum genetic diversity hypothesis is much better supported in this case than the molecular clock hypothesis (Example 3), though both hypotheses would be rejected at the 5% level according to null hypothesis significance testing . The numbers shown were calculated from the data of Huang [59, Table S3]

  13. Phylogenomic Testing of Root Hypotheses

    In particular, the testing of an LCA hypothesis of a single basal species partitioned from the other species is limited to those gene families that include the basal species. The last two partitions rejected in figure 5 are indeed single species partitions, and the number of gene trees that are informative relative both to them and to the ...

  14. 4 Examples of Hypothesis Testing in Real Life

    Example 1: Biology. Hypothesis tests are often used in biology to determine whether some new treatment, fertilizer, pesticide, chemical, etc. causes increased growth, stamina, immunity, etc. in plants or animals. For example, suppose a biologist believes that a certain fertilizer will cause plants to grow more during a one-month period than ...

  15. Hypothesis testing in evolutionary developmental biology: a

    Developmental data have the potential to give novel insights into morphological evolution. Because developmental data are time-consuming to obtain, support for hypotheses often rests on data from only a few distantly related species. ... Hypothesis testing in evolutionary developmental biology: a case study from insect wings J Hered. 2004 Sep ...

  16. An Introduction to Statistics: Understanding Hypothesis Testing and

    HYPOTHESIS TESTING. A clinical trial begins with an assumption or belief, and then proceeds to either prove or disprove this assumption. In statistical terms, this belief or assumption is known as a hypothesis. Counterintuitively, what the researcher believes in (or is trying to prove) is called the "alternate" hypothesis, and the opposite ...

  17. Understanding Hypothesis Testing

    Hypothesis testing is an important procedure in statistics. Hypothesis testing evaluates two mutually exclusive population statements to determine which statement is most supported by sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.

  18. Evolution as fact and theory

    A fact is a hypothesis that is so firmly supported by evidence that we assume it is true, and act as if it were true. In the sense that evolution is overwhelmingly validated by the evidence, it is a fact. It is frequently said to be a fact in the same way as the Earth's revolution around the Sun is a fact.

  19. When are hypotheses useful in ecology and evolution?

    To examine hypothesis use over time and test whether hypothesis presence was associated with research type (basic vs. applied), journal impact factor, citation rates, and grants, we sampled the ecology and evolution literature using a stratified random sample of ecology and evolution journals in existence before 1991.

  20. Red Queen hypothesis

    The Red Queen's hypothesis is a hypothesis in evolutionary biology proposed in 1973, that species must constantly adapt, evolve, and proliferate in order to survive while pitted against ever-evolving opposing species. The hypothesis was intended to explain the constant (age-independent) extinction probability as observed in the paleontological record caused by co-evolution between competing ...

  21. Exploring the Evolution of Hypothesis Testing in Secondary

    The history of hypothesis testing can be traced back to the early 1900s, when statisticians such as Karl Pearson and Ronald Fisher developed the first methods. Pearson developed the p-value and chi-squared test, both of which are topics discussed in secondary statistics classes. These methods were later refined and extended by other ...

  22. 41 Testing the Red Queen Hypothesis

    41. Testing the Red Queen Hypothesis. The Red Queen hypothesis—that sex evolved to combat our coevolving pathogens—can be tested by analyzing a few key predictions of this hypothesis: Sex is most beneficial where there is a high risk of infection. Pathogens are more likely to attack common phenotypes (for example, clones) in a population ...

  23. Ground nesting of soft eggs by extinct birds and a new parity mode

    Therefore, an explanation is in order for how reproduction via soft eggs could have occurred in the bird lineage in-between two evolutionary moments of reproduction via rigid eggs. In alternative to the commonly accepted convergent evolution of viviparity and rigid eggshells, a parity mode switch hypothesis is presented here.