Research Hypothesis in Psychology: Types & Examples

Saul McLeod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul McLeod, PhD, is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A research hypothesis (plural: hypotheses) is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method.

Hypotheses connect theory to data and guide the research process toward expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.
Predictions typically arise from a thorough knowledge of the research literature and curiosity about real-world problems or implications, integrated in a way that advances theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically proposes a relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured to assess the effect of those manipulations.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.
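To make the idea of "the likelihood of observed results if no true effect exists" concrete, here is a minimal sketch (our illustration, not part of the original article) of a permutation test in Python. Under the null hypothesis the group labels are interchangeable, so shuffling them simulates a world with no true effect; the data values are invented.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical scores for two groups (invented numbers, for illustration only).
group_a = np.array([14, 17, 15, 16, 18, 13, 15, 17])
group_b = np.array([12, 14, 11, 15, 13, 12, 14, 13])

observed_diff = group_a.mean() - group_b.mean()

# Under the null hypothesis the group labels are interchangeable, so we
# shuffle them many times to build the distribution of differences
# expected by chance alone.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
null_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    null_diffs.append(shuffled[:n_a].mean() - shuffled[n_a:].mean())
null_diffs = np.array(null_diffs)

# Two-tailed p-value: the proportion of chance differences at least as
# extreme as the difference we actually observed.
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"observed difference: {observed_diff:.2f}, p = {p_value:.4f}")
```

A small p-value means results like ours would be rare if the null hypothesis were true, which is the grounds for rejecting it.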

Non-Directional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place (i.e., greater, smaller, less, more).

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.
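As a concrete illustration of the two types, the sketch below (ours, not from the original article; the data are invented) runs the same comparison as a two-tailed and a one-tailed test. It assumes SciPy 1.6+ for the `alternative` argument of `scipy.stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical weight loss (kg) for an exercise group and a control group.
exercise = rng.normal(loc=5.0, scale=2.0, size=30)
control = rng.normal(loc=3.5, scale=2.0, size=30)

# Non-directional (two-tailed): "there is a difference between the groups."
t_two, p_two = stats.ttest_ind(exercise, control, alternative="two-sided")

# Directional (one-tailed): "exercise increases weight loss," i.e. the
# exercise group's mean is predicted to be greater than the control group's.
t_one, p_one = stats.ttest_ind(exercise, control, alternative="greater")

print(f"two-tailed: t = {t_two:.2f}, p = {p_two:.4f}")
print(f"one-tailed: t = {t_one:.2f}, p = {p_one:.4f}")
# When the effect lies in the predicted direction, the one-tailed p-value
# is half the two-tailed value, reflecting the more specific prediction.
```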


Falsifiability

The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it only takes one counter-observation to falsify it. For example, the hypothesis that "all swans are white" can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never 100% prove the alternative hypothesis. Instead, we see if we can disprove, or reject, the null hypothesis.

If we reject the null hypothesis, this does not prove that our alternative hypothesis is correct, but it does support the alternative/experimental hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.
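One way to see why evidence supports but never proves a hypothesis is to treat confidence as a probability updated by Bayes' rule. The sketch below is our illustration (the likelihood values are invented for demonstration): repeated confirmations push the probability toward 1 without ever reaching it, while one strongly disconfirming result can pull it sharply down.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) by Bayes' rule, given P(E | H) and P(E | not-H)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

belief = 0.5  # start from a neutral stance toward the hypothesis
for study in range(1, 6):
    # Each supportive study is assumed three times likelier if H is true.
    belief = update(belief, p_e_given_h=0.6, p_e_given_not_h=0.2)
    print(f"after supportive study {study}: P(H | evidence) = {belief:.3f}")

# A result that is very unlikely if H is true drags confidence sharply down,
# though observational error keeps even "falsification" short of certainty.
belief = update(belief, p_e_given_h=0.001, p_e_given_not_h=0.5)
print(f"after disconfirming evidence: P(H | evidence) = {belief:.3f}")
```

Confidence climbs from 0.5 to roughly 0.996 over five supportive studies, then collapses to about 0.33 after the disconfirming one: support accumulates gradually, but it never amounts to proof.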

How to Write a Hypothesis

  • Identify variables. The researcher manipulates the independent variable, and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated. Operationalization means making the variables physically measurable or testable, e.g., if you are studying aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If findings in the literature are limited or ambiguous, write a non-directional (two-tailed) hypothesis.
  • Make it testable. Ensure your hypothesis can be tested through experimentation or observation, and that it could in principle be proven false (the principle of falsifiability).
  • Use clear and concise language. A strong hypothesis is concise (typically one to two sentences long) and formulated in clear, straightforward language, ensuring it is easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day of the week, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
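Below is a minimal sketch (our addition; the recall scores are invented) of how these two hypotheses could be evaluated once the data are in. Because the same students sit both sessions, a paired test is appropriate; it assumes SciPy 1.6+ for the `alternative` argument.

```python
from scipy import stats

# Immediate-recall scores for the same ten students in each condition
# (hypothetical numbers, for illustration only).
monday = [18, 15, 17, 20, 14, 16, 19, 15, 17, 18]
friday = [14, 13, 16, 17, 12, 15, 16, 13, 15, 14]

# Directional alternative: Monday recall is greater than Friday recall.
result = stats.ttest_rel(monday, friday, alternative="greater")

alpha = 0.05  # conventional significance level
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.4f}: reject the null hypothesis; the data "
          "support (but do not prove) the alternative hypothesis.")
else:
    print(f"p = {result.pvalue:.4f}: fail to reject the null hypothesis.")
```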

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.


Falsifiability

Textbook chapters (or similar texts).

  • Deductive Logic
  • Persuasive Reasoning and Fallacies
  • The Falsifiability Criterion of Science
  • Understanding Science

Journal articles

  • Why a Confirmation Strategy Dominates Psychological Science


Inquiry-based Activity: Popular media and falsifiability

Introduction: Falsifiability, or the ability for a statement/theory to be shown to be false, was noted by Karl Popper to be the clearest way to distinguish science from pseudoscience. While this criterion is central to scientific inquiry, it is also important for students to understand how it can be applied to the news and information they interact with in their day-to-day lives. In this activity, students will apply the logic of falsifiability to rumors and news they have heard of in the popular media, demonstrating the applicability of scientific thinking to the world beyond the classroom.

Question to pose to students: Think about the latest celebrity rumor you have heard about in the news or through social media. If you cannot think of one, some examples might include, “the CIA killed Marilyn Monroe” and “Tupac is alive.” Have students get into groups, discuss their rumors, and select one to work with.

Note to instructors: Please modify/update these examples if needed to work for the students in your course. Snopes is a good source for recent examples.

Students form a hypothesis: Thinking about that rumor, decide what evidence would be necessary to prove that it was correct. That is, imagine you were a skeptic and automatically did not believe the rumor – what would someone need to tell or show you to convince you that it was true?

Students test their hypotheses: Each group (A) should then pair up with one other group (B) and try to convince them their rumor is true, providing them with the evidence from above. Members of group B should then come up with any reasons they can think of why the rumor may still be false. For example – if “Tupac is alive” is the rumor and “show the death certificate” is a piece of evidence provided by group A, group B could posit that the death certificate was forged by whoever kidnapped Tupac. Once group B has evaluated all of group A’s evidence, have the groups switch such that group B is now trying to convince group A about their rumor.

Do the students’ hypotheses hold up?: Together, have the groups work out whether the rumors they discussed are falsifiable. That is, can it be “proven”? Remember, a claim is non-falsifiable if there can always be an explanation for the absence of evidence and/or an exhaustive search for evidence would be required. Depending on the length of your class, students can repeat the previous step with multiple groups.


Definition of Falsification Principle:

The Falsification Principle, also known as the doctrine of falsifiability, is a key concept in the philosophy of science developed by philosopher Karl Popper. It states that for a theory or hypothesis to be considered scientific, it must be capable of being proven false or refuted through empirical observations or experiments.

Key Points:

  • Empirical Approach: The Falsification Principle focuses on the empirical approach to scientific inquiry, emphasizing the importance of observation, experimentation, and evidence.
  • Falsifiability Criterion: According to this principle, a scientific theory should make specific claims or predictions that can be tested and potentially proven false. If a theory cannot be subjected to empirical scrutiny, it falls outside the realm of science.
  • Refutability: The ability to be refuted is a crucial aspect of scientific theories. A theory that is not falsifiable, meaning it cannot be proven false through observation or experimentation, is inherently unscientific as it cannot be tested against empirical evidence.
  • Demarcation Criteria: The Falsification Principle provides a demarcation criterion to differentiate between scientific and non-scientific statements or ideas. It helps set boundaries for what can be considered scientific knowledge and guides the scientific method.
  • Progress in Science: The Falsification Principle contributes to the progress of science by encouraging the formulation of testable hypotheses, allowing for refinement and improvement of theories through the rejection or modification of falsified ideas.


Falsifiability

Karl Popper's Basic Scientific Principle

Falsifiability, according to the philosopher Karl Popper, defines the inherent testability of any scientific hypothesis.

Science and philosophy have always worked together to try to uncover truths about the universe we live in. Indeed, ancient philosophy can be understood as the originator of many of the separate fields of study we have today, including psychology, medicine, law, astronomy, art and even theology.

Scientists design experiments and try to obtain results verifying or disproving a hypothesis, but philosophers are interested in understanding what factors determine the validity of scientific endeavors in the first place.

Whilst most scientists work within established paradigms, philosophers question the paradigms themselves and try to explore our underlying assumptions and definitions behind the logic of how we seek knowledge. Thus there is a feedback relationship between science and philosophy - and sometimes plenty of tension!

One of the tenets behind the scientific method is that any scientific hypothesis and resultant experimental design must be inherently falsifiable. Although falsifiability is not universally accepted, it is still the foundation of the majority of scientific experiments. Most scientists accept and work with this tenet, but it has its roots in philosophy and the deeper questions of truth and our access to it.


What is Falsifiability?

Falsifiability is the assertion that for any hypothesis to have credence, it must be inherently disprovable before it can become accepted as a scientific hypothesis or theory.

For example, someone might claim, “The earth is younger than many scientists state, and in fact was created to appear as though it was older through deceptive fossils etc.” This claim is unfalsifiable because it is a theory that can never be shown to be false. If you were to present such a person with fossils, geological data, or arguments about the nature of compounds in the ozone, they could refute the argument by saying that your evidence was fabricated to appear that way, and isn’t valid.

Importantly, falsifiability doesn’t mean that there are currently arguments against a theory, only that it is possible to imagine some kind of argument which would invalidate it. Falsifiability says nothing about an argument's inherent validity or correctness. It is only the minimum trait required for a claim to be engaged with in a scientific manner – a dividing line between what is considered science and what isn’t. Another important point is that falsifiability is not the same as merely not yet having been proven true; after all, a conjecture that hasn’t been proven yet is just a hypothesis.

The idea is that no theory is completely correct, but if it can be shown both to be falsifiable and supported with evidence that shows it's true, it can be accepted as truth.

For example, Newton's Theory of Gravity was accepted as truth for centuries, because objects do not randomly float away from the earth. It appeared to fit the data obtained by experimentation and research, but was always subject to testing.

However, Einstein's theory makes falsifiable predictions that are different from predictions made by Newton's theory, for example concerning the precession of the orbit of Mercury, and gravitational lensing of light. In non-extreme situations Einstein's and Newton's theories make the same predictions, so they are both correct. But Einstein's theory holds true in a superset of the conditions in which Newton's theory holds, so according to the principle of Occam's Razor, Einstein's theory is preferred. On the other hand, Newtonian calculations are simpler, so Newton's theory is useful for almost any engineering project, including some space projects. But for GPS we need Einstein's theory. Scientists would not have arrived at either of these theories, or a compromise between both of them, without the use of testable, falsifiable experiments.

Popper saw falsifiability as a black and white definition; that if a theory is falsifiable, it is scientific, and if not, then it is unscientific. Whilst some "pure" sciences do adhere to this strict criterion, many fall somewhere between the two extremes, with pseudo-sciences falling at the extreme end of being unfalsifiable.


Pseudoscience

According to Popper, many branches of applied science, especially social science, are not truly scientific because they have no potential for falsification.

Anthropology and sociology, for example, often use case studies to observe people in their natural environment without actually testing any specific hypotheses or theories.

While such studies and ideas are not falsifiable, most would agree that they are scientific because they significantly advance human knowledge.

Popper had and still has his fair share of critics, and the question of how to demarcate legitimate scientific enquiry can get very convoluted. Some statements are logically falsifiable but not practically falsifiable – consider the famous example of “it will rain at this location in a million years' time.” You could absolutely conceive of a way to test this claim, but carrying it out is a different story.

Thus, falsifiability is not a simple black and white matter. The Raven Paradox shows the inherent danger of relying on falsifiability, because very few scientific experiments can measure all of the data, and necessarily rely upon generalization. Technologies change along with our aims and comprehension of the phenomena we study, and so the falsifiability criterion for good science is subject to shifting.

For many sciences, the idea of falsifiability is a useful tool for generating theories that are testable and realistic. Testability is a crucial starting point around which to design solid experiments that have a chance of telling us something useful about the phenomena in question. If a falsifiable theory is tested and the results are significant, then it can become accepted as a scientific truth.

The advantage of Popper's idea is that such truths can be falsified when more knowledge and resources are available. Even long accepted theories such as Gravity, Relativity and Evolution are increasingly challenged and adapted.

The major disadvantage of falsifiability is that it is very strict in its definitions and does not take into account the contributions of sciences that are observational and descriptive.


Shuttleworth, M., & Wilson, L. T. (2008, September 21). Falsifiability. Explorable.com. https://explorable.com/falsifiability




Criterion of Falsifiability


Criterion of falsifiability, in the philosophy of science, a standard of evaluation of putatively scientific theories, according to which a theory is genuinely scientific only if it is possible in principle to establish that it is false. The British philosopher Sir Karl Popper (1902–94) proposed the criterion as a foundational method of the empirical sciences. He held that genuinely scientific theories are never finally confirmed, because disconfirming observations (observations that are inconsistent with the empirical predictions of the theory) are always possible no matter how many confirming observations have been made. Scientific theories are instead incrementally corroborated through the absence of disconfirming evidence in a number of well-designed experiments. According to Popper, some disciplines that have claimed scientific validity—e.g., astrology, metaphysics, Marxism, and psychoanalysis—are not empirical sciences, because their subject matter cannot be falsified in this manner.


Falsifiability in Psychology


Exploring the principle of falsifiability in psychology, this content delves into how hypotheses must be testable to be scientifically viable. It discusses the influence of research paradigms on psychological studies and the various perspectives on truth within the field. The text highlights the importance of crafting falsifiable hypotheses across different psychological domains and the role of deductive reasoning in testing these hypotheses. It also touches on the challenges of seeking absolute truth in psychological research, given the subjective nature of human experience.


Falsifiability as a Critical Principle

  • Definition of falsifiability: falsifiability is the principle that a hypothesis must be able to be disproven through empirical testing.
  • Distinguishing scientific theories from subjective beliefs: falsifiability ensures that psychological theories are based on empirical evidence rather than personal beliefs.
  • Facilitating progress in psychological science: falsifiability allows for paradigm shifts and the advancement of psychological knowledge.
  • Agreement on falsifiability: despite differing perspectives, the field of psychology recognizes the importance of falsifiability for its theories and hypotheses.

Research Paradigms in Psychology

  • Definition of research paradigms: fundamental perspectives that shape how psychological phenomena are studied and understood.
  • Influence of intellectual climate: research paradigms are subject to change as new scientific evidence emerges, influenced by the intellectual climate of the times.
  • Thomas Kuhn's theory of scientific revolutions: anomalies in the prevailing paradigm can lead to paradigm shifts in psychological science.

Schools of Thought in Psychology

  • Definition of schools of thought: schools of thought in psychology offer different interpretations of truth and understandings of human behavior.
  • Humanistic psychology: humanistic psychologists prioritize individual agency and free will in understanding behavior.
  • Biological psychology: biological psychologists focus on the deterministic roles of genetics and physiology in behavior.
  • Psychodynamic psychology: psychodynamic theorists explore the unconscious influences on behavior.
  • Cognitive psychology: cognitive psychologists examine the processes of thought and perception.

Falsifiability in Psychological Research

  • Examples of falsifiable hypotheses: hypotheses in psychology, such as those in the biological, cognitive, psychodynamic, and behaviorist perspectives, are structured to be testable.
  • Role of deductive reasoning: deductive reasoning, where specific hypotheses are derived from general theories and then tested, is essential to the principle of falsifiability.
  • Correcting errors and enhancing understanding: falsifiability allows for the correction of errors and the advancement of knowledge, as seen in the discovery of black swans in Australia.


Holtz, P., & Monnerjahn, P. (2017). Falsificationism is not just ‘potential’ falsifiability, but requires ‘actual’ falsification: Social psychology, critical rationalism, and progress in science. Journal for the Theory of Social Behaviour, 47(3).


The Discovery of the Falsifiability Principle


Friedel Weinert


Popper is most famous for his principle of falsifiability. It is striking that, throughout his career, he used three terms synonymously: falsifiability, refutability and testability. In order to appreciate the importance of these criteria it is helpful to understand how he arrived at these notions, whether they can be used interchangeably and whether scientists find this terminology helpful.

Notes

In a letter (30/11/32) to the publisher Paul Buske, Popper mentioned that J. Kraft had proposed two alternative titles: either ‘The Philosophical Preconditions of Natural Science’ or ‘The Problem of Natural Laws’ [Hansen 3.2; my translation]. Buske was one of the publishers on whom Popper pinned his hopes. Hacohen (2000): Chap. 6 provides a detailed account of the tortuous path of Popper’s manuscript to its publication as Logik der Forschung . See also Autobiography (1974): 67.

Gomperz realized that Popper’s book criticized the Vienna Circle, as he wrote to Popper (27/12/32). In a reference letter (21/12/32) to the publisher Paul Siebeck (of J. C. B. Mohr), Gomperz praised Popper’s book for propounding, in clear language, a ‘methodology of scientific knowledge’, which remained close to the ‘procedure of the mathematical natural sciences’ and differed essentially from that of the Vienna Circle [Hansen 3.2; my translation].

Walter Schiff, Popper’s maternal uncle, taught economics and statistics at the University of Vienna.

Schlick was murdered by a former student on 22 June, 1936, as he was leaving the university. In an undated handwritten note ‘In Honour of Moritz Schlick’ Popper conveyed the general impression at the time that he had been murdered by a Nazi [252.01], which is probably true.

In 1977, Stachel became the first editor of the Einstein Papers Project, then based at Boston University.

See, for instance, his Outline of Psychoanalysis (1938) and my discussion in Copernicus , Darwin and Freud (2009: Chap. 3).

The others were the perihelion advance of Mercury and the redshift of light in gravitational fields. In 1964, Irwin I. Shapiro proposed a fourth classic test: the time delay of electromagnetic radiation (such as radar signals) passing the Sun. Gravitational fields also have an effect on the ticking of clocks: a clock in a weak gravitational field runs faster than a clock in a strong gravitational field. In recent years, satellite-based tests have ‘confirmed’ (or in Popper’s terminology, ‘corroborated’) the results of the classic tests.

This logical rule (modus tollens) states that if in a conditional sentence: ‘If p, then q’, the consequent q does not hold, then the antecedent p must be negated. So we infer from non-q to non-p. If p stands for a theory and q stands for, say, a prediction, then the falsity of the prediction implies the falsity of the theory.
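For concreteness, this small check (our addition, not part of the chapter) verifies by exhaustive enumeration that modus tollens holds in every case: whenever ‘if p then q’ and ‘not q’ are both true, ‘not p’ is true as well.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    # Material implication: "a implies b" is false only when a is true and b is false.
    return (not a) or b

for p, q in product([True, False], repeat=2):
    premises_hold = implies(p, q) and (not q)
    if premises_hold:
        # Whenever both premises hold, the conclusion not-p must also hold.
        assert not p, "modus tollens would fail here"

print("modus tollens holds in every truth assignment")
```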

See Logic 1980: §§3, 22; Realism/Aim 1985: xxii; Alles Leben 1996: 26; All Life 1999: 10; cf. Corvi 1997: Pt. II. In the Introduction to Grundprobleme (1979: XXXVI, 2009: XXXV; cf. C&R 1963: 228) Popper rejected the term ‘falsificationism’ because it conflated ‘falsification’ and ‘falsifiability’. He preferred the term ‘fallibilism’.

Popper dealt with such a situation in an article in Nature (1940). He discusses three interpretations of nebular red shifts: ‘The three theories are logically equivalent, and therefore do not describe alternative facts, but the same facts in alternative languages.’ (‘Interpretation’ 1940: 69–70; italics in original) (He would write further articles in Nature on the arrow of time in the 1950s and 1960s.)

See K. Popper, ‘On theories as nets’, New Scientist (1982, 319–320). Popper repeatedly used this image of theories as nets, starting in Grundprobleme (1979: 487, 2009: 492). ‘We try to examine the world exhaustively by our nets; but its mesh will always let some small fish escape: there will always be enough play for indeterminism.’ (Popper, Open Universe 1982: 47)

Popper’s concern with probability in Logik later led to his well-known propensity interpretation of probability.

This is not just an issue of terminology. The German sociologist Ulrich Beck uses Popper’s criterion of ‘practical fallibilism’ as an element in his theory of the ‘risk society’, because it undermines the traditional image of science, which Popper himself rejected. (Beck 1992: Pt. III, Chap. 7)

On the question of proliferation of hypotheses, David Miller told me that ‘he (Popper) had learnt from his geologist colleague Bob Allan in NZ about Chamberlin's paper ‘The Method of Multiple Working Hypotheses’, which was published in the Journal of Geology (5, 1897: 837–48, and reprinted in Science in 1965: http://science.sciencemag.org/content/148/3671/754). Jeremy Shearmur procured him a copy [349.13].

I understand the difference between alternative and rival theories as that between alternative versions of the same theory, which agree on first principles, and conflicting theories, which disagree on first principles.

Popper frequently stressed the importance of a dogmatic phase, not only in his publications— Autobiography 1974: §§10, 16; ‘Replies’ 1974: 984; Myth 1994: 16; Alles Leben 1996: 121; All Life 1999: 41; Realism/Aim 1983/1985: Introduction 1982: xxii—but also in his correspondence. In a letter to the American physicist and philosopher Abner Shimony (01/02/70), whom he met at Brandeis, he emphasized that, against the slogan of verification, he had to stress the ‘virtues of testing’. He added that “dogmatic thinking” and the defence of a theory against criticism are needed, if we wish to come to a sound appreciation of the value of a theory: if we give in too easily, we shall never find out what is the strength of the theory, and what deserves preservation’. Not happy with Popper’s version of fallibilism, Shimony hoped to persuade him of the power of scientific inference [350.07].

Some of the leading proponents of string theory also embrace the Anthropic Principle. (Susskind 2006: 197) It does not just claim that the world is the way it is because we are here. No, the Anthropic Principle serves to explain the fine-tuning of the constants of nature, without which (intelligent) life would be impossible.

Joseph J. Thomson proposed the ‘plum-pudding’ model in 1904, after his discovery of the electron (1897). The negatively charged electrons were embedded in a positively charged volume, but there was no nucleus. It was replaced by Rutherford’s nucleus model. For more on these models see my book The Scientist as Philosopher (2004) and my articles ‘The Structure of Atom Models’ (2000) and ‘The Role of Probability Arguments in the History of Science’ (2010).

Bondi is famous for his contribution to cosmology. He rejected the Big Bang theory and proposed, in cooperation with Fred Hoyle and Thomas Gold, the alternative steady-state model. Fred Hoyle’s biographer Simon Mitton, of Cambridge University, told me in a private email (06/03/2020) that Hoyle never mentioned Popper. Popper dismissed the Big Bang theory as ‘unimportant’ ( Offene Gesellschaft 1986: 48–50), even as ‘metaphysical’. ( Zukunft 4 1990: 69–70)

For instance the great American physicist Richard Feynman who held that science is not certain, that it starts with ‘guesses’ whose consequences must be compared to experience.

In our conversation at the LSE John Worrall sounded a note of caution with reference to Peter Medawar and Paul Nurse: ‘well, quite honestly, I don’t know whether you really need to read Popper to know pretty soon when you are doing your scientific work that you are not inductively generalizing data, that you do make hypotheses, that you do need to check that these hypotheses are true or not’. But he agreed that ‘far and away more than any other philosopher he does seem to have been generally influential. And generally regarded as a significant figure, more outside the field than within the field, I think’.

Equate Newton’s second law of motion and his law of gravitation, mg = GmM_E/r², and solve for M_E. Here g is the acceleration near the surface of the earth, r is the radius between the centres of the two bodies and G is the gravitational constant.
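As a quick numerical check (our addition, not part of the chapter), rearranging gives M_E = g·r²/G, which reproduces the accepted mass of the earth from standard values of the constants:

```python
# Standard values (SI units).
g = 9.81          # m/s^2, acceleration near the earth's surface
r = 6.371e6       # m, mean radius of the earth
G = 6.674e-11     # m^3 kg^-1 s^-2, gravitational constant

M_E = g * r**2 / G
print(f"M_E ~ {M_E:.3e} kg")  # ~ 5.97e24 kg, the accepted value
```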

Winzer (2019); cf. Kneale’s example of Anderson’s discovery of the positron. Kneale (1974: 206–208). Settle (1974: 701–702) discusses some further examples of ‘non-Popperian’ progress in science.

Note that national or racial prejudices are based on inductive steps: from our experience with some people of a nation or a race to all people of that nation or race.

Note that Newton’s theory does not require that all planets rotate from west to east. In our solar system both Venus and Pluto spin from east to west. So, the east-bound spin of most planets in the solar system could not be a universal, all-inclusive law.

According to Hacohen (2000: 133–134, 144), he accepted the method of induction in his psychological work until 1929. As he wrote to John Stachel it was not until then that he realized the close link between induction and demarcation.

John Norton, of the University of Pittsburgh, has recently proposed a richly illustrated material theory of induction, according to which inductive inferences (both enumerative and eliminative) are legitimate as long as they occur on a ‘case-by-case’ basis. Norton (2021: v–viii; 4–8) claims that ‘all induction is local’ and that ‘no universal rules of induction’ exist. Particular inferences are warranted by ‘background facts in some domain’ which ‘tell us what are good and bad inductive inferences in that domain’.

Several articles in O’Hear ed. (1995), for instance by Newton-Smith and Lipton, elaborate on these inductive elements. There are, therefore, in Popper’s account inductive assumptions. One of the authors who pointed out that ‘falsificationism’ requires inductive assumptions, was my former colleague Anthony O’Hear (1980). Popper complained to him that he did not like his book, (although he admits that his own account contains a ‘whiff of verificationism’). Anthony told me in an email (28/06/20): ‘He (Popper) added that I was “product of the modern education”—by which he meant that I was a follower of Moore and Wittgenstein. But perhaps things were not quite as abrasive as it might have appeared at the time (1980). I found out a lot later that he had told a friend of mine that he (the friend) ought to read my book. He (Popper) did not like it, but it was a serious book, or words to that effect’. Miller (1994: Chap. 2) lists a number of such inductive elements and attempts to eliminate them from Popper’s account.

In his work on political philosophy he condemned the dogmatism, which he detected at work in Plato, Hegel and Marx.

Popper was prone to exaggerations: induction does not exist, a large part of the knowledge of organisms is inborn, all tests boil down to attempted falsifications or everything is a propensity.

In his later work he regarded the notion of verisimilitude (or truthlikeness ) as a more realistic aim of science. ( Objective Knowledge 1972: 57–58) In a panel discussion in the 1980s, he rejected the view, attributed to him, that ‘theories are never true’. ‘This is nonsense. Scientific theories are the ones, which have survived the elimination process’ ( Zukunft 4 1990: 101; my translation).

The theories themselves may be generated from conjectures, intuition or inductive generalization.

Now Appendix *ix of his Logic of Scientific Discovery. Popper ( Myth 1994: 86–87) acknowledges that Bacon was aware of the defect of simple induction by enumeration.


About this chapter

Weinert, F. (2022). The Discovery of the Falsifiability Principle. In: Karl Popper. Springer Biographies. Springer, Cham. https://doi.org/10.1007/978-3-031-15424-9_3


Falsifiability

Falsifiability is an important feature of science. It is the principle that a proposition or theory could only be considered scientific if in principle it was possible to establish it as false. One of the criticisms of some branches of psychology, e.g. Freud’s theory, is that they lack falsifiability.



Falsification

Falsification Definition

One cannot prove whether a theory or hypothesis is true. One can only prove that it is false, a process called falsification. Falsification is a tool that distinguishes scientific social psychology from folk social psychology, which does not use the process of falsification.

Falsification History and Theory


A scientific theory consists of several statements that are linked together in a logical manner. If the statements are proven false, then it becomes unreasonable to support the theory any longer. Therefore, the old (falsified) theory is replaced by a newer (unfalsified) theory. Some researchers try to avoid the falsification of their theory by adding further statements, which account for the anomaly.

For Popper, the falsifiability of a theory is a criterion to distinguish science from nonscience. Consequently, researchers can never finally prove that their scientific theories are true; they can only corroborate or disprove them. Each time a theory survives an attempt to falsify it, it becomes a more believable theory. To advance science, one has to replace falsified theories with new theories. These new theories should account for the phenomena that falsified their predecessors.

Falsification Criticisms and Modern Application in Social Sciences

Several philosophers and various researchers have criticized the falsification principle. In social sciences, where tests are very sensitive, many observations may be argued to be fallible and wrong. Hence, it is easy to make an argument against the falsification of a theory, by referring to observational errors.

In contrast to Popper, some philosophers see the development of additional statements that defend the old theory as a natural process. Other scholars later reformulated the falsification principle. Some argued that the shift from one theory to another is not driven by the falsification of a theory's individual statements, but requires a wholesale change of paradigm among the scientists who share ideas about the same theory.

Falsification has been widely used in social psychology. Current social science is multiparadigmatic. Generating several hypotheses about the same phenomenon is seen as a way of helping researchers overcome their subjective resistance to rejecting their own theory.

References:

  • Ellsworth, P. C. (2004). Clapping with both hands: Numbers, people, and simultaneous hypotheses. In J. T. Jost, M. R. Banaji, & D. A. Prentice (Eds.), Perspectivism in social psychology: The yin and yang of scientific progress (pp. 261–274). Washington, DC: American Psychological Association.
  • Lakatos, I., & Musgrave, A. (Eds.). (1970). Criticism and the growth of knowledge. Cambridge, UK: Cambridge University Press.
  • Popper, K. R. (1959). The logic of scientific discovery. New York: Science Editions.


Replication, falsification, and the crisis of confidence in social psychology

Brian D. Earp

1 Uehiro Centre for Practical Ethics, University of Oxford, Oxford, UK

2 Department of History and Philosophy of Science, University of Cambridge, Cambridge, UK

David Trafimow

3 Department of Psychology, New Mexico State University, Las Cruces, NM, USA

The (latest) crisis in confidence in social psychology has generated much heated discussion about the importance of replication, including how it should be carried out as well as interpreted by scholars in the field. For example, what does it mean if a replication attempt “fails”—does it mean that the original results, or the theory that predicted them, have been falsified? And how should “failed” replications affect our belief in the validity of the original research? In this paper, we consider the replication debate from a historical and philosophical perspective, and provide a conceptual analysis of both replication and falsification as they pertain to this important discussion. Along the way, we highlight the importance of auxiliary assumptions (for both testing theories and attempting replications), and introduce a Bayesian framework for assessing “failed” replications in terms of how they should affect our confidence in original findings.

“Only when certain events recur in accordance with rules or regularities, as in the case of repeatable experiments, can our observations be tested—in principle—by anyone.… Only by such repetition can we convince ourselves that we are not dealing with a mere isolated ‘coincidence,’ but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.” – Karl Popper (1959, p. 45)

Introduction

Scientists pay lip-service to the importance of replication. It is the “coin of the scientific realm” (Loscalzo, 2012 , p. 1211); “one of the central issues in any empirical science” (Schmidt, 2009 , p. 90); or even the “demarcation criterion between science and nonscience” (Braude, 1979 , p. 2). Similar declarations have been made about falsifiability , the “demarcation criterion” proposed by Popper in his seminal work of 1959 (see epigraph). As we will discuss below, the concepts are closely related—and also frequently misunderstood. Nevertheless, their regular invocation suggests a widespread if vague allegiance to Popperian ideals among contemporary scientists, working from a range of different disciplines (Jordan, 2004 ; Jost, 2013 ). The cosmologist Hermann Bondi once put it this way: “There is no more to science than its method, and there is no more to its method than what Popper has said” (quoted in Magee, 1973 , p. 2).

Experimental social psychologists have fallen in line. Perhaps in part to bolster our sense of identity with the natural sciences (Danziger, 1997 ), we psychologists have been especially keen to talk about replication. We want to trade in the “coin” of the realm. As Billig ( 2013 ) notes, psychologists “cling fast to the belief that the route to knowledge is through the accumulation of [replicable] experimental findings” (p. 179). The connection to Popper is often made explicit. One recent example comes from Kepes and McDaniel ( 2013 ), from the field of industrial-organizational psychology: “The lack of exact replication studies [in our field] prevents the opportunity to disconfirm research results and thus to falsify [contested] theories” (p. 257). They cite The Logic of Scientific Discovery .

There are problems here. First, there is the “ lack ” of replication noted in the quote from Kepes and McDaniel. If replication is so important, why isn't it being done? This question has become a source of crisis-level anxiety among psychologists in recent years, as we explore in a later section. The anxiety is due to a disconnect: between what is seen as being necessary for scientific credibility—i.e., careful replication of findings based on precisely-stated theories—and what appears to be characteristic of the field in practice (Nosek et al., 2012 ). Part of the problem is the lack of prestige associated with carrying out replications (Smith, 1970 ). To put it simply, few would want to be seen by their peers as merely “copying” another's work (e.g., Mulkay and Gilbert, 1986 ); and few could afford to be seen in this way by tenure committees or by the funding bodies that sponsor their research. Thus, while “a field that replicates its work is [seen as] rigorous and scientifically sound”—according to Makel et al. ( 2012 )—psychologists who actually conduct those replications “are looked down on as bricklayers and not [as] advancing [scientific] knowledge” (p. 537). In consequence, actual replication attempts are rare.

A second problem is with the reliance on Popper—or, at any rate, a first-pass reading of Popper that seems to be uninformed by subsequent debates in the philosophy of science. Indeed, as critics of Popper have noted, since the 1960s and consistently thereafter, neither his notion of falsification nor his account of experimental replicability seem strictly amenable to being put into practice (e.g., Mulkay and Gilbert, 1981 ; see also Earp, 2011 )—at least not without considerable ambiguity and confusion. What is more, they may not even be fully coherent as stand-alone “abstract” theories, as has been repeatedly noted as well (cf. Cross, 1982 ).

The arguments here are familiar. Let us suppose that—at the risk of being accused of laying down bricks—Researcher B sets up an experiment to try to “replicate” a controversial finding that has been reported by Researcher A. She follows the original methods section as closely as she can (assuming that this has been published in detail; or even better, she simply asks Researcher A for precise instructions). She calibrates her equipment. She prepares the samples and materials just so. And she collects and then analyzes the data. If she gets a different result from what was reported by Researcher A—what follows? Has she “falsified” the other lab's theory? Has she even shown the original result to be erroneous in some way?

The answer to both of these questions, as we will demonstrate in some detail below, is “no.” Perhaps Researcher B made a mistake (see Trafimow, 2014 ). Perhaps the other lab did. Perhaps one of B's research assistants wrote down the wrong number. Perhaps the original effect is a genuine effect, but can only be obtained under specific conditions—and we just don't know yet what they are (Cesario, 2014 ). Perhaps it relies on “tacit” (Polanyi, 1962 ) or “unofficial” (Westen, 1988 ) experimental knowledge that can only be acquired over the course of several years, and perhaps Researcher B has not yet acquired this knowledge (Collins, 1975 ).

Or perhaps the original effect is not a genuine effect, but Researcher A's theory can actually accommodate this fact. Perhaps Researcher A can abandon some auxiliary hypothesis, or take on board another, or re-formulate a previously unacknowledged background assumption—or whatever (cf. Lakatos, 1970 ; Cross, 1982 ; Folger, 1989 ). As Lakatos ( 1970 ) once put it: “given sufficient imagination, any theory… can be permanently saved from ‘refutation’ by some suitable adjustment in the background knowledge in which it is embedded” (p. 184). We will discuss some of these potential “adjustments” below. The upshot, however, is that we simply do not know, and cannot know, exactly what the implications of a given replication attempt are, no matter which way the data come out. There are no critical tests of theories; and there are no objectively decisive replications.

Popper ( 1959 ) was not blind to this problem. “In point of fact,” he wrote, in an under-appreciated passage of his famous book, “no conclusive disproof of a theory can ever be produced, for it is always possible to say that the experimental results are not reliable, or that the discrepancies which are asserted to exist between the experimental results and the theory are only apparent” (p. 50, emphasis added). Hence as Mulkay and Gilbert ( 1981 ) explain:

… in relation to [actual] scientific practice, one can only talk of positive and negative results, and not of proof or disproof. Negative results, that is, results which seem inconsistent with a given hypothesis [or with a putative finding from a previous experiment], may incline a scientist to abandon [the] hypothesis but they will never require him to abandon it… Whether or not he does so may depend on the amount and quality of positive evidence, on his confidence in his own and others' experimental skills and on his ability to conceive of alternative interpretations of the negative findings. (p. 391)

Drawing hard and fast conclusions, therefore, about “negative” results—such as those that may be produced by a “failed” replication attempt—is much more difficult than Kepes and McDaniel seem to imagine (see e.g., Chow, 1988 for similarly problematic arguments). This difficulty may be especially acute in the field of psychology. As Folger ( 1989 ) notes, “Popper himself believed that too many theories, particularly in the social sciences , were constructed so loosely that they could be stretched to fit any conceivable set of experimental results, making them… devoid of testable content” (p. 156, emphasis added). Furthermore, as Collins ( 1985 ) has argued, the less secure a field's foundational theories—and especially at the field's “frontier”—the more room there is for disagreement about what should “count” as a proper replication 1 .

Related to this problem is that it can be difficult to know in what specific sense a replication study should be considered to be “the same” as the original (e.g., Van IJzendoorn, 1994 ). Consider that the goal for these kinds of studies is to rule out flukes and other types of error. Thus, we want to be able to say that the same experiment , if repeated one more time, would produce the same result as was originally observed. But an original study and a replication study cannot, by definition, be identical—at the very least, some time will have passed and the participants will all be new 2 —and if we don't yet know which differences are theory-relevant, we won't be able to control for their effects. The problem with a field like psychology, whose theoretical predictions are often “constructed so loosely,” as noted above, is precisely that we do not know—or at least, we do not in a large number of cases—which differences are in fact relevant to the theory.

Finally, human behavior is notoriously complex. We are not like billiard balls, or beavers, or planets, or paramecia (i.e., relatively simple objects or organisms). This means that we should expect our behavioral responses to vary across a “wide range of moderating individual difference and experimental context variables” (Cesario, 2014 , p. 41)—many of which are not yet known, and some of which may be difficult or even impossible to uncover (Meehl, 1990a ). Thus, in the absence of “well-developed theories for specifying such [moderating] variables, the conclusions of replication failures will be ambiguous” (Cesario, 2014 , p. 41; see also Meehl, 1978 ).

Summing up the problem

Hence we have two major points to consider. First, due to a lack of adequate incentives in the reward structure of professional science (e.g., Nosek and the Open Science Collaboration, 2012 ), actual replication attempts are rarely carried out. Second, to the extent that they are carried out, it can be well-nigh impossible to say conclusively what they mean, whether they are “successful” (i.e., showing similar, or apparently similar, results to the original experiment) or “unsuccessful” (i.e., showing different, or apparently different, results to the original experiment). Thus, Collins ( 1985 ) came to the conclusion that, in physics at least, disputes over contested findings are likelier to be resolved by social and reputational negotiations —over, e.g., who should be considered a competent experimenter—than by any “objective” consideration of the experiments themselves. Meehl ( 1990b ) drew a similar conclusion about the field of social psychology, although he identified sheer boredom (rather than social/reputational negotiation) as the alternative to decisive experimentation:

… theories in the “soft areas” of psychology have a tendency to go through periods of initial enthusiasm leading to large amounts of empirical investigation with ambiguous over-all results. This period of infatuation is followed by various kinds of amendment and the proliferation of ad hoc hypotheses. Finally, in the long run, experimenters lose interest rather than deliberately discard a theory as clearly falsified. (p. 196)

So how shall we take stock of what has been said? A cynical reader might conclude that—far from being a “demarcation criterion between science and nonscience”—replication is actually closer to being a waste of time. Indeed, if even replications in physics are sometimes not conclusive, as Collins ( 1975 , 1981 , 1985 ) has convincingly shown, then what hope is there for replications in psychology?

Our answer is simply as follows. Replications do not need to be “conclusive” in order to be informative . In this paper, we highlight some of the ways in which replication attempts can be more, rather than less, informative, and we discuss—using a Bayesian framework—how they can reasonably affect a researcher's confidence in the validity of an original finding. The same is true of “falsification.” Whilst a scientist should not simply abandon her favorite theory on account of a single (apparently) contradictory result—as Popper himself was careful to point out 3 (1959, pp. 66–67; see also Earp, 2011 )—she might reasonably be open to doubt it, given enough disconfirmatory evidence, and assuming that she had stated the theory precisely. Rather than being a “waste of time,” therefore, experimental replication of one's own and others' findings can be a useful tool for restoring confidence in the reliability of basic effects—provided that certain conditions are met. The work of the latter part of this essay is to describe and to justify at least a few of those essential conditions. In this context, we draw a distinction between “conceptual” or “reproductive” replications (cf. Cartwright, 1991 )—which may conceivably be used to bolster confidence in a particular theory —and “direct” or “close” replications, which may be used to bolster confidence in a finding (Schmidt, 2009 ; see also Earp et al., 2014 ). Since it is doubt about the findings that seems to have prompted the recent “crisis” in social psychology, it is the latter that will be our focus. But first we must introduce the crisis.

The (latest) crisis in social psychology and calls for replication

“Is there currently a crisis of confidence in psychological science reflecting an unprecedented level of doubt among practitioners about the reliability of research findings in the field? It would certainly appear that there is.” So write Pashler and Wagenmakers ( 2012 , p. 529) in a recent issue of Perspectives on Psychological Science . The “crisis” is not unique to psychology; it is rippling through biomedicine and other fields as well (Ioannidis, 2005 ; Loscalzo, 2012 ; Earp and Darby, 2015 )—but psychology will be the focus of this paper, if for no other reason than that the present authors have been closer to the facts on the ground.

Some of the causes of the crisis are fairly well known. In 2011, an eminent Dutch researcher confessed to making up data and experiments, producing a résumé-full of “findings” that he had simply invented out of whole cloth (Carey, 2011 ). He was outed by his own students, however, and not by peer review nor by any attempt to replicate his work. In other words, he might just as well have not been found out, had he only been a little more careful (Stroebe et al., 2012 ). An unsettling prospect was thus aroused: Could other fraudulent “findings” be circulating—undetected, and perhaps even undetectable—throughout the published record? After an exhaustive analysis of the Dutch fraud case, Stroebe et al. ( 2012 ) concluded that the notion of self-correction in science was actually a “myth” (p. 670); and others have offered similar pronouncements (Ioannidis, 2012a ).

But fraud, it is hoped, is rare. Nevertheless, as Ioannidis ( 2005 , 2012a ) and others have argued, the line between explicitly fraudulent behavior and merely “questionable” research practices is perilously thin, and the latter are probably common. John et al. ( 2012 ) conducted a massive, anonymous survey of practicing psychologists and showed that this conjecture is likely correct. Psychologists admitted to such questionable research practices as failing to report all of the dependent measures for which they had collected data (78%) 4 , collecting additional data after checking to see whether preliminary results were statistically significant (72%), selectively reporting studies that “worked” (67%), claiming to have predicted an unexpected finding (54%), and failing to report all of the conditions that they ran (42%). Each of these practices alone, and even more so when combined, reduces the interpretability of the final reported statistics, casting doubt upon any claimed “effects” (e.g., Simmons et al., 2011 ).

The motivation behind these practices, though not necessarily conscious or deliberate, is also not obscure. Professional journals have long had a tendency to publish only or primarily novel, “statistically significant” effects, to the exclusion of replications—and especially “failed” replications—or other null results. This problem, known as “publication bias,” leads to a file-drawer effect whereby “negative” experimental outcomes are simply “filed away” in a researcher's bottom drawer, rather than written up and submitted for publication (e.g., Rosenthal, 1979 ). Meanwhile, the “questionable research practices” carry on in full force, since they increase the researcher's chances of obtaining a “statistically significant” finding—whether it turns out to be reliable or not.

To add insult to injury, in 2012, an acrimonious public skirmish broke out in the form of dueling blog posts between the distinguished author of a classic behavioral priming study 5 and a team of researchers who had questioned his findings (Yong, 2012 ). The disputed results had already been cited more than 2000 times—an extremely large number for the field—and even been enshrined in introductory textbooks. What if they did turn out to be a fluke? Should other “priming studies” be double-checked as well? Coverage of the debate ensued in the mainstream media (e.g., Bartlett, 2013 ).

Another triggering event resulted in “widespread public mockery” (Pashler and Wagenmakers, 2012 , p. 528). In contrast to the fraud case described above, which involved intentional, unblushing deception, the psychologist Daryl Bem relied on well-established and widely-followed research and reporting practices to generate an apparently fantastic result, namely evidence that participants' current responses could be influenced by future events (Bem, 2011 ). Since such paranormal precognition is inconsistent with widely-held theories about “the fundamental nature of time and causality” (LeBel and Peters, 2011, p. 371), few took the findings seriously. Instead, they began to wonder about the “well-established and widely-followed research and reporting practices” that had sanctioned the findings in the first place (and allowed for their publication in a leading journal). As Simmons et al. ( 2011 ) concluded—reflecting broadly on the state of the discipline—“it is unacceptably easy to publish ‘statistically significant’ evidence consistent with any hypothesis” (p. 1359) 6 .

The main culprit for this phenomenon is what Simmons et al. ( 2011 ) identified as researcher degrees of freedom :

In the course of collecting and analyzing data, researchers have many decisions to make: Should more data be collected? Should some observations be excluded? Which conditions should be combined and which ones compared? Which control variables should be considered? Should specific measures be combined or transformed or both?… It is rare, and sometimes impractical, for researchers to make all these decisions beforehand. Rather, it is common (and accepted practice) for researchers to explore various analytic alternatives, to search for a combination that yields “statistical significance” and to then report only what “worked.” (p. 1359)

One unfortunate consequence of such a strategy—involving, as it does, some of the very same questionable research practices later identified by John et al. ( 2012 ) in their survey of psychologists—is that it inflates the possibility of producing a false positive (or a Type 1 error). Since such practices are “common” and even “accepted,” the literature may be replete with erroneous results. Thus, as Ioannidis ( 2005 ) declared after performing a similar analysis in his own field of biomedicine, “ most published research findings” may be “false” (p. 0696, emphasis added). This has led to the “unprecedented level of doubt” referred to by Pashler and Wagenmakers ( 2012 ) in the opening quote to this section.
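To make the mechanism concrete, here is a minimal simulation (our illustration, not from Simmons et al.; the peek points, trial count, and seed are arbitrary choices) of just one such degree of freedom, namely collecting more data after checking whether the result is significant. Both groups are drawn from the same population, so every "significant" result is, by construction, a false positive:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_study(peeks=(20, 30, 40, 50), alpha=0.05):
    """Run a two-group study with NO true effect, testing at each peek."""
    n_max = max(peeks)
    a = rng.normal(loc=0.0, scale=1.0, size=n_max)
    b = rng.normal(loc=0.0, scale=1.0, size=n_max)  # same population as a
    for n in peeks:
        if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
            return True  # stop early and report "significance"
    return False

n_sim = 10_000
rate = sum(one_study() for _ in range(n_sim)) / n_sim
print(f"Nominal alpha: 0.05; observed false-positive rate: {rate:.3f}")
# The observed rate comes out well above 0.05: optional stopping alone
# inflates the Type 1 error rate, before any other "flexible" practice is added.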

This is not the first crisis for psychology. Giner-Sorolla ( 2012 ) points out that “crises” of one sort or another “have been declared regularly at least since the time of Wilhelm Wundt”—with turmoil as recent as the 1970s inspiring particular déjà vu (p. 563). Then, as now, a string of embarrassing events—including the publication in mainstream journals of literally unbelievable findings 7 —led to “soul searching” amongst leading practitioners. Standard experimental methods, statistical strategies, reporting requirements, and norms of peer review were all put under the microscope; numerous sources of bias were carefully rooted out (e.g., Greenwald, 1975 ). While various calls for reform were put forward—some more energetically than others—a single corrective strategy seemed to emerge from all the din: the need for psychologists to replicate their work . Since “ all flawed research practices yield findings that cannot be reproduced,” critics reasoned, replication could be used to separate the wheat from the chaff (Koole and Lakens, 2012 , p. 608, emphasis added; see also Elms, 1975 ).

The same calls reverberate today. “For psychology to truly adhere to the principles of science,” write Ferguson and Heene ( 2012 ), “the need for replication of research results [is] important… to consider” (p. 556). LeBel and Peters ( 2011 ) put it like this: “Across all scientific disciplines, close replication is the gold standard for corroborating the discovery of an empirical phenomenon” and “the importance of this point for psychology has been noted many times” (p. 375). Indeed, “leading researchers [in psychology]” agree, according to Francis ( 2012 ), that “experimental replication is the final arbiter in determining whether effects are true or false” (p. 585).

We have already seen that such calls must be heeded with caution: replication is not straightforward, and the outcome of replication studies may be difficult to interpret. Indeed they can never be conclusive on their own. But we suggested that replications could be more or less informative ; and in the following sections we discuss some strategies for making them “more” rather than “less.” We begin with a discussion of “direct” vs. “conceptual” replication.

Increasing replication informativeness: “direct” vs. “conceptual” replication

In a systematic review of the literature, encompassing multiple academic disciplines, Gómez et al. ( 2010 ) identified 18 different types of replication. Three of these were from Lykken ( 1968 ), who drew a distinction between “literal,” “operational,” and “constructive”—which Schmidt ( 2009 ) then winnowed down (and re-labeled) to arrive at “direct” and “conceptual” in an influential paper. As Makel et al. ( 2012 ) have pointed out, it is Schmidt's particular framework that seems to have crystallized in the field of psychology, shaping most of the subsequent discussion on this issue. We have no particular reason to rock the boat; indeed these categories will suit our argument just fine.

The first step in making a replication informative is to decide what specifically it is for. “Direct” replications and “conceptual” replications are “for” different things; and assigning them their proper role and function will be necessary for resolving the crisis. First, some definitions:

A “direct” replication may be defined as an experiment that is intended to be as similar to the original as possible (Schmidt, 2009 ; Makel et al., 2012 ). This means that along every conceivable dimension—from the equipment and materials used, to the procedure, to the time of day, to the gender of the experimenter, etc.—the replicating scientist should strive to avoid making any kind of change or alteration. The purpose here is to “check” the original results. Some changes will be inevitable, of course; but the point is that only the inevitable changes (such as the passage of time between experiments) are ideally tolerated in this form of replication. In a “conceptual” replication, by contrast, at least certain elements of the original experiment are intentionally altered, (ideally) systematically so, toward the end of achieving a very different sort of purpose—namely to see whether a given phenomenon, assuming that it is reliable, might obtain across a range of variable conditions. But as Doyen et al. ( 2014 ) note in a recent paper:

The problem with conceptual replication in the absence of direct replication is that there is no such thing as a “conceptual failure to replicate.” A failure to find the same “effect” using a different operationalization can be attributed to the differences in method rather than to the fragility of the original effect. Only the successful conceptual replications will be published, and the unsuccessful ones can be dismissed without challenging the underlying foundations of the claim. Consequently, conceptual replication without direct replication is unlikely to change beliefs about the underlying effect (p. 28) .

In simplest terms, therefore, a “direct” replication seeks to validate a particular fact or finding ; whereas a “conceptual” replication seeks to validate the underlying theory or phenomenon —i.e., the theory that has been proposed to “predict” the effect that was obtained by the initial experiment—as well as to establish the boundary conditions within which the theory holds true (Nosek et al., 2012 ). The latter is impossible without the former. In other words, if we cannot be sure that our finding is reliable to begin with (because it turns out to have been a coincidence, or else a false alarm due to questionable research practices, publication bias, or fraud), then we are in no position to begin testing the theory by which it is supposedly explained (Cartwright, 1991 ; see also Earp et al., 2014 ).

Of course both types of replication are important, and there is no absolute line between them. Rather, as Asendorpf et al. ( 2013 ) point out, “direct replicability [is] one extreme pole of a continuous dimension extending to broad generalizability [via ‘conceptual’ replication] at the other pole, ranging across multiple, theoretically relevant facets of study design” (p. 139). Collins made a similar point in 1985 (e.g., p. 37). But so long as we remain largely ignorant about exactly which “facets of study design” are “theoretically relevant” to begin with—as is the case with much of current social psychology (Meehl, 1990b ), and nearly all of the most heavily-contested experimental findings—we need to orient our attention more toward the “direct” end of the spectrum 8 .

How else can replication be made more informative ? Brandt et al. ( 2014 )'s “Replication Recipe” offers several important factors, one of which must be highlighted to begin with. This is their contention that a “convincing” replication should be carried out outside the lab of origin . Clearly this requirement shifts away from the “direct” extreme of the replication gradient that we have emphasized so far, but such a change from the original experiment, in this case, is justified. As Ioannidis ( 2012b ) points out, replications by the original researchers—while certainly important and to be encouraged as a preliminary step—are not sufficient to establish “convincing” experimental reliability. This is because allegiance and confirmation biases, which may apply especially to the original team, would be less of an issue for independent replicators.

Partially against this view, Schnall ( 2014 , np) argues that “authors of the original work should be allowed to participate in the process of having their work replicated.” On the one hand, this might have the desirable effect of ensuring that the replication attempt faithfully reproduces the original procedure. It seems reasonable to think that the original author would know more than anyone else about how the original research was conducted—so her viewpoint is likely to be helpful. On the other hand, however, too much input by the original author could compromise the independence of the replication: she might have a strong motivation to make the replication a success, which could subtly influence the results (see Earp and Darby, 2015 ). Whichever position one takes on the appropriate degree of input and/or oversight from the original author, however, Schnall ( 2014 , np) is certainly right to note that “the quality standards for replications need to be at least as high as for the original findings. Competent evaluation by experts is absolutely essential, and is especially important if replication authors have no prior expertise with a given research topic.”

Other ingredients in increasing the informativeness of replication attempts include: (1) carefully defining the effects and methods that the researchers intend to replicate; (2) following as exactly as possible the methods of the original study (as described above); (3) having high statistical power (i.e., an adequate sample size to detect an effect if one is really present); (4) making complete details about the replication available, so that interested experts can fully evaluate the replication attempt (or attempt another replication themselves); and (5) evaluating the replication results, comparing them critically to the results of the original study (Brandt et al., 2014 , p. 218, paraphrased). This list is not exhaustive, but it gives a concrete sense of how “stabilizing” procedures (see Radder, 1992 ) can be employed to give greater credence to the quality and informativeness of replication efforts.
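Ingredient (3) is the most readily quantified. As a minimal sketch of how a replicating lab might plan for adequate power (our illustration, not part of Brandt et al.'s recipe; the effect size d = 0.40 is assumed purely for the example), one can solve for the required sample size with statsmodels:

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Suppose the original study reported an effect of Cohen's d = 0.40 (an
# assumed value for illustration). How many participants per group does a
# replication need to detect it with 90% power at alpha = .05, two-sided?
n_per_group = analysis.solve_power(effect_size=0.40, alpha=0.05,
                                   power=0.90, alternative='two-sided')
print(f"Required n per group: {n_per_group:.1f}")  # ~132.3, i.e., 133 per group

A replication powered below this level risks producing an uninformative "failure" that reflects sampling noise rather than the absence of the original effect.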

Replication, falsification, and auxiliary assumptions

Brandt et al.'s ( 2014 ) “replication recipe” provides a vital tool for researchers seeking to conduct high quality replications. In this section, we offer an additional “ingredient” to the discussion, by highlighting the role of auxiliary assumptions in increasing replication informativeness, specifically as these pertain to the relationship between replication and falsification. Consider the logical fallacy of affirming the consequent that provided an important basis for Popper's falsification argument.

If the theory is true,
        an observation should occur (T → O)   (Premise 1)
The observation occurs (O)   (Premise 2)
Therefore, the theory is true (T)   (Conclusion)

Obviously, the conclusion does not follow. Any number of things might have led to the observation that have nothing to do with the theory being proposed (see Earp, 2015 for a similar argument). On the other hand, denying the consequent ( modus tollens ) does invalidate the theory, strictly according to the logic given:

If the theory is true,
        an observation should occur (T → O)   (Premise 1)
The observation does not occur (~O)   (Premise 2)
Therefore, the theory is not true (~T)   (Conclusion)

Given this logical asymmetry, then, between affirming and denying the consequent of a theoretical prediction (see Earp and Everett, 2013 ), Popper opted for the latter. By doing so, he famously defended a strategy of disconfirming rather than confirming theories. Yet if the goal is to disconfirm theories, then the theories must be capable of being disconfirmed in the first place; hence, a basic requirement of scientific theories (in order to count as properly scientific) is that they have this feature: they must be falsifiable .
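For readers who want to verify this asymmetry mechanically, here is a brute-force truth-table check (our illustration; the function and variable names are ours). It enumerates all truth assignments, keeps the rows where the premises hold, and asks whether the conclusion holds in every such row:

from itertools import product

def implies(p, q):
    # material conditional: p -> q is false only when p is true and q is false
    return (not p) or q

# Affirming the consequent: from (T -> O) and O, conclude T.
affirming_valid = all(
    t  # conclusion: the theory is true
    for t, o in product([True, False], repeat=2)
    if implies(t, o) and o  # premises: T -> O holds, and O occurs
)

# Modus tollens: from (T -> O) and ~O, conclude ~T.
modus_tollens_valid = all(
    not t  # conclusion: the theory is not true
    for t, o in product([True, False], repeat=2)
    if implies(t, o) and not o  # premises: T -> O holds, and O does not occur
)

print(affirming_valid)      # False: countermodel at t=False, o=True
print(modus_tollens_valid)  # True: no countermodel exists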

As we hinted at above, however, this basic framework is an oversimplification. As Popper himself noted, and as was made particularly clear by Lakatos ( 1978 ; also see Duhem, 1954 ; Quine, 1980 ), scientists do not derive predictions only from a given theory, but rather from a combination of the theory and auxiliary assumptions . The auxiliary assumptions are not part of the theory proper, but they serve several important functions. One of these functions is to show the link between the sorts of outcomes that a scientist can actually observe (i.e., by running an experiment), and the non-observable, “abstract” content of the theory itself. To pick one classic example from psychology, according to the theory of reasoned action (e.g., Fishbein, 1980 ), attitudes determine subjective norms. One implication of this theoretical assumption is that researchers should be able to obtain strong correlations between attitudes and behavioral intentions. But this assumes, among other things, that a check mark on an attitude scale really indicates a person's attitude, and that a check mark on an intention scale really indicates a person's intention. The theory of reasoned action has nothing to say about whether check marks on scales indicate attitudes or intentions; these are assumptions that are peripheral to the basic theory. They are auxiliary assumptions that researchers use to connect non-observational terms such as “attitude” and “intention” to observable phenomena such as check marks. Fishbein and Ajzen ( 1975 ) recognized this and took great pains to spell out, as well as possible, the auxiliary assumptions that best aid in measuring theoretically relevant variables (see also Ajzen and Fishbein, 1980 ).

The existence of auxiliary assumptions complicates the project of falsification. This is because the major premise of the modus tollens argument—denying the consequent of the theoretical prediction—must be stated somewhat differently. It must be stated like this: “If the theory is true and a set of auxiliary assumptions is true , an observation should occur.” Keeping the second premise the same implies that either the theory is not true or that at least one auxiliary assumption is not true, as the following syllogism (in symbols only) illustrates.

T & (A1 & A2 & … & An) → O   (Premise 1)
~O   (Premise 2)
∴ ~[T & (A1 & A2 & … & An)] = ~T ∨ ~A1 ∨ ~A2 ∨ … ∨ ~An   (Conclusion)

Consider an example. It often is said that Newton's gravitational theory predicted where planets would be at particular times. But this is not precisely accurate. It would be more accurate to say that such predictions were derived from a combination of Newton's theory and auxiliary assumptions not contained in that theory (e.g., about the present locations of the planets). To return to our example about attitudes and intentions from psychology, consider the mini-crisis in social psychology from the 1960s, when it became clear to researchers that attitudes—the kingly construct—failed to predict behaviors. Much of the impetus for the theory of reasoned action (e.g., Fishbein, 1980 ) was Fishbein's realization that there was a problem with attitude measurement at the time: when this problem was fixed, strong attitude-behavior (or at least attitude-intention) correlations became the rule rather than the exception. This episode provides a compelling illustration of a case in which attention to the auxiliary assumptions that bore on actual measurement played a larger role in resolving a crisis in psychology than debates over the theory itself.

What is the lesson here? Due to the fact that failures to obtain a predicted observation can be blamed either on the theory itself or on at least one auxiliary assumption, absolute theory falsification is about as problematic as is absolute theory verification. In the Newton example, when some of Newton's planetary predictions were shown to be wrong, he blamed the failures on incorrect auxiliary assumptions rather than on his theory, arguing that there were additional but unknown astronomical bodies that skewed his findings—which turned out to be a correct defense of the theory. Likewise, in the attitude literature, the theoretical connection between attitudes and behaviors turned out to be correct (as far as we know) with the problem having been caused by incorrect auxiliary assumptions pertaining to attitude measurement.

There is an additional consequence to the necessity of giving explicit consideration to one's auxiliary assumptions. Suppose, as often happens in psychology, that a researcher deems a theory to be unfalsifiable because he or she does not see any testable predictions. Is the theory really unfalsifiable or is the problem that the researcher has not been sufficiently thorough in identifying the necessary auxiliary assumptions that would lead to falsifiable predictions? Given that absolute falsification is impossible, and that researchers are therefore limited to some kind of “reasonable” falsification, Trafimow ( 2009 ) has argued that many allegedly unfalsifiable theories are reasonably falsifiable after all: it is just a matter of researchers having to be more thoughtful about considering auxiliary assumptions. Trafimow documented examples of theories that had been described as unfalsifiable that one could in fact falsify by proposing better auxiliary assumptions than had been imagined by previous researchers.

The notion that auxiliary assumptions can vary in quality is relevant for replication. Consider, for example, the case alluded to earlier regarding a purported failure to replicate Bargh et al.'s ( 1996 ) famous priming results. In the replication attempt of this well-known “walking time” study (Doyen et al., 2012 ), laser beams were used to measure the speed with which participants left the laboratory, rather than students with stopwatches. Undoubtedly, this adjustment was made on the basis of a reasonable auxiliary assumption that methods of measuring time that are less susceptible to human idiosyncrasies would be superior to methods that are more susceptible to them. Does the fact that the failed replication was not exactly like the original experiment render it invalid? At least with regard to this particular feature of this particular replication attempt, the answer is clearly “no.” If a researcher uses a better auxiliary assumption than in the original experiment, this should add to its validity rather than subtract from it 9 .

But suppose, for a particular experiment, that we are not in a good position to judge the superiority of alternative auxiliary assumptions. We might invoke what Meehl ( 1990b ) termed the ceteris paribus (all else equal) assumption. This idea, applied to the issue of direct replications, suggests that for researchers to be confident that a replication attempt is a valid one, the auxiliary assumptions in the replication have to be sufficiently similar to those in the original experiment that any differences in findings cannot reasonably be attributed to differences in the assumptions. Put another way, all of the unconsidered auxiliary assumptions should be indistinguishable in the relevant way: that is, all have to be sufficiently equal or sufficiently right or sufficiently irrelevant so as not to matter to the final result.

What makes it allowable for a researcher to make the ceteris paribus assumption? In a strict philosophical sense, of course, it is not allowable. To see this, suppose that Researcher A has published an experiment, Researcher B has replicated it, but the replication failed. If Researcher A claims that Researcher B made a mistake in performing the replication, or just got unlucky, there is no way to disprove Researcher A's argument absolutely. But suppose that Researchers C, D, E, and F also attempt replications, and also fail. It becomes increasingly difficult to support the contention that Researchers B–F all “did it wrong” or were unlucky, and that we should continue to accept Researcher A's version of the experiment. Even if a million researchers attempted replications, and all of them failed, it is theoretically possible that Researcher A's version is the unflawed one and all the others are flawed. But most researchers would conclude (and in our view, would be right to conclude) that it is more likely that it is Researcher A who got it wrong and not the million researchers who failed to replicate the observation. Thus, we are not arguing that replications, whether successful or not, are definitive. Rather, our argument is that replications (of sufficient quality) are informative.

Introducing a Bayesian framework

To see why this is the case, we shall employ a Bayesian framework similar to Trafimow ( 2010 ). Suppose that an aficionado of Researcher A believes that the prior probability of anything Researcher A said or did is very high. Researcher B attempts a replication of an experiment by Researcher A and fails. The aficionado might continue confidently to believe in Researcher A's version, but the aficionado's confidence likely would be decreased slightly. As further replication failures accumulate, the aficionado's confidence would continue to decrease accordingly, until at some point it would be pushed below the 50% mark, at which point the aficionado would put more credence in the replication failures than in the success obtained by Researcher A.

In the foregoing scenario, we would want to know the probability that the original result is actually true given Researcher B's replication failure [p(T|F)]. As Equation (1) shows, this depends on the aficionado's prior level of confidence that the original result is true [p(T)], the probability of failing to replicate given that the original result is true [p(F|T)], and the overall probability of failing to replicate [p(F)]:

p(T|F) = p(F|T) p(T) / p(F)     (1)

Alternatively, we could frame what we want to know in terms of a confidence ratio that the original result is true or not true given the failure to replicate [p(T|F) / p(~T|F)]. This would be a function of the aficionado's prior confidence ratio about the truth of the finding [p(T) / p(~T)] and the ratio of probabilities of failing given that the original result is true or not [p(F|T) / p(F|~T)]. Thus, Equation (2) gives the posterior confidence ratio:

p(T|F) / p(~T|F) = [p(T) / p(~T)] × [p(F|T) / p(F|~T)]     (2)

Suppose that the aficionado is a very strong one, so that the prior confidence ratio is 50. In addition, the probability ratio pertaining to failing to replicate is 0.5. It is worthwhile to clarify two points about this probability ratio. First, we assume that the probability of failing to replicate is less if the original finding is true than if it is not true, so that the ratio ought to be substantially less than 1. Second, how much less than 1 this ratio will be depends largely on the quality of the replication; as the replication becomes closer to meeting the ideal ceteris paribus condition, the ratio will deviate increasingly from 1. Put more generally, as the quality of the auxiliary assumptions going into the replication attempt increases, the ratio will decrease. Given these two ratios of 50 and 0.5, the posterior confidence ratio is 25. Although this is a substantial decrease in confidence from 50, the aficionado still believes that the finding is extremely likely to be true. But suppose there is another replication failure and the probability ratio is 0.8. In that case, the new confidence ratio is (25)(0.8) = 20. The pattern should be clear here: As there are more replication failures, a rational person, even if that person is an aficionado of the original researcher, will experience continually decreasing confidence as the replication failures mount.

If we imagine that there are N attempts to replicate the original finding that fail, the process described in the foregoing paragraph can be summarized in a single equation that gives the ratio of posterior confidences in the original finding, given that there have been N failures to replicate. This is a function of the prior confidence ratio and the probability ratios in the first replication failure, the second replication failure, and so on:

p(T|F1, …, FN) / p(~T|F1, …, FN) = [p(T) / p(~T)] × [p(F1|T) / p(F1|~T)] × [p(F2|T) / p(F2|~T)] × … × [p(FN|T) / p(FN|~T)]     (3)

For example, staying with our aficionado with a prior confidence ratio of 50, imagine a set of 10 replication failures, with the following probability ratios: 0.5, 0.8, 0.7, 0.65, 0.75, 0.56, 0.69, 0.54, 0.73, and 0.52. The final confidence ratio, according to Equation (3), would be:

50 × 0.5 × 0.8 × 0.7 × 0.65 × 0.75 × 0.56 × 0.69 × 0.54 × 0.73 × 0.52 ≈ 0.54

Note the following. First, even with an extreme prior confidence ratio (we had set it at 50 for the aficionado), it is possible to overcome it with a reasonable number of replication failures, provided that the person tallying the replication failures is a rational Bayesian (and there is reason to think that those attempting the replications are sufficiently competent in the subject area and methods to be qualified to undertake them). Second, it is possible to go from a state of extreme confidence to one of substantial lack of confidence. To see this in the example, take the reciprocal of the final confidence ratio (0.54), which equals approximately 1.85. In other words, the Bayesian aficionado now believes that the finding is about 1.85 times as likely to be not true as true. If we imagine yet more failed attempts to replicate, it is easy to foresee that the belief that the original finding is not true could eventually become as powerful as, or more powerful than, the prior belief that the original finding was true.
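A short script (a sketch of our own, using the worked values above) makes the sequential updating in Equations (2) and (3) explicit:

def posterior_confidence_ratio(prior_ratio, likelihood_ratios):
    """Multiply the prior odds p(T)/p(~T) by p(F_i|T)/p(F_i|~T) per failure."""
    ratio = prior_ratio
    for lr in likelihood_ratios:
        ratio *= lr
    return ratio

prior = 50.0  # the aficionado's prior odds that the finding is true
failure_ratios = [0.5, 0.8, 0.7, 0.65, 0.75, 0.56, 0.69, 0.54, 0.73, 0.52]

final = posterior_confidence_ratio(prior, failure_ratios)
print(f"Final confidence ratio: {final:.2f}")       # ~0.54, as in the text
print(f"Odds the finding is NOT true: {1 / final:.2f}")  # ~1.85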

In summary, auxiliary assumptions play a role, not only for original theory-testing experiments but also in replications—even in replications concerned only with the original finding and not with the underlying theory. A particularly important auxiliary assumption is the ever-present ceteris paribus assumption, and the extent to which it applies influences the “convincingness” of the replication attempt. Thus, a change in confidence in the original finding is influenced both by the quality and quantity of the replication attempts, as Equation (3) illustrates.

In presenting Equations (1–3), we reduced the theoretical content as much as possible, and more than is realistic in actual research 10 , in considering so-called “direct” replications. As the replications serve other purposes, such as “conceptual” replications, the amount of theoretical content is likely to increase. To link that theoretical content to the replication attempt, more auxiliary assumptions will become necessary. For example, in a conceptual replication of an experiment finding that attitudes influence behavior, the researcher might use a different attitude manipulation or a different behavior measure. How do we know that the different manipulation and measure are sufficiently theoretically unimportant that the conceptual replication really is a replication (i.e., a test of the underlying theory)? We need new auxiliary assumptions linking the new manipulation and measure to the corresponding constructs in the theory, just as an original set of auxiliary assumptions was necessary in the original experiment to link the original manipulation and measure to the corresponding constructs in the theory. Auxiliary assumptions always matter—and they should be made explicit so far as possible. In this way, it will be easier to identify where in the chain of assumptions a “breakdown” must have occurred, in attempting to explain an apparent failure to replicate.

Replication is not a silver bullet. Even carefully-designed replications, carried out in good faith by expert investigators, will never be conclusive on their own. But as Tsang and Kwan ( 1999 ) point out:

If replication is interpreted in a strict sense, [conclusive] replications or experiments are also impossible in the natural sciences.… So, even in the “hardest” science (i.e., physics) complete closure is not possible. The best we can do is control for conditions that are plausibly regarded to be relevant. (p. 763)

Nevertheless, “failed” replications, especially, might be dismissed by an original investigator as being flawed or “incompetently” performed—but this sort of accusation is just too easy. The original investigator should be able to describe exactly what parameters she sees as being theoretically relevant, and under what conditions her “effect” should obtain. If a series of replications is carried out, independently by different labs, and deliberately tailored to the parameters and conditions so described—yet they reliably fail to produce the original result—then this should be considered informative . At the very least, it will suggest that the effect is sensitive to theoretically-unspecified factors, whose specification is sorely needed. At most, it should throw the existence of the effect into doubt, possibly justifying a shift in research priorities. Thus, while “falsification” can in principle be avoided ad infinitum, with enough creative effort by one who wished to defend a favored theory, scientists should not seek to “rescue” a given finding at any empirical cost 11 . Informative replications can reasonably factor into scientists' assessment about just what that cost might be; and they should pursue such replications as if the credibility of their field depended on it. In the case of experimental social psychology, it does.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Thanks are due to Anna Alexandrova for feedback on an earlier draft.

1 There are two steps to understanding this idea. First, because the foundational theories are so insecure, and the field's findings so under dispute, the “correct” empirical outcome of a given experimental design is unlikely to have been firmly established. Second, and insofar as the first step applies, the standard by which to judge whether a replication has been competently performed is equally unavailable—since that would depend upon knowing the “correct” outcome of just such an experiment. Thus, a “competently performed” experiment is one that produces the “correct” outcome; while the “correct” outcome is defined by whatever it is that is produced by a “competently performed” experiment. As Collins ( 1985 ) states: “Where there is disagreement about what counts as a competently performed experiment, the ensuing debate is coextensive with the debate about what the proper outcome of the experiment is” (p. 89). This is the infamously circular experimenter's regress . Of course, a competently performed experiment should produce satisfactory (i.e., meaningful, useful) results on “outcome neutral” tests.

2 Assuming that it is a psychology experiment. Note that even if the “same” participants are run through the experiment one more time, they'll have changed in at least one essential way: they'll have already gone through the experiment (opening the door for practice effects, etc.).

3 On Popper's view, one must set up a “falsifying hypothesis,” i.e., a hypothesis specifying how another experimenter could recreate the falsifying evidence. But then, Popper says, the falsifying hypothesis itself should be severely tested and corroborated before it is accepted as falsifying the main theory. Interestingly, as a reviewer has suggested, the distinction between a falsifying hypothesis and the main theory may also correspond to the distinction between direct vs. conceptual replications that we discuss in a later section. On this view, direct replications (attempt to) reproduce what the falsifying hypothesis states is necessary to generate the original predicted effect, whereas conceptual replications are attempts to test the main theory.

4 The percentages reported here are the geometric mean of self-admission rates, prevalence estimates by the psychologists surveyed, and prevalence estimates derived by John et al. from the other two figures.

5 Priming has been defined in a number of different ways. Typically, it refers to the ability of subtle cues in the environment to affect an individual's thoughts and behavior, often outside of her awareness or control (e.g., Bargh and Chartrand, 1999 ).

6 Even more damning, Trafimow ( 2003 ; Trafimow and Rice, 2009 ; Trafimow and Marks, 2015 ) has argued that the standard significance tests used in psychology are invalid even when they are done “correctly.” Thus, even if psychologists were to follow the prescriptions of Simmons et al.—and reduce their researcher degrees of freedom (see the discussion following this footnote)—this would still fail to address the core problem that such tests should not be used in the first place.

7 For example, a “study found that eating disorder patients were significantly more likely than others to see frogs in a Rorschach test, which the author interpreted as showing unconscious fear of oral impregnation and anal birth…” (Giner-Sorolla, 2012 , p. 562).

8 Asendorpf et al. ( 2013 ) explain why this is so: “[direct] replicability is a necessary condition for further generalization and thus indispensible for building solid starting points for theoretical development. Without such starting points, research may become lost in endless fluctuation between alternative generalization studies that add numerous boundary conditions but fail to advance theory about why these boundary conditions exist” (p. 140, emphasis added).

9 There may be other reasons why the “failed” replication by Doyen et al. should not be considered conclusive, of course; for further discussion see, e.g., Lieberman ( 2012 ).

10 Indeed, we have presented our analysis in this section in abstract terms so that the underlying reasoning could be seen most clearly. However, this necessarily raises the question of how to go about implementing these ideas in practice. As a reviewer points out, to calculate probabilities, the theory being tested would need to be represented as a probability model; then in effect one would have Bayes factors to deal with. We note that both Dienes ( 2014 ) and Verhagen and Wagenmakers ( 2014 ) have presented methods for assessing the strength of evidence of a replication attempt (i.e., in confirming the original result) along these lines, and we refer the reader to their papers for further consideration.

11 As Doyen et al. ( 2014 , p. 28, internal references omitted) recently argued: “Given the existence of publication bias and the prevalence of questionable research practices, we know that the published literature likely contains some false positive results. Direct replication is the only way to correct such errors. The failure to find an effect with a well-powered direct replication must be taken as evidence against the original effect. Of course, one failed direct replication does not mean the effect is non-existent—science depends on the accumulation of evidence. But, treating direct replication as irrelevant makes it impossible to correct Type 1 errors in the published literature.”

References

  • Ajzen I., Fishbein M. (1980). Understanding Attitudes and Predicting Social Behavior. Englewood Cliffs, NJ: Prentice-Hall.
  • Asendorpf J. B., Conner M., De Fruyt F., De Houwer J., Denissen J. J., Fiedler K., et al. (2013). Replication is more than hitting the lottery twice. Eur. J. Pers. 27, 108–119. doi: 10.1002/per.1919
  • Bargh J. A., Chartrand T. L. (1999). The unbearable automaticity of being. Am. Psychol. 54, 462–479. doi: 10.1037/0003-066X.54.7.462
  • Bargh J. A., Chen M., Burrows L. (1996). Automaticity of social behavior: direct effects of trait construct and stereotype activation on action. J. Pers. Soc. Psychol. 71, 230–244. doi: 10.1037/0022-3514.71.2.230
  • Bartlett T. (2013). Power of suggestion. Chron. High. Educ. Available online at: http://chronicle.com/article/Power-of-Suggestion/136907
  • Bem D. J. (2011). Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. J. Pers. Soc. Psychol. 100, 407–425. doi: 10.1037/a0021524
  • Billig M. (2013). Learn to Write Badly: How to Succeed in the Social Sciences. Cambridge: Cambridge University Press.
  • Brandt M. J., IJzerman H., Dijksterhuis A., Farach F. J., Geller J., Giner-Sorolla R., et al. (2014). The replication recipe: what makes for a convincing replication? J. Exp. Soc. Psychol. 50, 217–224. doi: 10.2139/ssrn.2283856
  • Braude S. E. (1979). ESP and Psychokinesis: A Philosophical Examination. Philadelphia, PA: Temple University Press.
  • Carey B. (2011). Fraud case seen as a red flag for psychology research. N. Y. Times. Available online at: http://www.nytimes.com/2011/11/03/health/research/noted-dutch-psychologist-stapel-accused-of-research-fraud.html
  • Cartwright N. (1991). Replicability, reproducibility, and robustness: comments on Harry Collins. Hist. Pol. Econ. 23, 143–155. doi: 10.1215/00182702-23-1-143
  • Cesario J. (2014). Priming, replication, and the hardest science. Perspect. Psychol. Sci. 9, 40–48. doi: 10.1177/1745691613513470
  • Chow S. L. (1988). Significance test or effect size? Psychol. Bull. 103, 105–110. doi: 10.1037/0033-2909.103.1.105
  • Collins H. M. (1975). The seven sexes: a study in the sociology of a phenomenon, or the replication of experiments in physics. Sociology 9, 205–224. doi: 10.1177/003803857500900202
  • Collins H. M. (1981). Son of seven sexes: the social destruction of a physical phenomenon. Soc. Stud. Sci. 11, 33–62. doi: 10.1177/030631278101100103
  • Collins H. M. (1985). Changing Order: Replication and Induction in Scientific Practice. Chicago, IL: University of Chicago Press.
  • Cross R. (1982). The Duhem-Quine thesis, Lakatos and the appraisal of theories in macroeconomics. Econ. J. 92, 320–340. doi: 10.2307/2232443
  • Danziger K. (1997). Naming the Mind. London: Sage.
  • Dienes Z. (2014). Using Bayes to get the most out of non-significant results. Front. Psychol. 5:781. doi: 10.3389/fpsyg.2014.00781
  • Doyen S., Klein O., Pichon C. L., Cleeremans A. (2012). Behavioral priming: it's all in the mind, but whose mind? PLoS ONE 7:e29081. doi: 10.1371/journal.pone.0029081
  • Doyen S., Klein O., Simons D. J., Cleeremans A. (2014). On the other side of the mirror: priming in cognitive and social psychology. Soc. Cogn. 32, 12–32. doi: 10.1521/soco.2014.32.supp.12
  • Duhem P. (1954). The Aim and Structure of Physical Theory. Transl. P. P. Wiener. Princeton, NJ: Princeton University Press.
  • Earp B. D. (2011). Can science tell us what's objectively true? New Collect. 6, 1–9. Available online at: https://www.academia.edu/625642/Can_science_tell_us_whats_objectively_true
  • Earp B. D. (2015). Does religion deserve a place in secular medicine? J. Med. Ethics. E-letter. Available online at: https://www.academia.edu/11118590/Does_religion_deserve_a_place_in_secular_medicine. doi: 10.1136/medethics-2013-101776
  • Earp B. D., Darby R. J. (2015). Does science support infant circumcision? Skeptic 25, 23–30. Available online at: https://www.academia.edu/9872471/Does_science_support_infant_circumcision_A_skeptical_reply_to_Brian_Morris
  • Earp B. D., Everett J. A. C. (2013). Is the N170 face-specific? Controversy, context, and theory. Neuropsychol. Trends 13, 7–26. doi: 10.7358/neur-2013-013-earp
  • Earp B. D., Everett J. A. C., Madva E. N., Hamlin J. K. (2014). Out, damned spot: can the “Macbeth Effect” be replicated? Basic Appl. Soc. Psychol. 36, 91–98. doi: 10.1080/01973533.2013.856792
  • Elms A. C. (1975). The crisis of confidence in social psychology. Am. Psychol. 30, 967–976. doi: 10.1037/0003-066X.30.10.967
  • Ferguson C. J., Heene M. (2012). A vast graveyard of undead theories: publication bias and psychological science's aversion to the null. Perspect. Psychol. Sci. 7, 555–561. doi: 10.1177/1745691612459059
  • Fishbein M. (1980). Theory of reasoned action: some applications and implications, in Nebraska Symposium on Motivation, 1979, eds Howe H., Page M. (Lincoln, NE: University of Nebraska Press), 65–116.
  • Fishbein M., Ajzen I. (1975). Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research. Reading, MA: Addison-Wesley.
  • Folger R. (1989). Significance tests and the duplicity of binary decisions. Psychol. Bull. 106, 155–160. doi: 10.1037/0033-2909.106.1.155
  • Francis G. (2012). The psychology of replication and replication in psychology. Perspect. Psychol. Sci. 7, 585–594. doi: 10.1177/1745691612459520
  • Giner-Sorolla R. (2012). Science or art? How aesthetic standards grease the way through the publication bottleneck but undermine science. Perspect. Psychol. Sci. 7, 562–571. doi: 10.1177/1745691612457576
  • Gómez O. S., Juzgado N. J., Vegas S. (2010). Replications types in experimental disciplines, in Proceedings of the International Symposium on Empirical Software Engineering and Measurement (Bolzano: ESEM).
  • Greenwald A. G. (1975). Consequences of prejudice against the null hypothesis. Psychol. Bull. 82, 1–20. doi: 10.1037/h0076157
  • Ioannidis J. P. (2005). Why most published research findings are false. PLoS Med. 2:e124. doi: 10.1371/journal.pmed.0020124
  • Ioannidis J. P. (2012a). Why science is not necessarily self-correcting. Perspect. Psychol. Sci. 7, 645–654. doi: 10.1177/1745691612464056
  • Ioannidis J. P. (2012b). Scientific inbreeding and same-team replication: type D personality as an example. J. Psychosom. Res. 73, 408–410. doi: 10.1016/j.jpsychores.2012.09.014
  • John L. K., Loewenstein G., Prelec D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532. doi: 10.1177/0956797611430953
  • Jordan G. (2004). Theory Construction in Second Language Acquisition. Philadelphia, PA: John Benjamins.
  • Jost J. (2013). Introduction to: an additional future for psychological science. Perspect. Psychol. Sci. 8, 414–423. doi: 10.1177/1745691613491270
  • Kepes S., McDaniel M. A. (2013). How trustworthy is the scientific literature in industrial and organizational psychology? Ind. Organ. Psychol. 6, 252–268. doi: 10.1111/iops.12045
  • Koole S. L., Lakens D. (2012). Rewarding replications: a sure and simple way to improve psychological science. Perspect. Psychol. Sci. 7, 608–614. doi: 10.1177/1745691612462586
  • Lakatos I. (1970). Falsification and the methodology of scientific research programmes, in Criticism and the Growth of Knowledge, eds Lakatos I., Musgrave A. (London: Cambridge University Press), 91–196.
  • Lakatos I. (1978). The Methodology of Scientific Research Programmes. Cambridge: Cambridge University Press.
  • LeBel E. P., Peters K. R. (2011). Fearing the future of empirical psychology: Bem's 2011 evidence of psi as a case study of deficiencies in modal research practice. Rev. Gen. Psychol. 15, 371–379. doi: 10.1037/a0025172
  • Lieberman M. (2012). Does thinking of grandpa make you slow? What the failure to replicate results does and does not mean. Psychol. Today. Available online at: http://www.psychologytoday.com/blog/social-brain-social-mind/201203/does-thinking-grandpa-make-you-slow
  • Loscalzo J. (2012). Irreproducible experimental results: causes, (mis)interpretations, and consequences. Circulation 125, 1211–1214. doi: 10.1161/CIRCULATIONAHA.112.098244
  • Lykken D. T. (1968). Statistical significance in psychological research. Psychol. Bull. 70, 151–159. doi: 10.1037/h0026141
  • Magee B. (1973). Karl Popper. New York, NY: Viking Press.
  • Makel M. C., Plucker J. A., Hegarty B. (2012). Replications in psychology research: how often do they really occur? Perspect. Psychol. Sci. 7, 537–542. doi: 10.1177/1745691612460688
  • Meehl P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. J. Consult. Clin. Psychol. 46, 806–834. doi: 10.1037/0022-006X.46.4.806
  • Meehl P. E. (1990a). Appraising and amending theories: the strategy of Lakatosian defense and two principles that warrant using it. Psychol. Inq. 1, 108–141. doi: 10.1207/s15327965pli0102_1
  • Meehl P. E. (1990b). Why summaries of research on psychological theories are often uninterpretable. Psychol. Rep. 66, 195–244.
  • Mulkay M., Gilbert G. N. (1981). Putting philosophy to work: Karl Popper's influence on scientific practice. Philos. Soc. Sci. 11, 389–407. doi: 10.1177/004839318101100306
  • Mulkay M., Gilbert G. N. (1986). Replication and mere replication. Philos. Soc. Sci. 16, 21–37. doi: 10.1177/004839318601600102
  • Nosek B. A., the Open Science Collaboration (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspect. Psychol. Sci. 7, 657–660. doi: 10.1177/1745691612462588
  • Nosek B. A., Spies J. R., Motyl M. (2012). Scientific utopia II: restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci. 7, 615–631. doi: 10.1177/1745691612459058
  • Pashler H., Wagenmakers E. J. (2012). Editors' introduction to the special section on replicability in psychological science: a crisis of confidence? Perspect. Psychol. Sci. 7, 528–530. doi: 10.1177/1745691612465253
  • Polanyi M. (1962). Tacit knowing: its bearing on some problems of philosophy. Rev. Mod. Phys. 34, 601–615. doi: 10.1103/RevModPhys.34.601
  • Popper K. (1959). The Logic of Scientific Discovery. London: Hutchinson.
  • Quine W. V. O. (1980). Two dogmas of empiricism, in From a Logical Point of View, 2nd Edn., ed Quine W. V. O. (Cambridge, MA: Harvard University Press), 20–46.
  • Radder H. (1992). Experimental reproducibility and the experimenters' regress, in PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association (Chicago, IL: University of Chicago Press).
  • Rosenthal R. (1979). The file drawer problem and tolerance for null results. Psychol. Bull. 86, 638–641. doi: 10.1037/0033-2909.86.3.638
  • Schmidt S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13, 90–100. doi: 10.1037/a0015108
  • Schnall S. (2014). Simone Schnall on her experience with a registered replication project. SPSP Blog. Available online at: http://www.spspblog.org/simone-schnall-on-her-experience-with-a-registered-replication-project/
  • Simmons J. P., Nelson L. D., Simonsohn U. (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366. doi: 10.1177/0956797611417632
  • Smith N. C. (1970). Replication studies: a neglected aspect of psychological research. Am. Psychol. 25, 970–975. doi: 10.1037/h0029774
  • Stroebe W., Postmes T., Spears R. (2012). Scientific misconduct and the myth of self-correction in science. Perspect. Psychol. Sci. 7, 670–688. doi: 10.1177/1745691612460687
  • Trafimow D. (2003). Hypothesis testing and theory evaluation at the boundaries: surprising insights from Bayes's theorem. Psychol. Rev. 110, 526–535. doi: 10.1037/0033-295X.110.3.526
  • Trafimow D. (2009). The theory of reasoned action: a case study of falsification in psychology. Theory Psychol. 19, 501–518. doi: 10.1177/0959354309336319
  • Trafimow D. (2010). On making assumptions about auxiliary assumptions: reply to Wallach and Wallach. Theory Psychol. 20, 707–711. doi: 10.1177/0959354310374379
  • Trafimow D. (2014). Editorial. Basic Appl. Soc. Psychol. 36, 1–2. doi: 10.1080/01973533.2014.865505
  • Trafimow D., Marks M. (2015). Editorial. Basic Appl. Soc. Psychol. 37, 1–2. doi: 10.1080/01973533.2015.1012991
  • Trafimow D., Rice S. (2009). A test of the NHSTP correlation argument. J. Gen. Psychol. 136, 261–269. doi: 10.3200/GENP.136.3.261-270
  • Tsang E. W., Kwan K. M. (1999). Replication and theory development in organizational science: a critical realist perspective. Acad. Manag. Rev. 24, 759–780. doi: 10.2307/259353
  • Van IJzendoorn M. H. (1994). A Process Model of Replication Studies: On the Relation between Different Types of Replication. Leiden University Library. Available online at: https://openaccess.leidenuniv.nl/bitstream/handle/1887/1483/168_149.pdf?sequence=1
  • Verhagen J., Wagenmakers E. J. (2014). Bayesian tests to quantify the result of a replication attempt. J. Exp. Psychol. 143, 1457–1475. doi: 10.1037/a0036731
  • Westen D. (1988). Official and unofficial data. New Ideas Psychol. 6, 323–331. doi: 10.1016/0732-118X(88)90044-X
  • Yong E. (2012). A failed replication attempt draws a scathing personal attack from a psychology professor. Discover Magazine. Available online at: http://blogs.discovermagazine.com/notrocketscience/2012/03/10/failed-replication-bargh-psychology-study-doyen/#.VVGC-M6Gjds

Psychology Dictionary

FALSIFIABILITY

Falsifiability was first advanced by the Austrian-born British philosopher Karl Popper (1902–1994) as one of the defining criteria of a science. If a concept can, at least in principle, be disproved or shown to be incorrect, it is falsifiable.


Stack Exchange Network


Do testability and falsifiability have statistical definitions?

Psychology: the Core Concepts says

Psychology differs from the pseudosciences in that it employs the scientific method to test its ideas empirically. The scientific method relies on testable theories and falsifiable hypotheses.

Do testability and falsifiability have statistical definitions? If not, what do they mean roughly, and how are they related and different?

Are theory and hypothesis the same concepts?

  • hypothesis-testing
  • terminology
  • philosophical


4 Answers

Testability and falsifiability are general ideas that are discussed at length in the philosophy of science, but they manifest in statistics, and aspects of these concepts can be framed in statistical or probabilistic terms. It is useful to have a broad understanding of the philosophy of science and its historical development to understand these concepts, but it is also useful to see how they arise in the context of probability and statistics. Below we examine the latter.

The principle of "falsifiability" is a consequence of the law of total probability

The principle of falsifiability means that in a valid experimental situation relating to a hypothesis, there must be at least one possible outcome that would count as evidence against the hypothesis; if there is not, then the experiment cannot ever be considered to give evidence in favour of the hypothesis. This principle is built into probability theory via the law of total probability , and it occurs in Bayesian reasoning. This rule of probability ensures that if there can be confirmatory evidence for a hypothesis, then it must also be possible for there to be disconfirmatory evidence for that same hypothesis . This property of probability theory is captured in the following simple theorem.

Theorem (Principle of falsifiability for countable space): Consider a hypothesis $H$ and suppose we have a partition of the sample space $\mathscr{E}$ composed of a countable number of events, all with positive probability. Suppose that there is at least one piece of confirmatory evidence $E \in \mathscr{E}$ that is in favour of the hypothesis, i.e., a piece of evidence such that:
$$\mathbb{P}(H|E) > \mathbb{P}(H).$$
Then there must exist at least one event $E' \in \mathscr{E}$ that is disconfirmatory to the hypothesis, i.e., a piece of evidence such that:
$$\mathbb{P}(H|E') < \mathbb{P}(H).$$

Proof: We will use a proof by contradiction. Suppose, contra the theorem, that for all $R \in \mathscr{E}$ we have $\mathbb{P}(H|R) \geqslant \mathbb{P}(H)$. Using the law of total probability (the strict inequality in the middle of the chain uses $\mathbb{P}(E) > 0$), we then have:
$$\begin{align} \mathbb{P}(H) &= \sum_{R} \mathbb{P}(H|R) \mathbb{P}(R) \\[6pt] &= \bigg[ \mathbb{P}(H|E) \mathbb{P}(E) + {\sum_{R \neq E} \mathbb{P}(H|R) \mathbb{P}(R)} \bigg] \\[6pt] &\geqslant \bigg[ \mathbb{P}(H|E) \mathbb{P}(E) + {\sum_{R \neq E} \mathbb{P}(H) \mathbb{P}(R)} \bigg] \\[6pt] &> \bigg[ \mathbb{P}(H) \mathbb{P}(E) + {\sum_{R \neq E} \mathbb{P}(H) \mathbb{P}(R)} \bigg] \\[6pt] &= \mathbb{P}(H) \bigg[ \mathbb{P}(E) + {\sum_{R \neq E} \mathbb{P}(R)} \bigg] \\[6pt] &= \mathbb{P}(H) {\sum_{R} \mathbb{P}(R)} \\[8pt] &= \mathbb{P}(H), \\[6pt] \end{align}$$
which is a contradiction. This establishes the theorem. $\blacksquare$

If you would like to see an application of this principle within Bayesian reasoning, you might be interested in reading O'Neill (2014) on the famous "doomsday argument". This paper argues that the doomsday argument is an example of erroneous reasoning in which there is an argument to a foregone conclusion, in contradiction to the proper application of Bayes' rule. You might also be interested in reading Kadane et al. (1996), which talks generally about the notion of "reasoning to a foregone conclusion" (i.e., without the possibility of falsification) and gives sufficient foundational conditions for probabilistic reasoning under which this cannot occur.

The principle of "testability" relates to experimental design and other statistical principles

The notion of "testability" means that it is possible to create an experiment that can provide sufficient evidence to test the hypothesis. This idea therefore forms a part of the field of experimental design, which can be regarded as a subfield of statistics. Wrapped up in this notion is the determination of the requirements that would be needed to form a valid experiment for a hypothesis, any protocols that need to be applied (e.g., randomisation, blinding, etc.), and how much evidence needs to be accumulated in order to get sufficient evidence on the hypothesis of interest to make an inference at some minimum level of confidence/accuracy. The last of these is usually determined by making sample size calculations using statistical rules.

Testability can be framed in many different ways, with stronger or weaker requirements for particular contexts. It is often framed as requiring that it be possible to determine whether the hypothesis is true or false, and in some contexts it might require one to test causal hypotheses, which imposes additional requirements. In any case, those are strong notions of testability; if we are willing to think probabilistically, then a very weak notion of testability would merely require that it is possible to create an experiment that can provide either confirmatory or disconfirmatory evidence to some degree. This is a lot weaker than some demands for testability, but it does provide a potential starting point.

If we are willing to accept a very weak notion of testability, then it is satisfied when it is possible to construct an experiment in which some observable evidence can shift our posterior belief away from our prior belief. We have already seen above that if it is possible to see confirmatory evidence then it must also be possible to see disconfirmatory evidence, so a belief-shift either way could potentially occur. Weak testability holds so long as the evidence in the experiment is not (statistically) independent of the hypothesis of interest. If we want to impose a stronger requirement for testability, it might entail having a larger amount of evidence (e.g., a minimum required sample size), or it might require imposing a particular experimental design or experimental protocols.

Answered by Ben

  • 3 $\begingroup$ Well said. I'll just add that in most situations decision making is the most applicable approach (and one for which Bayes is tremendously suited). One can make the needed decision without having hypotheses or discussing falsifiability, etc. Bayesian posterior inference allows one to play the odds, just as we do in everyday life. Full Bayesian decisions based on maximizing expected utility formalize and optimize the process. $\endgroup$ –  Frank Harrell Commented Aug 13, 2023 at 11:58
  • $\begingroup$ Choosing a personal prior probability of the truth of an hypothesis does not sound like the scientific method to me. Or are you restricting your answer to some form of 'objective' Bayesian approach? (I understand that many Bayesians object to the 'objective' approach.) $\endgroup$ –  Graham Bornholt Commented Aug 13, 2023 at 18:44
  • 1 $\begingroup$ " [...] all with positive probability ", darn, and here I was hoping to use negative probability ! 😉 $\endgroup$ –  Galen Commented Dec 20, 2023 at 21:03
  • 1 $\begingroup$ @Galen: Actually, the assumption is there to rule out events with zero probability, so as to allow me to put the probabilities in the denominator of a fraction. :) $\endgroup$ –  Ben Commented Dec 21, 2023 at 4:10

They don't have statistical definitions but statistics may be helpful in showing either.

"Testable" means ... well "can be tested". If (sticking to psychology) is that, say, "In college, men prefer male professors and women prefer female ones" then that is testable because I can (at least in theory) look at a whole bunch of male and female college students and get them to rate male and female professors. This is a case where statistics will be very helpful in doing the test. On the other hand, if my hypothesis is "every male student dislikes every female professor" then statistics isn't needed. One example is enough.

"Falsifiable" means that there is some possible evidence which would lead me to reject my hypothesis. Maybe statistical, maybe not.

EDIT: By "reject" in the paragraph on falsifiability, I mean "reject" in the common-language sense, rather than in the technical sense used in statistics (although they could overlap).

Answered by Peter Flom

  • 1 $\begingroup$ @Tim "Testable theories" and "falsifiable hypotheses" are broad concepts in science that statistical methods are often used to address. Perhaps your specific questions in your above comments would be better addressed in new questions. $\endgroup$ –  Graham Bornholt Commented Aug 13, 2023 at 6:02
  • 3 $\begingroup$ The last paragraph could do with some clarification, as "reject" is a jargon term in statistics. If you reject a hypothesis (e.g. H0) it does not mean that you have shown it to be false (that seems close to the p-value fallacy). Frequentist approaches cannot assign a probability to the truth of a particular hypothesis, which seems to make falsification of a particular hypothesis rather problematic. Rejecting a hypothesis in an NHST sense is not the same as falsification. $\endgroup$ –  Dikran Marsupial Commented Aug 13, 2023 at 11:00
  • 1 $\begingroup$ @DikranMarsupial on the flip side of that coin, though, "falsifiable" is a jargon term in philosophy of science. It does not generally imply "proven to be false" in the mathematical sense of the term "proof". I'm not sure how this note fits in here, but I don't think your objection is wholly justified. $\endgroup$ –  Him Commented Aug 15, 2023 at 17:50
  • 2 $\begingroup$ "A theory is to be called 'empirical' or 'falsifiable' if it divides the class of all possible basic statements unambiguously into the following two nonempty subclasses. First, the class of all those basic statements with which it is inconsistent (or which it rules out, or prohibits): we call this the class of the potential falsifiers of the theory; and secondly, the class of those basic statements which it does not contradict (or which it 'permits'). We can put this more briefly by saying: a theory is falsifiable if the class of its potential falsifiers is not empty." --K.Popper $\endgroup$ –  Him Commented Aug 15, 2023 at 17:51
  • 1 $\begingroup$ and a "basic statement": "statements asserting that an observable event is occurring in a certain individual region of space and time" --K.Popper $\endgroup$ –  Him Commented Aug 15, 2023 at 17:51

This is meant to add an aspect to the other answers, which have some valuable material.

When talking about "falsification" in statistics, in most cases this isn't the same as logical falsification. A hypothesis is logically falsified if something happens that is impossible under the hypothesis, so that the hypothesis is strictly incompatible with the observed data and must therefore be false. (Philosophy of science teaches that it isn't quite that easy even without taking statistical variation into account: in many cases something that supposedly refutes a theory or hypothesis can be explained in other ways, such as erroneous measurements, or the failure of an auxiliary hypothesis that connects the main hypothesis to the data, rather than failure of the main hypothesis of interest itself.)

In statistics, however, standard models will assign nonzero probability (or at least density) to every potential outcome, be the hypothesis true or not, and this means that any claim that a hypothesis "is falsified by the data" will come with a nonzero error probability, i.e., observed data were just very unlikely but not strictly impossible (regardless of whether we're talking Bayesian or frequentist inference). This also means that if we want to make statements such as "the hypothesis is falsified by the data", this needs to be based on a threshold (how small a probability is small enough to talk of "falsification"), or otherwise we can only make "graded" statements (such as p-value or posterior probability of hypothesis or Bayes factor equal to 0.03), but ultimately an interpretation in words is needed anyway.

Regarding what the hypotheses are, statistical hypotheses are probability models (often parametric with a restrictive specification of parameter values), whereas "research hypotheses" are often informal (in some fields that strongly rely on mathematics, formal research hypotheses are the standard, but in some other fields almost everything is informal). So a research hypothesis needs to be "translated" into a statistical model, and this usually involves model assumptions such as independence or certain distributional shapes that can be doubted (and checked, but only to a limited extent) and may affect the interpretation of any outcome of the statistical analyses. In fact this adds an additional source of uncertainty on top of any uncertainty already modelled by the statistical model.

The question regarding theory vs. hypothesis makes sense regarding the "research hypothesis", but chances are that in different fields and situations connections between what people call "theory" and what people call "research hypothesis" may differ (a research hypothesis may often be a more specified instance/special case of a more general theory); in any case the statistical hypothesis will not normally be identical to the research hypothesis let alone the scientific theory of interest, but will come with additional restrictions (and/or add-ons, as the theory to be tested may not involve observational variation, which is modelled by the statistical hypothesis).

This also implies that for any result of a statistical analysis it makes sense to ask: "Could these data have led us to a different conclusion had we chosen another statistical model that is as well compatible with the research hypothesis of interest and - as far as this can be tested - the data?"

Another uncomfortable implication is that if many statistical hypotheses are up for "falsification" at a certain probability standard, the probability that an error occurs ("statistically falsifying" a hypothesis that is in fact true) can become quite large: the chance that at least one of many (falsification) events obtains adds up, even if the probability of each single such event is very small. This is referred to as the problem of multiple testing in statistics.

Answered by Christian Hennig

  • $\begingroup$ "This also implies that for any result of a statistical analysis it makes sense to ask: "Could these data have led us to a different conclusion had we chosen another statistical model that is as well compatible with the research hypothesis of interest and - as far as this can be tested - the data?" ...." Such a question would seem to be an unsurmoutable challenge for a Bayesian, given the flexibility provided by the choices for a prior. $\endgroup$ –  Graham Bornholt Commented Dec 20, 2023 at 23:58
  • 1 $\begingroup$ @GrahamBornholt There won't be an exhaustive answer to that question (nor is there one for frequentists), but this doesn't mean that we shouldn't even start asking (some sensitivity analysis is better than none). $\endgroup$ –  Christian Hennig Commented Dec 21, 2023 at 1:11
  • $\begingroup$ Yes, I agree with you. $\endgroup$ –  Graham Bornholt Commented Dec 21, 2023 at 4:21

These are important concepts that are in many ways more fundamental than statistics, they are about philosophy of science. I know only a little about this, so what follows is mainly a summary of my thinking and not an expert answer.

Theories are organising frameworks that contain several related ideas. A single theory can lead to or contain several hypotheses . A hypothesis is much more specific: it is an explanation for a phenomenon that includes a specific mechanism. Note that these are research or scientific hypotheses , not the statistical hypotheses that we frame in our models 1 . A research or scientific hypothesis has to be translated into a statistical hypothesis to be amenable to quantitative analysis (or perhaps 'quantitative falsification', to coin a phrase). I've yet to find a good discussion about how best to go about this; it is an interesting and complex challenge.

A hypothesis can lead to one or more predictions, which can be tested. The more specific and novel the predictions that arise from a hypothesis, the more useful that hypothesis is. If a precise prediction is found to be wrong through appropriate experimentation or observation, we can judge the hypothesis to be incorrect, i.e. falsified. This falsification step usually involves statistical analysis, but it may not require it, especially if a single observation is sufficient for falsification (this can be captured using Bayesian philosophy/approaches, but it may not be needed). The proverbial black swan is of course the usual go-to example for the power of a single observation. Real-world examples are harder to come by, but Eddington's observation of light bending around the sun during a solar eclipse (predicted by Einstein's general theory of relativity) might be one; I don't know enough about the details to be sure though.

I've found Popper's essay on Science as Falsification 2 to be a short, clear and insightful exposition of this viewpoint. His vision is an excellent goal for science to aspire to, even if it is an incomplete or incorrect description of how science actually functions. It's a strongly contested view, and I'd encourage looking at alternate viewpoints as well.

I find 'theory' and 'testable' are harder to pin down than the other terms. Especially the latter - one can in principle refer to theories or predictions as testable, which muddies the waters somewhat. On this front, Kuhn and Lakatos raised important and interesting ideas about how it is that theories rise and fall. The processes they describe have a strong social dynamic and are much messier than the Popperian falsification of hypotheses described above. Hypotheses that form part of a theory can be falsified while the theory itself survives in some reduced or modified form, for example. Often, the theory is jettisoned only when a substantial portion of its core ideas are falsified or replaced by a theory that improves on it - by making new predictions, or more precise and accurate ones.

1 I think statistics has done science a disservice by causing so much confusion around the term 'hypothesis'. Null hypothesis statistical testing has arguably led to a great deal of poor science in part because people think that they are pursuing some high scientific goal by 'testing hypotheses' even though it's the uninteresting kind of hypotheses.

2 Popper, K. R. (1963). Science as falsification. Conjectures and refutations, 1(1963), 33-39.

  • 1 $\begingroup$ Are you able to clarify why you think the hypotheses selected for testing are " the uninteresting kind of hypotheses"? What type of hypotheses are the interesting kind of hypotheses in your view? $\endgroup$ –  Graham Bornholt Commented Dec 21, 2023 at 4:28
  • $\begingroup$ @GrahamBornholt In my opinion, $\beta = 0$ (the standard statistical hypothesis tested) is uninteresting in almost all models one encounters. Often, it would be surprising if $\beta$ was precisely 0. Interesting and memorable hypotheses are often remembered as theories once they survive serious attempts at falsification and gain support. Darwinian evolution is a fantastic example of this. Another case that I'm less familiar with is the existence of the Higgs boson/field to explain why particles have mass. $\endgroup$ –  mkt Commented Dec 22, 2023 at 9:53
  • 1 $\begingroup$ These are some of the greatest cases in science, so perhaps it's not fair to hold every study to that standard. So instead, let me offer a nice example of an interesting hypothesis from my field that does NOT appear to be true. The burglar alarm hypothesis tried to explain why some unicellular algae in the ocean are bioluminescent. This is a puzzle because bioluminescence is complex and costly, and doesn't serve an obvious purpose in these algae. Bioluminescence is often used to attract mates, prey, or organisms that help disperse offspring, none of which applies in these organisms. $\endgroup$ –  mkt Commented Dec 22, 2023 at 9:59
  • $\begingroup$ @GrahamBornholt It also attracts attention when that seems to be a bad idea. One curious feature is that it happens when the surrounding water is disturbed. The burglar alarm hypothesis says that this disturbance is often caused by the swimming of the predators of the algae, and that cells flash to attract the predators of their predators . In other words, it's like a plant using a literal flashing light to call the attention of a wolf and indicate that there's a tasty deer right there. $\endgroup$ –  mkt Commented Dec 22, 2023 at 10:07
  • $\begingroup$ While we haven't fully ruled this hypothesis out, it's not consistent with much else we have learned, and it appears very unlikely that it is true. But it's an appealing and thought-provoking idea, and I commend the people who came up with it. I think this side of science - the creation of good research hypotheses - is woefully neglected, and I partly blame the conflation with statistical hypotheses for this problem. $\endgroup$ –  mkt Commented Dec 22, 2023 at 10:11


COMMENTS

  1. Karl Popper: Theory of Falsification

    The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory to be considered scientific, it must be able to be tested and conceivably proven false. For example, the hypothesis that "all swans are white" can be falsified by observing a black swan.

  2. Research Hypothesis In Psychology: Types, & Examples

    Examples. A research hypothesis, in its plural form "hypotheses," is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method. Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

  3. Definition:

    Falsifiability is a fundamental principle in the scientific method as it emphasizes the importance of empirical evidence and objective testing in evaluating and refining scientific knowledge. By requiring theories to be potentially refutable, falsifiability encourages scientists to design experiments that could potentially disprove their ...

  4. APA Dictionary of Psychology

    falsifiability. n. the condition of admitting falsification: the logical possibility that an assertion, hypothesis, or theory can be shown to be false by an observation or experiment. The most important properties that make a statement falsifiable in this way are (a) that it makes a prediction about an outcome or a universal claim of the type ...

  5. Falsifiability

    Falsifiability (or refutability) is a deductive standard of evaluation of scientific theories and hypotheses, introduced by the philosopher of science Karl Popper in his book The Logic of Scientific Discovery (1934). [B] A theory or hypothesis is falsifiable if it can be logically contradicted by an empirical test.

  6. Falsifiability

    Inquiry-based Activity: Popular media and falsifiability. Introduction: Falsifiability, or the ability for a statement/theory to be shown to be false, was noted by Karl Popper to be the clearest way to distinguish science from pseudoscience. While incredibly important to scientific inquiry, it is also important for students to understand how ...

  7. Falsification Principle

    Definition of Falsification Principle: The Falsification Principle, also known as the doctrine of falsifiability, is a key concept in the philosophy of science developed by philosopher Karl Popper. It states that for a theory or hypothesis to be considered scientific, it must be capable of being proven false or refuted through empirical ...

  8. Popper: Proving the Worth of Hypotheses

    Popper enunciates a number of such rules which are based on methodological decisions about how to go about accepting and rejecting hypotheses. An example of such a rule is the following. Once a hypothesis has been proposed and tested, and has proved its mettle, it may not be allowed to drop out without 'good reason'.

  9. Falsifiability

    Definition. Definition of falsifiable: a property of a theory such that one can conduct an empirical study that will show the theory is false if it is actually false. Scientific theories are models for making predictions about the world. These models can be evaluated based on how accurately they predict the aspects of the world they model ...

  10. Degrees of riskiness, falsifiability, and truthlikeness

    In this paper, we take a fresh look at three Popperian concepts: riskiness, falsifiability, and truthlikeness (or verisimilitude) of scientific hypotheses or theories. First, we make explicit the dimensions that underlie the notion of riskiness. Secondly, we examine if and how degrees of falsifiability can be defined, and how they are related to various dimensions of the concept of riskiness ...

  11. Falsifiability

    Falsifiability, according to the philosopher Karl Popper, defines the inherent testability of any scientific hypothesis. Science and philosophy have always worked together to try to uncover truths about the universe we live in. Indeed, ancient philosophy can be understood as the originator of many of the separate fields of study we have today ...

  12. Replication, falsification, and the crisis of confidence in social

    1 Uehiro Centre for Practical Ethics, University of Oxford, Oxford, UK; 2 Department of History and Philosophy of Science, University of Cambridge, Cambridge, UK; 3 Department of Psychology, New Mexico State University, Las Cruces, NM, USA; The (latest) crisis in confidence in social psychology has generated much heated discussion about the importance of replication, including how it should be ...

  13. Criterion of falsifiability

    criterion of falsifiability, in the philosophy of science, a standard of evaluation of putatively scientific theories, according to which a theory is genuinely scientific only if it is possible in principle to establish that it is false.The British philosopher Sir Karl Popper (1902-94) proposed the criterion as a foundational method of the empirical sciences.

  14. Falsifiability in Psychology

    Exploring the principle of falsifiability in psychology, this content delves into how hypotheses must be testable to be scientifically viable. ... Definition of Falsifiability; Falsifiability is the principle that a hypothesis must be able to be disproven through empirical testing. Importance of Falsifiability in Psychology;

  15. (PDF) Falsificationism is not just 'potential' falsifiability, but

    Falsificationism is not just 'potential' falsifiability, but requires 'actual' falsification: Social psychology, critical rationalism, and progress in science

  16. Falsificationism is not just 'potential' falsifiability, but requires

    The Journal for the Theory of Social Behaviour is a theoretical social psychology journal covering human behaviour, psychology, sociology, social policy & more. ... the guiding principle of 'falsificationism' is reduced to a mere 'falsifiability' and some central elements of critical rationalism are left out - those that are ...

  17. The Discovery of the Falsifiability Principle

    Falsifiability is a logical property of theoretical systems. They must be formulated in such a way that they can be refuted by empirical evidence. Their logical form must allow their potential falsification. Falsification is the act of falsifying a body of hypotheses. It reveals that a hypothesis or theory clashes with reality.

  18. Falsifiability

    Falsifiability. Falsifiability is an important feature of science. It is the principle that a proposition or theory could only be considered scientific if in principle it was possible to establish it as false. One of the criticisms of some branches of psychology, e.g. Freud's theory, is that they lack falsifiability.

  19. Falsification (SOCIAL PSYCHOLOGY)

    Falsification Definition. One cannot prove whether a theory or hypothesis is true. One can only prove that it is false, a process called falsification. Falsification is a tool that distinguishes scientific social psychology from folk social psychology, which does not use the process of falsification.

  20. Replication, falsification, and the crisis of confidence in social

    Replication, falsification, and auxiliary assumptions. Brandt et al.'s "replication recipe" provides a vital tool for researchers seeking to conduct high quality replications.In this section, we offer an additional "ingredient" to the discussion, by highlighting the role of auxiliary assumptions in increasing replication informativeness, specifically as these pertain to the ...

  21. Objectivity & The Empirical Method; Replicability and Falsifiability

    Falsifiability is the ability of a study or theory to be found to be wrong, i.e. false, which means that scientific methods can be used to test the theory/hypothesis to see if it is indeed wrong (which is why significance testing is based on either rejecting or accepting the null hypothesis)

  22. What is FALSIFIABILITY? definition of ...

    was first argued by Austria-born British philosopher Karl Popper (1902 - 1994) as one of the staple canons of the general idea surrounding a science. If a concept can be disproved or proven incorrect, it is falsifiable. Cite this page: N., Sam M.S., "FALSIFIABILITY," in PsychologyDictionary.org, May 11, 2013, https://psychologydictionary.org ...

  23. hypothesis testing

    Psychology: the Core Concepts says. Psychology differs from the pseudosciences in that it employs the scientific method to test its ideas empirically. The scientific method relies on testable theories and falsifiable hypotheses. Do testability and falsifiability have statistical definitions?